ray.data.expressions.monotonically_increasing_id
- ray.data.expressions.monotonically_increasing_id() → MonotonicallyIncreasingIdExpr
Create an expression that generates monotonically increasing IDs.
The generated IDs are guaranteed to be monotonically increasing and unique, but not consecutive. The current implementation puts the task ID in the upper 31 bits, and the record number within each task in the lower 33 bits. Records within the block(s) assigned to a task receive consecutive IDs. Note that IDs are not globally ordered across tasks.
The assumption is that the dataset schedules fewer than 1 billion tasks and that each task processes fewer than 8 billion records.
The function is non-deterministic because its result depends on task IDs.
- Returns:
A MonotonicallyIncreasingIdExpr that generates unique IDs.
Example
>>> from ray.data.expressions import monotonically_increasing_id
>>> import ray
>>> ds = ray.data.range(4, override_num_blocks=2)
>>> ds = ds.with_column("uid", monotonically_increasing_id())
>>> ds.take_all()
[{'id': 0, 'uid': 0}, {'id': 1, 'uid': 1}, {'id': 2, 'uid': 8589934592}, {'id': 3, 'uid': 8589934593}]
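The uid values in the example above follow the bit layout described earlier: 8589934592 is 2**33, that is, task 1 with record number 0. The following is a minimal pure-Python sketch of that arithmetic, not Ray's actual implementation. The names `ROW_BITS`, `compose_id`, and `split_id` are hypothetical helpers introduced for illustration, and treating the task ID as a small integer index is an assumption. Because the layout is described as the current implementation, it may change, so decoding IDs this way should not be relied on in production code.

```python
# Illustrative only: mirrors the documented layout (task ID in the upper
# 31 bits, record number in the lower 33 bits). These helpers are
# hypothetical and are not part of the Ray API.
ROW_BITS = 33
ROW_MASK = (1 << ROW_BITS) - 1


def compose_id(task_index: int, row_index: int) -> int:
    """Pack a task index and a per-task record number into one integer."""
    return (task_index << ROW_BITS) | row_index


def split_id(uid: int) -> tuple[int, int]:
    """Recover (task_index, row_index) from a packed ID."""
    return uid >> ROW_BITS, uid & ROW_MASK


# The uid values from the example decode to (task, record) pairs.
for uid in (0, 1, 8589934592, 8589934593):
    print(uid, split_id(uid))
```

Under this layout, IDs from the same task are consecutive, while IDs from different tasks are separated by gaps of up to 2**33. The stated limits of 1 billion tasks and 8 billion records per task sit below the raw capacities of the two fields (2**31 = 2147483648 and 2**33 = 8589934592).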
PublicAPI (alpha): This API is in alpha and may change before becoming stable.