ray.data.Dataset.sort
- Dataset.sort(key: Union[None, str, Callable[[ray.data.block.T], Any]] = None, descending: bool = False) -> ray.data.dataset.Dataset[ray.data.block.T]
Sort the dataset by the specified key column or key function.
Examples
>>> import ray
>>> # Sort using the entire record as the key.
>>> ds = ray.data.range(100)
>>> ds.sort()
Sort
+- Dataset(num_blocks=..., num_rows=100, schema=<class 'int'>)
>>> # Sort by a single column in descending order.
>>> ds = ray.data.from_items(
...     [{"value": i} for i in range(1000)])
>>> ds.sort("value", descending=True)
Sort
+- Dataset(num_blocks=..., num_rows=1000, schema={value: int64})
>>> # Sort by a key function.
>>> ds.sort(lambda record: record["value"])
Time complexity: O(dataset size * log(dataset size / parallelism))
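The bound above is what you would expect if the data is split into `parallelism` partitions of roughly equal size and each partition is sorted on its own: each sort costs about (n / p) * log(n / p), and there are p of them. The following is a minimal pure-Python sketch of that idea, not Ray's implementation. The function name `partitioned_sort`, the sampling factor of 10, and the boundary selection are all assumptions made for illustration.

```python
import bisect
import random


def partitioned_sort(items, parallelism):
    """Illustrative range-partitioned sort (hypothetical helper, not Ray code)."""
    if not items:
        return []

    # Estimate partition boundaries from a small sorted sample.
    # The oversampling factor of 10 is an arbitrary choice for this sketch.
    sample = sorted(random.sample(items, min(len(items), 10 * parallelism)))
    step = len(sample) / parallelism
    boundaries = [sample[int(i * step)] for i in range(1, parallelism)]

    # Route each item to the partition whose key range contains it.
    partitions = [[] for _ in range(parallelism)]
    for item in items:
        partitions[bisect.bisect_right(boundaries, item)].append(item)

    # Sort each partition independently; in a distributed system these sorts
    # would run in parallel. Partitions are already ordered by range, so
    # concatenating them yields a fully sorted result.
    result = []
    for part in partitions:
        result.extend(sorted(part))
    return result
```

With balanced partitions, each `sorted(part)` call handles about n / p items, which is where the log(n / p) factor comes from. Skewed data can make partitions uneven, in which case the largest partition dominates.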
- Parameters
key –
For Arrow tables, key must be a single column name.
For datasets of Python objects, key can be either a callable (for example, a lambda) that takes a record and returns the comparison key to sort by, or None to sort by the records themselves.
descending – Whether to sort in descending order.
- Returns
A new, sorted dataset.
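To make the three forms of `key` concrete, here is a small local model written with the built-in `sorted`. It only mirrors the documented semantics on an in-memory list and does not call Ray. The helper name `sort_records` is hypothetical, and treating a string key as a dictionary lookup is an assumption standing in for a column name.

```python
def sort_records(records, key=None, descending=False):
    """Local model of the key/descending semantics (hypothetical, not Ray code)."""
    if key is None:
        # No key: compare the records themselves.
        key_fn = None
    elif isinstance(key, str):
        # String key: treat it as a column name and look it up in each record.
        key_fn = lambda record: record[key]
    else:
        # Callable key: use the value it returns for comparison.
        key_fn = key
    # Like Dataset.sort, this returns a new sequence and leaves the input as is.
    return sorted(records, key=key_fn, reverse=descending)


rows = [{"value": 2}, {"value": 0}, {"value": 1}]
by_column = sort_records(rows, "value", descending=True)
by_function = sort_records(rows, lambda record: record["value"])
plain = sort_records([3, 1, 2])
```

Here `by_column` orders the rows from the largest `value` to the smallest, `by_function` orders them from smallest to largest, and `plain` sorts the integers by their own value.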