ray.data.Dataset.sort

Dataset.sort(key: Union[None, str, Callable[[ray.data.block.T], Any]] = None, descending: bool = False) → ray.data.dataset.Dataset[ray.data.block.T]

Sort the dataset by the specified key column or key function.

This is a blocking operation.

Examples

>>> import ray
>>> # Sort using the entire record as the key.
>>> ds = ray.data.range(100)
>>> ds.sort()
Sort
+- Dataset(num_blocks=..., num_rows=100, schema=<class 'int'>)
>>> # Sort by a single column in descending order.
>>> ds = ray.data.from_items(
...     [{"value": i} for i in range(1000)])
>>> ds.sort("value", descending=True)
Dataset(num_blocks=..., num_rows=1000, schema={value: int64})
>>> # Sort by a key function.
>>> ds.sort(lambda record: record["value"]) 

Time complexity: O(dataset size * log(dataset size / parallelism))
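One way to read this bound, assuming the rows are split into `parallelism` roughly equal key ranges that are each sorted independently: every range holds about n / p rows and costs about (n / p) * log(n / p) to sort, so the p ranges together cost about n * log(n / p). The sketch below is a toy, single-process illustration of that idea in plain Python. It is not Ray's implementation, and `range_partition_sort` and `comparison_estimate` are hypothetical helper names.

```python
import bisect
import math
import random


def range_partition_sort(rows, parallelism, seed=0):
    """Toy range-partitioned sort (illustrative only, not Ray's code)."""
    if not rows:
        return []
    rng = random.Random(seed)
    # Pick approximate split points from a sorted random sample of the rows.
    sample = sorted(rng.sample(rows, min(len(rows), 10 * parallelism)))
    boundaries = [
        sample[(i * len(sample)) // parallelism] for i in range(1, parallelism)
    ]
    # Route every row to the partition whose key range contains it.
    partitions = [[] for _ in range(parallelism)]
    for row in rows:
        partitions[bisect.bisect_right(boundaries, row)].append(row)
    # Sort each partition on its own. Partitions cover disjoint, ordered key
    # ranges, so concatenating them in order yields a globally sorted result.
    result = []
    for part in partitions:
        result.extend(sorted(part))
    return result


def comparison_estimate(n, parallelism):
    """Rough cost model from the bound: n * log2(n / parallelism)."""
    return n * math.log2(n / parallelism)
```

With this model, raising `parallelism` shrinks each partition and therefore the logarithmic factor, which is the effect the stated bound describes.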

Parameters
  • key – The key column or key function to sort by.

    • For Arrow tables, key must be a single column name.

    • For datasets of Python objects, key can be either a function (for example, a lambda) that returns a comparison key to sort by, or None to sort by the original value.

  • descending – Whether to sort in descending order.

Returns

A new, sorted dataset.
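As a rough mental model of the `key` and `descending` parameters, the three accepted forms of `key` can be mirrored with Python's built-in `sorted` on an in-memory list. This is only an analogy under that assumption: `sort_records` is a hypothetical helper, it does not call Ray, and it does not enforce the Arrow versus Python-object distinction described above.

```python
def sort_records(records, key=None, descending=False):
    """Plain-Python analogue of the key/descending semantics (not Ray code)."""
    if key is None:
        # No key: compare the records themselves.
        key_fn = None
    elif isinstance(key, str):
        # Column name: look the column up in each dict-like record.
        key_fn = lambda record: record[key]
    elif callable(key):
        # Key function: use its return value as the comparison key.
        key_fn = key
    else:
        raise TypeError("key must be None, a column name, or a callable")
    # sorted() returns a new list and leaves the input untouched.
    return sorted(records, key=key_fn, reverse=descending)


rows = [{"value": v} for v in (2, 0, 1)]
by_column = sort_records(rows, "value", descending=True)
by_function = sort_records(rows, lambda record: record["value"])
by_value = sort_records([3, 1, 2])
```

Like the method documented here, the helper returns a new sorted collection instead of modifying its input.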