ray.data.Dataset.sort

Dataset.sort(key: str | List[str] | None = None, descending: bool | List[bool] = False) → Dataset

Sort the dataset by the specified key column or columns.

Note

The descending parameter must be a boolean, or a list of booleans. If it is a list, all items in the list must share the same direction. Multi-directional sort is not supported yet.

Note

This operation requires all inputs to be materialized in the object store before it can execute.

Examples

>>> import ray
>>> ds = ray.data.range(100)
>>> ds.sort("id", descending=True).take(3)
[{'id': 99}, {'id': 98}, {'id': 97}]

Time complexity: O(dataset size * log(dataset size / parallelism))
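The bound above matches a simple cost model: if n rows are split into p partitions of roughly n/p rows each and every partition is sorted independently, the total comparison work is p * (n/p) * log(n/p) = n * log(n/p). The following is a minimal, stdlib-only sketch of that idea using sampled range boundaries. It is an illustration under assumptions, not Ray's actual implementation; `partitioned_sort`, `sample_size`, and `seed` are hypothetical names introduced here.

```python
import bisect
import random


def partitioned_sort(values, parallelism, sample_size=64, seed=0):
    """Conceptual range-partitioned sort (hypothetical helper, not Ray code).

    Returns a list of sorted partitions whose concatenation is globally sorted.
    """
    if parallelism <= 1 or len(values) <= 1:
        return [sorted(values)]

    # Estimate the value distribution from a small random sample.
    rng = random.Random(seed)
    sample = sorted(rng.sample(values, min(sample_size, len(values))))

    # Pick parallelism - 1 boundaries at evenly spaced sample quantiles.
    boundaries = [sample[len(sample) * i // parallelism] for i in range(1, parallelism)]

    # Route each value to the partition whose range contains it.
    partitions = [[] for _ in range(parallelism)]
    for v in values:
        partitions[bisect.bisect_right(boundaries, v)].append(v)

    # Each partition holds about n/p values, so each sort costs
    # about (n/p) * log(n/p) comparisons.
    return [sorted(p) for p in partitions]


rng = random.Random(1)
values = [rng.randrange(1000) for _ in range(200)]
parts = partitioned_sort(values, parallelism=4)
flat = [v for part in parts for v in part]
```

Because the partitions cover disjoint, ordered value ranges, concatenating them in order yields the fully sorted sequence without a final merge step.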

Parameters:
  • key – The column or a list of columns to sort by.

  • descending – Whether to sort in descending order. Must be a boolean or a list of booleans with one entry per key column.

Returns:

A new, sorted Dataset.