ray.data.Dataset.sort

Dataset.sort(key: Optional[str] = None, descending: bool = False) -> ray.data.dataset.Dataset

Sort the dataset by the specified key column.

Examples

>>> import ray
>>> # Sort by a single column in descending order.
>>> ds = ray.data.from_items(
...     [{"value": i} for i in range(1000)])
>>> ds.sort("value", descending=True)
Sort
+- Dataset(num_blocks=200, num_rows=1000, schema={value: int64})

Time complexity: O(dataset size * log(dataset size / parallelism))
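
One way to read this bound is sketched below in plain Python. It assumes the rows are range-partitioned into `parallelism` pieces that are then sorted independently; that strategy is an assumption made for illustration, not something this page states, and `partitioned_sort` is a hypothetical helper rather than part of Ray. Each piece holds roughly n / p rows and costs about (n / p) * log(n / p) to sort, so the total work across p pieces is about n * log(n / p).

```python
import bisect
import random


def partitioned_sort(items, parallelism):
    """Sort a list by range-partitioning it, then sorting each partition.

    Illustrative only: in a distributed setting each partition would be
    sorted by a separate worker; here they are sorted one after another.
    """
    if not items:
        return []

    # Pick parallelism - 1 boundaries from a sorted random sample.
    sample_size = min(len(items), 10 * parallelism)
    sample = sorted(random.sample(items, sample_size))
    boundaries = [
        sample[(i * len(sample)) // parallelism] for i in range(1, parallelism)
    ]

    # Route every item to the partition whose range contains it.
    partitions = [[] for _ in range(parallelism)]
    for item in items:
        partitions[bisect.bisect_right(boundaries, item)].append(item)

    # Sort each partition on its own and concatenate them in range order.
    result = []
    for part in partitions:
        result.extend(sorted(part))
    return result
```

Because the partitions cover disjoint, increasing ranges, concatenating the sorted pieces yields a globally sorted result without a final merge step.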

Parameters
  • key – The column to sort by. To sort by multiple columns, use a map function to generate the sort column beforehand.

  • descending – Whether to sort in descending order.
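
The multi-column note for `key` can be sketched as follows. This is a minimal illustration, not part of the Ray API: the column names `group` and `value`, the `sort_key` column, and both helper functions are hypothetical. It assumes both columns hold integers and that `value` satisfies 0 <= value < 1000, so the two columns pack into one integer without collisions.

```python
# Hypothetical bound on the `value` column; keys collide if it is exceeded.
VALUE_RANGE = 1000


def add_sort_key(row):
    """Return the row with an extra `sort_key` column combining both columns."""
    return {
        "group": row["group"],
        "value": row["value"],
        # Orders by `group` first, then by `value` within each group.
        "sort_key": row["group"] * VALUE_RANGE + row["value"],
    }


def sort_by_group_then_value(ds):
    """Sort a dataset by `group`, breaking ties by `value`.

    `ds` is expected to be a ray.data.Dataset with integer `group` and
    `value` columns. Only `map` and `sort` are called on it.
    """
    return ds.map(add_sort_key).sort("sort_key")
```

With Ray installed, this could be applied as `sort_by_group_then_value(ray.data.from_items(rows))`, where `rows` is a list of dicts with `group` and `value` entries. The extra `sort_key` column remains in the result and can be dropped with another `map` if it is not wanted.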

Returns

A new, sorted dataset.