ray.data.Dataset.random_shuffle

Dataset.random_shuffle(*, seed: Optional[int] = None, num_blocks: Optional[int] = None, **ray_remote_args) → ray.data.dataset.Dataset

Randomly shuffle the rows of this Dataset.

Tip

This method can be slow. For better performance, consider shuffling while iterating over batches (see "Iterating over batches with shuffling"). See also "Optimizing shuffles".

Examples

>>> import ray
>>> ds = ray.data.range(100)
>>> ds.random_shuffle().take(3)
[{'id': 41}, {'id': 21}, {'id': 92}]
>>> ds.random_shuffle(seed=42).take(3)
[{'id': 77}, {'id': 21}, {'id': 63}]

Time complexity: O(dataset size / parallelism)

Parameters
  • seed – Fix the random seed to use; otherwise, one is chosen based on system randomness.

  • num_blocks – The number of output blocks after the shuffle, or None to retain the current number of blocks.

Returns

The shuffled Dataset.