ray.data.Dataset.random_shuffle

Dataset.random_shuffle(*, seed: Optional[int] = None, num_blocks: Optional[int] = None, **ray_remote_args) -> ray.data.dataset.Dataset[ray.data.block.T]

Randomly shuffle the elements of this dataset.

This is a blocking operation similar to repartition().

Examples

>>> import ray
>>> ds = ray.data.range(100)
>>> # Shuffle this dataset randomly.
>>> ds.random_shuffle()
RandomShuffle
+- Dataset(num_blocks=..., num_rows=100, schema=<class 'int'>)
>>> # Shuffle this dataset with a fixed random seed.
>>> ds.random_shuffle(seed=12345)
RandomShuffle
+- Dataset(num_blocks=..., num_rows=100, schema=<class 'int'>)

Time complexity: O(dataset size / parallelism)

Parameters
  • seed – The random seed to use. If None, a seed is chosen based on system randomness.

  • num_blocks – The number of output blocks after the shuffle, or None to retain the current number of blocks.

Returns

The shuffled dataset.
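
As a rough mental model for the seed and num_blocks parameters described above, the following pure-Python sketch shuffles rows that are stored as a list of blocks. It does not use Ray and is not how Ray Data implements the shuffle. The helper shuffle_blocks and its flatten-then-deal strategy are assumptions made for illustration only.

```python
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def shuffle_blocks(
    blocks: Sequence[Sequence[T]],
    *,
    seed: Optional[int] = None,
    num_blocks: Optional[int] = None,
) -> List[List[T]]:
    """Hypothetical helper: globally shuffle rows held in a list of blocks.

    Illustration only; this is not Ray's implementation.
    """
    if num_blocks is not None and num_blocks < 1:
        raise ValueError("num_blocks must be a positive integer or None")

    # With seed=None, Python seeds the generator from system randomness.
    rng = random.Random(seed)

    # None keeps the input block count, mirroring the documented default.
    n_out = len(blocks) if num_blocks is None else num_blocks

    # Flatten all blocks, shuffle every row, then deal rows into n_out blocks.
    rows = [row for block in blocks for row in block]
    rng.shuffle(rows)
    return [rows[i::n_out] for i in range(n_out)]


# Two input blocks holding the integers 0..99.
blocks = [list(range(0, 50)), list(range(50, 100))]

# The same seed yields the same order; num_blocks changes only the layout.
first = shuffle_blocks(blocks, seed=12345)
second = shuffle_blocks(blocks, seed=12345)
four_blocks = shuffle_blocks(blocks, seed=12345, num_blocks=4)
```

In this sketch, first and second are identical because they share a seed, and four_blocks holds the same 100 rows spread over four output blocks instead of two.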