ray.data.Dataset.random_shuffle
Dataset.random_shuffle(*, seed: Optional[int] = None, num_blocks: Optional[int] = None, **ray_remote_args) -> ray.data.dataset.Dataset
Randomly shuffle the elements of this dataset.
Examples
>>> import ray
>>> ds = ray.data.range(100)
>>> # Shuffle this dataset randomly.
>>> ds.random_shuffle()
RandomShuffle
+- Dataset(num_blocks=..., num_rows=100, schema={id: int64})
>>> # Shuffle this dataset with a fixed random seed.
>>> ds.random_shuffle(seed=12345)
RandomShuffle
+- Dataset(num_blocks=..., num_rows=100, schema={id: int64})
Time complexity: O(dataset size / parallelism)
Parameters
seed – The random seed to use for the shuffle. If None, a seed is chosen based on system randomness.
num_blocks – The number of output blocks after the shuffle, or None to retain the current number of blocks.
Returns
The shuffled dataset.