ray.data.Dataset.random_shuffle
Dataset.random_shuffle(*, seed: Optional[int] = None, num_blocks: Optional[int] = None, **ray_remote_args) -> ray.data.dataset.Dataset
Randomly shuffle the rows of this Dataset.
Tip
This method can be slow. For better performance, try Iterating over batches with shuffling. Also, see Optimizing shuffles.
Examples
>>> import ray
>>> ds = ray.data.range(100)
>>> ds.random_shuffle().take(3)
[{'id': 41}, {'id': 21}, {'id': 92}]
>>> ds.random_shuffle(seed=42).take(3)
[{'id': 77}, {'id': 21}, {'id': 63}]
Time complexity: O(dataset size / parallelism)
Parameters
seed – Fix the random seed to use, otherwise one is chosen based on system randomness.
num_blocks – The number of output blocks after the shuffle, or None to retain the number of blocks.
Returns
The shuffled Dataset.