ray.data.Dataset.random_shuffle
- Dataset.random_shuffle(*, seed: int | None = None, num_blocks: int | None = None, **ray_remote_args) -> Dataset
- Randomly shuffle the rows of this Dataset.
- Tip: This method can be slow. For better performance, try Iterating over batches with shuffling. Also, see Optimizing shuffles.
- Note: This operation requires all inputs to be materialized in object store for it to execute.
- Examples
    >>> import ray
    >>> ds = ray.data.range(100)
    >>> ds.random_shuffle().take(3)
    [{'id': 41}, {'id': 21}, {'id': 92}]
    >>> ds.random_shuffle(seed=42).take(3)
    [{'id': 77}, {'id': 21}, {'id': 63}]
- Time complexity: O(dataset size / parallelism)
- Parameters:
- seed – Fix the random seed to use; otherwise, one is chosen based on system randomness.
- Returns:
- The shuffled Dataset.
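The tip above points to shuffling while iterating over batches as a cheaper alternative. The general idea behind that approach is a bounded shuffle buffer: rows are mixed only within a sliding window, so the full dataset never has to be materialized at once. The following is a conceptual sketch in plain Python, not Ray's implementation; `local_shuffle` and `buffer_size` are hypothetical names chosen for illustration. In recent Ray releases the corresponding knob appears as the `local_shuffle_buffer_size` argument of `Dataset.iter_batches`; check the version you are using.

```python
import random
from typing import Iterable, Iterator, List, Optional, TypeVar

T = TypeVar("T")


def local_shuffle(
    rows: Iterable[T], buffer_size: int, seed: Optional[int] = None
) -> Iterator[T]:
    """Yield rows in a partially shuffled order using a bounded buffer.

    Hypothetical helper for illustration only; it is not part of Ray.
    """
    if buffer_size < 1:
        raise ValueError("buffer_size must be at least 1")
    rng = random.Random(seed)  # a fixed seed makes the order reproducible
    buffer: List[T] = []
    for row in rows:
        buffer.append(row)
        if len(buffer) >= buffer_size:
            # Pick a random buffered row, swap it to the end, and emit it.
            i = rng.randrange(len(buffer))
            buffer[i], buffer[-1] = buffer[-1], buffer[i]
            yield buffer.pop()
    # Input exhausted: emit whatever is left in random order.
    rng.shuffle(buffer)
    yield from buffer


rows = [{"id": i} for i in range(100)]
shuffled = list(local_shuffle(rows, buffer_size=16, seed=42))
```

The trade-off is randomness for memory: a row can only move as far as the buffer allows, so the first emitted row here always comes from the first 16 input rows. A full `random_shuffle` has no such locality, which is why it needs every input block available before it can run.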