ray.data.Dataset.randomize_block_order
- Dataset.randomize_block_order(*, seed: int | None = None) -> Dataset
Randomly shuffle the blocks of this Dataset.
This method is useful if you split() your dataset into shards and want to randomize the data in each shard without performing a full random_shuffle().
Note
This operation requires all inputs to be materialized in the object store before it can execute.
Examples
>>> import ray
>>> ds = ray.data.range(100)
>>> ds.take(5)
[{'id': 0}, {'id': 1}, {'id': 2}, {'id': 3}, {'id': 4}]
>>> ds.randomize_block_order().take(5)
[{'id': 15}, {'id': 16}, {'id': 17}, {'id': 18}, {'id': 19}]
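To make the difference from a full random_shuffle() concrete, here is a minimal pure-Python sketch of block-level shuffling. It is a conceptual illustration only, not Ray's implementation: the blocks list, the block size of 20, and the local randomize_block_order helper are assumptions made for this example.

```python
import random


def randomize_block_order(blocks, seed=None):
    """Return the blocks in a random order; rows inside each block are untouched.

    Hypothetical stand-in for the idea behind Dataset.randomize_block_order().
    """
    rng = random.Random(seed)  # seed=None falls back to system randomness
    shuffled = list(blocks)    # shallow copy so the caller's list keeps its order
    rng.shuffle(shuffled)
    return shuffled


# Assumed layout: 100 rows split into 5 blocks of 20 consecutive ids.
blocks = [list(range(start, start + 20)) for start in range(0, 100, 20)]

reordered = randomize_block_order(blocks, seed=0)
rows = [row for block in reordered for row in block]

# Every row is still present exactly once.
assert sorted(rows) == list(range(100))
# Rows within a block stay consecutive, unlike after a full row-level shuffle.
assert all(block == list(range(block[0], block[0] + 20)) for block in reordered)
# The same seed reproduces the same block order.
assert randomize_block_order(blocks, seed=0) == reordered
```

Only the order of whole blocks changes here, which is why this kind of shuffle is cheaper than a row-level shuffle but gives weaker randomization: neighboring rows inside a block remain neighbors.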
- Parameters:
seed – Fix the random seed to use. If None, a seed is chosen based on system randomness.
- Returns:
The block-shuffled Dataset.