ray.data.FileShuffleConfig

class ray.data.FileShuffleConfig(seed: int | None = None, reseed_after_execution: bool = True)

Configuration for file shuffling.

This configuration object controls how files are shuffled while reading file-based datasets. The random seed behavior is determined by the combination of seed and reseed_after_execution:

  • If seed is None, the random seed is always None (non-deterministic shuffling).

  • If seed is not None and reseed_after_execution is False, the random seed is always seed, so every execution uses the same shuffle order.

  • If seed is not None and reseed_after_execution is True, the random seed is derived from both seed and the execution index, so it differs for each execution.
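
The three rules above can be illustrated with a small pure-Python sketch. This is not Ray's implementation: the helper names effective_seed and shuffled_files are hypothetical, and the way seed and execution_idx are combined here is an assumption chosen only to make the behavior visible.

```python
import random
from typing import List, Optional


def effective_seed(
    seed: Optional[int], reseed_after_execution: bool, execution_idx: int
) -> Optional[int]:
    """Hypothetical helper mirroring the three documented cases."""
    if seed is None:
        # Case 1: no seed, so shuffling is non-deterministic.
        return None
    if not reseed_after_execution:
        # Case 2: the same seed is used for every execution.
        return seed
    # Case 3: combine seed and execution index (assumed formula, for
    # illustration only) so each execution gets a different seed.
    return seed * 1_000_003 + execution_idx


def shuffled_files(
    files: List[str],
    seed: Optional[int],
    reseed_after_execution: bool,
    execution_idx: int,
) -> List[str]:
    """Return a shuffled copy of the file list using the effective seed."""
    rng = random.Random(effective_seed(seed, reseed_after_execution, execution_idx))
    out = list(files)
    rng.shuffle(out)
    return out


files = [f"part-{i:03d}.parquet" for i in range(8)]

# Fixed seed without reseeding: identical order on executions 0 and 1.
first = shuffled_files(files, seed=42, reseed_after_execution=False, execution_idx=0)
second = shuffled_files(files, seed=42, reseed_after_execution=False, execution_idx=1)
print(first == second)
```

With reseed_after_execution=True, the same sketch yields a different effective seed per execution, while re-running a given execution index with the same seed reproduces that execution's order.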

Note

Even if you provide a seed, you might still observe a non-deterministic row order, because tasks execute in parallel and their completion order can vary. If you need to preserve the order of rows, set DataContext.get_current().execution_options.preserve_order to True.
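
As a minimal sketch of the setting mentioned in this note, the following enables order preservation on the current context and pairs it with a fixed-seed shuffle configuration. It assumes Ray is installed, and combining the two settings this way is an illustration rather than a required pattern.

```python
import ray
from ray.data import FileShuffleConfig

# Ask Ray Data to preserve row order instead of yielding blocks in
# task-completion order.
ctx = ray.data.DataContext.get_current()
ctx.execution_options.preserve_order = True

# Fixed seed with no reseeding: the same file order on every execution.
shuffle = FileShuffleConfig(seed=42, reseed_after_execution=False)
```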

Parameters:
  • seed – An optional integer seed for the file shuffler. If None, shuffling is non-deterministic. If provided, shuffling is deterministic based on this seed and the reseed_after_execution setting.

  • reseed_after_execution – If True, the random seed is derived from both seed and execution_idx, so the shuffle order differs across executions. If False, the random seed is always seed, so the shuffle order is the same across executions. Only takes effect when seed is not None. Defaults to True.

Example

>>> import ray
>>> from ray.data import FileShuffleConfig
>>> # Fixed seed - same shuffle across executions
>>> shuffle = FileShuffleConfig(seed=42, reseed_after_execution=False)
>>> ds = ray.data.read_images("s3://anonymous@ray-example-data/batoidea", shuffle=shuffle)
>>>
>>> # Seed with reseed_after_execution - different shuffle per execution
>>> shuffle = FileShuffleConfig(seed=42, reseed_after_execution=True)
>>> ds = ray.data.read_images("s3://anonymous@ray-example-data/batoidea", shuffle=shuffle)

DeveloperAPI: This API may change across minor Ray releases.

Attributes

reseed_after_execution

seed