ray.data.FileShuffleConfig

class ray.data.FileShuffleConfig(seed: int | None = None)

Configuration for file shuffling.

This configuration object controls how Ray Data shuffles the order of input files when reading file-based datasets.

Note

Even if you provide a seed, you might still observe a non-deterministic row order. This happens because tasks run in parallel, and their completion order can vary from run to run. To preserve the order of rows, set DataContext.get_current().execution_options.preserve_order to True.

Parameters:

seed – An optional integer seed for the file shuffler. If provided, Ray Data shuffles files deterministically based on this seed.
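To illustrate what "deterministically based on this seed" means, here is a conceptual sketch in plain Python. The helper shuffle_file_paths is hypothetical and is not Ray Data's internal implementation. It only shows the general idea: a seeded random generator always produces the same permutation of the same file list, while a None seed produces a fresh order on each call.

```python
import random
from typing import List, Optional, Sequence


def shuffle_file_paths(paths: Sequence[str], seed: Optional[int] = None) -> List[str]:
    """Return a shuffled copy of `paths` (hypothetical helper, for illustration only)."""
    # A dedicated RNG instance: the same seed always yields the same permutation.
    # seed=None seeds from system entropy, so the order differs between calls.
    rng = random.Random(seed)
    shuffled = list(paths)
    rng.shuffle(shuffled)
    return shuffled


files = [f"part-{i:03d}.parquet" for i in range(8)]
run_a = shuffle_file_paths(files, seed=42)
run_b = shuffle_file_paths(files, seed=42)
print(run_a == run_b)  # same seed, same file order
```

Only the order of files is fixed this way. As the note above explains, the order of rows can still vary unless order preservation is enabled.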

Example

>>> import ray
>>> from ray.data import FileShuffleConfig
>>> shuffle = FileShuffleConfig(seed=42)
>>> ds = ray.data.read_images("s3://anonymous@ray-example-data/batoidea", shuffle=shuffle)

DeveloperAPI: This API may change across minor Ray releases.

Methods

Attributes

seed