DatasetIterator API


An iterator for reading items from a Dataset or DatasetPipeline.

For Datasets, each iteration call represents a complete read of all items in the Dataset. For DatasetPipelines, each iteration call represents one pass (epoch) over the base Dataset. Note that for DatasetPipelines, each pass iterates over the original Dataset rather than over a single window (if .window() was used).
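The pass-per-iteration semantics can be illustrated with a plain-Python sketch (no Ray required; the class and method names here are illustrative, not part of the API):

```python
# Conceptual sketch: a pipeline-style iterator where each call to
# iter_epoch() is one complete pass over the base dataset, mirroring
# how a DatasetIterator over a repeated DatasetPipeline yields a full
# epoch per iteration call.
class TinyPipelineIterator:
    def __init__(self, base_items):
        self.base_items = list(base_items)

    def iter_epoch(self):
        # One iteration call == one complete pass over the base items,
        # regardless of how the pipeline was windowed internally.
        yield from self.base_items

it = TinyPipelineIterator(range(5))
epoch1 = list(it.iter_epoch())
epoch2 = list(it.iter_epoch())
# Each epoch sees every item in the base dataset.
```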

If using Ray AIR, each trainer actor should get its own iterator by calling session.get_dataset_shard("train").
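To see why each trainer actor needs its own iterator, consider this plain-Python sketch (no Ray; all names are illustrative): the dataset is split into per-worker shards, and each worker gets an independent iterator over only its shard, analogous to each trainer actor calling session.get_dataset_shard("train"):

```python
# Conceptual sketch: each "trainer" receives its own shard and its own
# iterator over that shard, so workers never consume each other's rows.
def split_into_shards(items, num_workers):
    shards = [[] for _ in range(num_workers)]
    for i, item in enumerate(items):
        shards[i % num_workers].append(item)
    return shards

shards = split_into_shards(range(10), num_workers=2)
# Each worker iterates only over its private shard.
worker_views = [iter(shard) for shard in shards]
seen_by_worker_0 = list(worker_views[0])
seen_by_worker_1 = list(worker_views[1])
```

Together the two workers cover the whole dataset exactly once, which is the property the per-actor shard iterators provide.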


>>> import ray
>>> ds = ray.data.range(5)
>>> ds
Dataset(num_blocks=5, num_rows=5, schema=<class 'int'>)
>>> ds.iterator()
DatasetIterator(Dataset(num_blocks=5, num_rows=5, schema=<class 'int'>))
>>> ds = ds.repeat(); ds
DatasetPipeline(num_windows=inf, num_stages=2)
>>> ds.iterator()
DatasetIterator(DatasetPipeline(num_windows=inf, num_stages=2))


For debugging purposes, use make_local_dataset_iterator() to create a local DatasetIterator from a Dataset, a Preprocessor, and a DatasetConfig.

PublicAPI (beta): This API is in beta and may change before becoming stable.

DatasetIterator.iter_batches(*[, ...])
    Return a local batched iterator over the dataset.

DatasetIterator.iter_torch_batches(*[, ...])
    Return a local batched iterator of Torch Tensors over the dataset.

DatasetIterator.to_tf(feature_columns, ...)
    Return a TF Dataset over this dataset.
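The batching behavior of iter_batches can be approximated in plain Python (a sketch only; batch_size stands in for the real parameter, and the actual method also supports options such as batch formats and prefetching):

```python
# Conceptual sketch of iter_batches semantics: yield the dataset's rows
# in fixed-size chunks, with a final partial batch when the row count
# is not a multiple of batch_size.
def iter_batches(rows, batch_size):
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # trailing partial batch

batches = list(iter_batches(range(5), batch_size=2))
# batches -> [[0, 1], [2, 3], [4]]
```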

DatasetIterator.stats()
    Returns a string containing execution timing information.