DatasetPipeline.iterator() → ray.data.dataset_iterator.DatasetIterator

Return a DatasetIterator that can be used to repeatedly iterate over the dataset.

Note that each pass iterates over the entire original Dataset, even if the dataset was windowed with .window().


>>> import ray
>>> ds = ray.data.range(5).window(bytes_per_window=1).repeat()
>>> ds
DatasetPipeline(num_windows=inf, num_stages=2)
>>> for batch in ds.iterator().iter_batches(batch_size=2):
...     print(batch) 

It is recommended to use DatasetIterator methods rather than calling the corresponding DatasetPipeline methods, such as iter_batches(), directly.
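
The note about windowing can be illustrated with a plain-Python sketch. It does not use Ray: `ToyIterator`, `make_windows`, and `rows_per_window` are hypothetical names invented for this illustration and say nothing about how DatasetIterator is implemented. The sketch only models the documented behavior that one pass covers every window of the original data, and that the iterator can be consumed more than once.

```python
from typing import Iterator, List


def make_windows(rows: List[int], rows_per_window: int) -> List[List[int]]:
    """Split rows into consecutive windows (hypothetical helper, not a Ray API)."""
    return [
        rows[i:i + rows_per_window]
        for i in range(0, len(rows), rows_per_window)
    ]


class ToyIterator:
    """Hypothetical stand-in that models repeated full passes over windowed data."""

    def __init__(self, windows: List[List[int]]) -> None:
        self._windows = windows

    def iter_rows(self) -> Iterator[int]:
        # A single pass walks through every window, not just the first one.
        for window in self._windows:
            yield from window


rows = list(range(5))
toy = ToyIterator(make_windows(rows, rows_per_window=2))

# Each call to iter_rows() starts a fresh pass over all of the data.
first_pass = list(toy.iter_rows())
second_pass = list(toy.iter_rows())
print(first_pass)
print(second_pass)
```

Both passes yield all five rows, even though the data was split into three windows.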