DatasetIterator API
- class ray.data.DatasetIterator
An iterator for reading items from a Dataset or DatasetPipeline.

For Datasets, each iteration call represents a complete read of all items in the Dataset. For DatasetPipelines, each iteration call represents one pass (epoch) over the base Dataset. Note that for DatasetPipelines, each pass iterates over the original Dataset, instead of a window (if .window() was used).

If using Ray AIR, each trainer actor should get its own iterator by calling session.get_dataset_shard("train").

Examples
>>> import ray
>>> ds = ray.data.range(5)
>>> ds
Dataset(num_blocks=5, num_rows=5, schema=<class 'int'>)
>>> ds.iterator()
DatasetIterator(Dataset(num_blocks=5, num_rows=5, schema=<class 'int'>))
>>> ds = ds.repeat(); ds
DatasetPipeline(num_windows=inf, num_stages=2)
>>> ds.iterator()
DatasetIterator(DatasetPipeline(num_windows=inf, num_stages=2))
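
The "one iteration call equals one full pass" behavior described above can be illustrated with a small pure-Python model. This is only a sketch of the semantics, not Ray's implementation: ToyDatasetIterator and run_epochs are hypothetical names, and the iter_batches method here only imitates the idea of a batched iterator over an in-memory list.

```python
from typing import Iterator, List, Sequence


class ToyDatasetIterator:
    """Hypothetical stand-in for a dataset iterator; not Ray's implementation."""

    def __init__(self, items: Sequence[int]) -> None:
        self._items = list(items)

    def iter_batches(self, batch_size: int = 2) -> Iterator[List[int]]:
        # Every call starts again from the first item, so one call to
        # iter_batches() corresponds to one complete pass (epoch) over the data.
        for start in range(0, len(self._items), batch_size):
            yield self._items[start:start + batch_size]


def run_epochs(it: ToyDatasetIterator, num_epochs: int) -> List[List[List[int]]]:
    """Collect the batches seen in each epoch (assumed helper for illustration)."""
    return [list(it.iter_batches(batch_size=2)) for _ in range(num_epochs)]


epochs = run_epochs(ToyDatasetIterator(range(5)), num_epochs=2)
print(epochs)
```

Each entry of epochs holds the batches from one full pass, so a training loop written against this contract can call the iteration method once per epoch without recreating the iterator.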
Tip
For debugging purposes, use make_local_dataset_iterator() to create a local DatasetIterator from a Dataset, a Preprocessor, and a DatasetConfig.

PublicAPI (beta): This API is in beta and may change before becoming stable.
| Return a local batched iterator over the dataset. |
| Return a local batched iterator of Torch Tensors over the dataset. |
| Return a TF Dataset over this dataset. |
| Return a string containing execution timing information. |