DataIterator API#


An iterator for reading records from a Dataset or DatasetPipeline.

For Datasets, each iteration call represents a complete read of all items in the Dataset. For DatasetPipelines, each iteration call represents one pass (epoch) over the base Dataset. Note that for DatasetPipelines, each pass iterates over the original Dataset rather than over a single window (if .window() was used).

If using Ray Train, each trainer actor should get its own iterator by calling ray.train.get_dataset_shard("train").
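The per-worker sharding idea can be sketched in plain Python. This is an illustration only: `split_shards` and `shard_iterator` are hypothetical helpers, not part of the Ray API, and Ray Train's real sharding happens inside `ray.train.get_dataset_shard`.

```python
# Illustration of one-iterator-per-worker sharding (hypothetical helpers;
# not the Ray implementation).
from typing import Iterator, List, TypeVar

T = TypeVar("T")

def split_shards(records: List[T], num_workers: int) -> List[List[T]]:
    """Round-robin the records into one shard per worker."""
    shards: List[List[T]] = [[] for _ in range(num_workers)]
    for i, record in enumerate(records):
        shards[i % num_workers].append(record)
    return shards

def shard_iterator(shard: List[T]) -> Iterator[T]:
    """Each worker reads only from its own independent iterator."""
    return iter(shard)

records = list(range(10))
shards = split_shards(records, num_workers=2)
# Worker 0 sees the even-indexed records, worker 1 the odd-indexed ones.
print([list(shard_iterator(s)) for s in shards])
# -> [[0, 2, 4, 6, 8], [1, 3, 5, 7, 9]]
```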


>>> import ray
>>> ds = ray.data.range(5)
>>> ds
Dataset(num_blocks=..., num_rows=5, schema={id: int64})
>>> ds.iterator()
DataIterator(Dataset(num_blocks=..., num_rows=5, schema={id: int64}))

PublicAPI (beta): This API is in beta and may change before becoming stable.

DataIterator.iter_batches(*[, ...])

Return a batched iterable over the dataset.

DataIterator.iter_torch_batches(*[, ...])

Return a batched iterable of Torch Tensors over the dataset.
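One step behind producing Torch batches is collating row records into per-column arrays. The sketch below shows only that collation step; plain lists stand in for tensors to avoid a torch dependency, and `collate_batch` is a hypothetical helper, not part of the Ray API (real code would pass each column to something like `torch.as_tensor`).

```python
# Sketch of the row-to-column collation step behind tensor batching
# (illustration only; lists stand in for torch tensors).
from typing import Any, Dict, List

def collate_batch(rows: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Turn a list of row dicts into one dict of columns."""
    columns: Dict[str, List[Any]] = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns

rows = [{"id": 0, "x": 1.0}, {"id": 1, "x": 2.0}]
print(collate_batch(rows))
# -> {'id': [0, 1], 'x': [1.0, 2.0]}
```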

DataIterator.to_tf(feature_columns, ...[, ...])

Return a TF Dataset over this dataset.


DataIterator.stats()

Returns a string containing execution timing information.