Dataset.iter_tf_batches(*, prefetch_batches: int = 1, batch_size: Optional[int] = 256, dtypes: Optional[Union[tf.dtypes.DType, Dict[str, tf.dtypes.DType]]] = None, drop_last: bool = False, local_shuffle_buffer_size: Optional[int] = None, local_shuffle_seed: Optional[int] = None, prefetch_blocks: int = 0) -> Iterator[Union[tf.Tensor, Dict[str, tf.Tensor]]]

Return an iterator over batches of data represented as TensorFlow tensors.

This iterator yields batches of type Dict[str, tf.Tensor]. For more flexibility, call iter_batches() and manually convert your data to TensorFlow tensors.


If you don’t need the additional flexibility provided by this method, consider using to_tf() instead. It’s easier to use.


This operation will trigger execution of the lazy transformations performed on this dataset.


import ray

ds = ray.data.read_csv("s3://anonymous@air-example-data/iris.csv")

tf_dataset = ds.to_tf(
    feature_columns="sepal length (cm)",
    label_columns="target",
    batch_size=2,
)
for features, labels in tf_dataset:
    print(features, labels)

tf.Tensor([5.1 4.9], shape=(2,), dtype=float64) tf.Tensor([0 0], shape=(2,), dtype=int64)
...
tf.Tensor([6.2 5.9], shape=(2,), dtype=float64) tf.Tensor([2 2], shape=(2,), dtype=int64)

Time complexity: O(1)

  • prefetch_batches – The number of batches to fetch ahead of the current batch. If set to a value greater than 0, a separate thread pool is used to fetch the objects to the local node, format the batches, and apply the collate_fn. Defaults to 1.

  • batch_size – The number of rows in each batch, or None to use entire blocks as batches (blocks may contain different numbers of rows). The final batch may include fewer than batch_size rows if drop_last is False. Defaults to 256.

  • dtypes – The TensorFlow dtype(s) for the created tensor(s); if None, the dtype is inferred from the tensor data.

  • drop_last – Whether to drop the last batch if it’s incomplete.

  • local_shuffle_buffer_size – If not None, the data is randomly shuffled using a local in-memory shuffle buffer, and this value serves as the minimum number of rows that must be in the local in-memory shuffle buffer in order to yield a batch. When there are no more rows to add to the buffer, the remaining rows in the buffer are drained. batch_size must also be specified when using local shuffling.

  • local_shuffle_seed – The seed to use for the local random shuffle.
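
The interaction between batch_size, drop_last, and the local shuffle buffer described in the list above can be modeled in plain Python. The sketch below is a simplified illustration of the documented rules, not Ray's implementation: `sketch_batches` is a hypothetical helper, it assumes an integer batch_size, and it reshuffles the whole buffer on every batch for clarity rather than efficiency.

```python
import random
from typing import Iterable, Iterator, List, Optional, TypeVar

T = TypeVar("T")


def sketch_batches(
    rows: Iterable[T],
    batch_size: int,
    drop_last: bool = False,
    local_shuffle_buffer_size: Optional[int] = None,
    local_shuffle_seed: Optional[int] = None,
) -> Iterator[List[T]]:
    """Hypothetical model of the documented batching rules (not Ray's code)."""
    rng = random.Random(local_shuffle_seed)
    shuffle = local_shuffle_buffer_size is not None
    buffer: List[T] = []
    # While rows keep arriving, a shuffled batch is only yielded once the
    # buffer holds at least local_shuffle_buffer_size rows.
    min_rows = max(batch_size, local_shuffle_buffer_size or 0)

    def take(n: int) -> List[T]:
        if shuffle:
            rng.shuffle(buffer)
        batch = buffer[:n]
        del buffer[:n]
        return batch

    for row in rows:
        buffer.append(row)
        while len(buffer) >= min_rows:
            yield take(batch_size)

    # No more rows to add: drain whatever remains in the buffer.
    while len(buffer) >= batch_size:
        yield take(batch_size)
    if buffer and not drop_last:
        yield take(len(buffer))  # final, incomplete batch


sizes = [len(b) for b in sketch_batches(range(10), batch_size=4)]
print(sizes)  # → [4, 4, 2]

sizes_dropped = [len(b) for b in sketch_batches(range(10), batch_size=4, drop_last=True)]
print(sizes_dropped)  # → [4, 4]

shuffled = list(
    sketch_batches(range(10), batch_size=4, local_shuffle_buffer_size=6, local_shuffle_seed=0)
)
print(shuffled)  # every row appears exactly once, in a seeded random order
```

A larger buffer mixes rows from further apart in the stream at the cost of holding more rows in memory before the first batch is produced.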


Returns an iterator over batches of TensorFlow tensors.

See also

iter_batches(): Call this method to manually convert your data to TensorFlow tensors.