ray.data.Dataset.iter_tf_batches

Dataset.iter_tf_batches(*, prefetch_batches: int = 1, batch_size: int | None = 256, dtypes: tf.dtypes.DType | Dict[str, tf.dtypes.DType] | None = None, drop_last: bool = False, local_shuffle_buffer_size: int | None = None, local_shuffle_seed: int | None = None, prefetch_blocks: int = 0) → Iterable[tf.Tensor | Dict[str, tf.Tensor]]

Return an iterable over batches of data represented as TensorFlow tensors.

This iterable yields batches of type Dict[str, tf.Tensor]. For more flexibility, call iter_batches() and manually convert your data to TensorFlow tensors.

Tip

If you don’t need the additional flexibility provided by this method, consider using to_tf() instead. It’s easier to use.

Note

This operation will trigger execution of the lazy transformations performed on this dataset.

Examples

import ray

ds = ray.data.read_csv("s3://anonymous@air-example-data/iris.csv")

tf_dataset = ds.to_tf(
    feature_columns="sepal length (cm)",
    label_columns="target",
    batch_size=2
)
for features, labels in tf_dataset:
    print(features, labels)
# Output:
# tf.Tensor([5.1 4.9], shape=(2,), dtype=float64) tf.Tensor([0 0], shape=(2,), dtype=int64)
# ...
# tf.Tensor([6.2 5.9], shape=(2,), dtype=float64) tf.Tensor([2 2], shape=(2,), dtype=int64)

Time complexity: O(1)

Parameters:
  • prefetch_batches – The number of batches to fetch ahead of the current batch. If set to greater than 0, a separate thread pool is used to fetch the objects to the local node and format the batches. Defaults to 1.

  • batch_size – The number of rows in each batch, or None to use entire blocks as batches (blocks may contain different numbers of rows). The final batch may include fewer than batch_size rows if drop_last is False. Defaults to 256.

  • dtypes – The TensorFlow dtype(s) for the created tensor(s); if None, the dtype is inferred from the tensor data.

  • drop_last – Whether to drop the last batch if it’s incomplete. Defaults to False.

  • local_shuffle_buffer_size – If not None, the data is randomly shuffled using a local in-memory shuffle buffer, and this value serves as the minimum number of rows that must be in the local in-memory shuffle buffer in order to yield a batch. When there are no more rows to add to the buffer, the remaining rows in the buffer are drained. batch_size must also be specified when using local shuffling.

  • local_shuffle_seed – The seed to use for the local random shuffle.
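
The interaction between batch_size, drop_last, and the local shuffle buffer can be pictured with a small pure-Python sketch. This is not Ray's implementation; local_shuffle_batches and its arguments are hypothetical names that only mirror the behavior described in the parameter list above.

```python
import random
from typing import Iterable, Iterator, List, Optional, TypeVar

T = TypeVar("T")


def local_shuffle_batches(
    rows: Iterable[T],
    batch_size: int,
    buffer_size: int,
    seed: Optional[int] = None,
    drop_last: bool = False,
) -> Iterator[List[T]]:
    """Hypothetical illustration of a local in-memory shuffle buffer."""
    rng = random.Random(seed)
    buffer: List[T] = []
    for row in rows:
        buffer.append(row)
        # Yield a batch only once the buffer holds at least `buffer_size`
        # rows (and enough rows to fill a whole batch).
        while len(buffer) >= max(buffer_size, batch_size):
            rng.shuffle(buffer)
            batch, buffer = buffer[:batch_size], buffer[batch_size:]
            yield batch
    # No more rows to add: drain whatever remains in the buffer.
    rng.shuffle(buffer)
    while buffer:
        batch, buffer = buffer[:batch_size], buffer[batch_size:]
        if len(batch) < batch_size and drop_last:
            break  # discard the incomplete final batch
        yield batch


batches = list(local_shuffle_batches(range(10), batch_size=4, buffer_size=6, seed=0))
batches_dropped = list(
    local_shuffle_batches(range(10), batch_size=4, buffer_size=6, seed=0, drop_last=True)
)
print([len(b) for b in batches])
print([len(b) for b in batches_dropped])
```

In this sketch a larger buffer_size mixes rows from a wider window at the cost of holding more rows in memory, and a fixed seed makes the shuffled order repeatable.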

Returns:

An iterable over TensorFlow Tensor batches.

See also

Dataset.iter_batches()

Call this method to manually convert your data to TensorFlow tensors.