ray.data.Dataset.iter_tf_batches

Dataset.iter_tf_batches(*, prefetch_blocks: int = 0, batch_size: Optional[int] = 256, dtypes: Optional[Union[tf.dtypes.DType, Dict[str, tf.dtypes.DType]]] = None, drop_last: bool = False, local_shuffle_buffer_size: Optional[int] = None, local_shuffle_seed: Optional[int] = None) → Iterator[Union[tf.Tensor, Dict[str, tf.Tensor]]]

Return a local batched iterator of TensorFlow Tensors over the dataset.

This iterator yields single-tensor batches if the underlying dataset consists of a single column; otherwise, it yields a dictionary of column-tensors.

Tip

If you don’t need the additional flexibility provided by this method, consider using to_tf() instead. It’s easier to use.

Note

This operation will trigger execution of the lazy transformations performed on this dataset, and will block until execution completes.

Examples

>>> import ray
>>> for batch in ray.data.range(12).iter_tf_batches(batch_size=4):
...     print(batch.shape)
(4, 1)
(4, 1)
(4, 1)

Time complexity: O(1)

Parameters
  • prefetch_blocks – The number of blocks to prefetch ahead of the current block during the scan.

  • batch_size – The number of rows in each batch, or None to use entire blocks as batches (blocks may contain a different number of rows). The final batch may contain fewer than batch_size rows if drop_last is False. Defaults to 256.

  • dtypes – The TensorFlow dtype(s) for the created tensor(s); if None, the dtype will be inferred from the tensor data.

  • drop_last – Whether to drop the last batch if it’s incomplete.

  • local_shuffle_buffer_size – If non-None, the data will be randomly shuffled using a local in-memory shuffle buffer, and this value will serve as the minimum number of rows that must be in the local in-memory shuffle buffer in order to yield a batch. When there are no more rows to add to the buffer, the remaining rows in the buffer will be drained. This buffer size must be greater than or equal to batch_size, and therefore batch_size must also be specified when using local shuffling.

  • local_shuffle_seed – The seed to use for the local random shuffle.
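
The interaction between batch_size, drop_last, and local_shuffle_buffer_size described above can be sketched in plain Python. This is only an illustration of the documented semantics under simplifying assumptions, not Ray's implementation: the names iter_batches_sketch and take_batch are hypothetical, rows are plain integers in lists rather than tensors, and the way rows are sampled from the buffer is an assumption.

```python
import random
from typing import Iterable, Iterator, List, Optional


def iter_batches_sketch(
    rows: Iterable[int],
    batch_size: int,
    drop_last: bool = False,
    local_shuffle_buffer_size: Optional[int] = None,
    local_shuffle_seed: Optional[int] = None,
) -> Iterator[List[int]]:
    """Hypothetical stand-in for the documented batching rules (not Ray code)."""
    # Because this is a generator, the check runs on first iteration.
    if (
        local_shuffle_buffer_size is not None
        and local_shuffle_buffer_size < batch_size
    ):
        raise ValueError("local_shuffle_buffer_size must be >= batch_size")

    shuffle = local_shuffle_buffer_size is not None
    rng = random.Random(local_shuffle_seed)
    # With shuffling, the buffer must hold at least local_shuffle_buffer_size
    # rows before a batch is yielded; without it, batch_size rows suffice.
    min_rows = local_shuffle_buffer_size if shuffle else batch_size
    buffer: List[int] = []

    def take_batch(n: int) -> List[int]:
        # Assumption: shuffle the whole buffer, then take the first n rows.
        if shuffle:
            rng.shuffle(buffer)
        batch = buffer[:n]
        del buffer[:n]
        return batch

    for row in rows:
        buffer.append(row)
        while len(buffer) >= min_rows:
            yield take_batch(batch_size)

    # No more rows to add: drain the remaining rows in the buffer.
    while len(buffer) >= batch_size:
        yield take_batch(batch_size)
    # The final, incomplete batch is kept only if drop_last is False.
    if buffer and not drop_last:
        yield take_batch(len(buffer))


# Without shuffling, 10 rows in batches of 4 leave one short final batch.
print(list(iter_batches_sketch(range(10), batch_size=4)))
# [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```

With drop_last=True the trailing [8, 9] batch would be discarded. Passing local_shuffle_buffer_size=6 keeps the same batch sizes but randomizes which rows land in each batch, and a fixed local_shuffle_seed makes that order repeatable in this sketch.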

Returns

An iterator over TensorFlow Tensor batches.