ray.data.Dataset.iter_tf_batches
- Dataset.iter_tf_batches(*, prefetch_batches: int = 1, batch_size: int | None = 256, dtypes: tf.dtypes.DType | Dict[str, tf.dtypes.DType] | None = None, drop_last: bool = False, local_shuffle_buffer_size: int | None = None, local_shuffle_seed: int | None = None) → Iterable[tf.Tensor | Dict[str, tf.Tensor]]
Return an iterable over batches of data represented as TensorFlow tensors.
This iterable yields batches of type Dict[str, tf.Tensor]. For more flexibility, call iter_batches() and manually convert your data to TensorFlow tensors.

Tip

If you don’t need the additional flexibility provided by this method, consider using to_tf() instead. It’s easier to use.

Note
This operation will trigger execution of the lazy transformations performed on this dataset.
Examples
import ray

ds = ray.data.read_csv("s3://anonymous@air-example-data/iris.csv")

tf_dataset = ds.to_tf(
    feature_columns="sepal length (cm)",
    label_columns="target",
    batch_size=2
)
for features, labels in tf_dataset:
    print(features, labels)
tf.Tensor([5.1 4.9], shape=(2,), dtype=float64) tf.Tensor([0 0], shape=(2,), dtype=int64)
...
tf.Tensor([6.2 5.9], shape=(2,), dtype=float64) tf.Tensor([2 2], shape=(2,), dtype=int64)
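The example above goes through to_tf(). A minimal sketch of calling iter_tf_batches() directly on the same Iris dataset might look like the following; each yielded batch is a Dict[str, tf.Tensor] keyed by column name (the column names shown are those of the Iris CSV used above):

import ray

ds = ray.data.read_csv("s3://anonymous@air-example-data/iris.csv")

# Each batch is a dict mapping column names to tf.Tensor values.
for batch in ds.iter_tf_batches(batch_size=2):
    print(batch["sepal length (cm)"], batch["target"])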
Time complexity: O(1)
- Parameters:
  - prefetch_batches – The number of batches to fetch ahead of the current batch. If set to greater than 0, a separate threadpool is used to fetch the objects to the local node, format the batches, and apply the collate_fn. Defaults to 1.
  - batch_size – The number of rows in each batch, or None to use entire blocks as batches (blocks may contain different numbers of rows). The final batch may include fewer than batch_size rows if drop_last is False. Defaults to 256.
  - dtypes – The TensorFlow dtype(s) for the created tensor(s); if None, the dtype is inferred from the tensor data.
  - drop_last – Whether to drop the last batch if it’s incomplete.
  - local_shuffle_buffer_size – If not None, the data is randomly shuffled using a local in-memory shuffle buffer, and this value serves as the minimum number of rows that must be in the local in-memory shuffle buffer in order to yield a batch. When there are no more rows to add to the buffer, the remaining rows in the buffer are drained. batch_size must also be specified when using local shuffling (see the sketch below).
  - local_shuffle_seed – The seed to use for the local random shuffle.
- Returns:
An iterable over TensorFlow Tensor batches.
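As a sketch of how the shuffle and prefetch parameters above can be combined (the specific values here are illustrative, not recommendations):

import ray

ds = ray.data.read_csv("s3://anonymous@air-example-data/iris.csv")

# Prefetch two batches ahead and shuffle rows within a 1000-row local buffer.
# batch_size must be set whenever local_shuffle_buffer_size is used.
for batch in ds.iter_tf_batches(
    prefetch_batches=2,
    batch_size=32,
    local_shuffle_buffer_size=1000,
    local_shuffle_seed=42,
    drop_last=True,
):
    ...  # batch is a Dict[str, tf.Tensor]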
See also
Dataset.iter_batches()
Call this method to manually convert your data to TensorFlow tensors.
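For the manual route, a rough sketch (assuming NumPy-format batches from iter_batches() and standard TensorFlow conversion) could look like this:

import ray
import tensorflow as tf

ds = ray.data.read_csv("s3://anonymous@air-example-data/iris.csv")

# Iterate over NumPy batches and convert each column to a tf.Tensor yourself.
for batch in ds.iter_batches(batch_size=2, batch_format="numpy"):
    features = tf.convert_to_tensor(batch["sepal length (cm)"])
    labels = tf.convert_to_tensor(batch["target"])
    print(features, labels)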