- Dataset.iter_tf_batches(*, prefetch_batches: int = 1, batch_size: int | None = 256, dtypes: tf.dtypes.DType | Dict[str, tf.dtypes.DType] | None = None, drop_last: bool = False, local_shuffle_buffer_size: int | None = None, local_shuffle_seed: int | None = None, prefetch_blocks: int = 0) → Iterable[tf.Tensor | Dict[str, tf.Tensor]]
Return an iterable over batches of data represented as TensorFlow tensors.
This iterable yields batches of type Dict[str, tf.Tensor]. For more flexibility, call iter_batches() and manually convert your data to TensorFlow tensors.
If you don’t need the additional flexibility provided by this method, consider using to_tf() instead. It’s easier to use.
This operation will trigger execution of the lazy transformations performed on this dataset.
import ray

ds = ray.data.read_csv("s3://anonymous@air-example-data/iris.csv")

tf_dataset = ds.to_tf(
    feature_columns="sepal length (cm)",
    label_columns="target",
    batch_size=2
)
for features, labels in tf_dataset:
    print(features, labels)
tf.Tensor([5.1 4.9], shape=(2,), dtype=float64) tf.Tensor([0 0], shape=(2,), dtype=int64)
...
tf.Tensor([6.2 5.9], shape=(2,), dtype=float64) tf.Tensor([2 2], shape=(2,), dtype=int64)
Time complexity: O(1)
prefetch_batches – The number of batches to fetch ahead of the current batch. If set to a value greater than 0, a separate thread pool is used to fetch the objects to the local node, format the batches, and apply the collate_fn. Defaults to 1.
batch_size – The number of rows in each batch, or None to use entire blocks as batches (blocks may contain different numbers of rows). The final batch may include fewer than batch_size rows if drop_last is False. Defaults to 256.
dtypes – The TensorFlow dtype(s) for the created tensor(s); if None, the dtype is inferred from the tensor data.
drop_last – Whether to drop the last batch if it’s incomplete.
local_shuffle_buffer_size – If not None, the data is randomly shuffled using a local in-memory shuffle buffer, and this value serves as the minimum number of rows that must be in the local in-memory shuffle buffer in order to yield a batch. When there are no more rows to add to the buffer, the remaining rows in the buffer are drained. batch_size must also be specified when using local shuffling.
local_shuffle_seed – The seed to use for the local random shuffle.
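The interaction between batch_size and drop_last described above can be illustrated with a small pure-Python model. This is only a sketch of the documented behavior for an integer batch_size, not Ray's implementation; the helper name expected_batch_sizes and the row count of 150 are hypothetical, and the batch_size=None case (whole blocks as batches) is not modeled.

```python
from typing import List


def expected_batch_sizes(
    num_rows: int, batch_size: int, drop_last: bool = False
) -> List[int]:
    """Return the row count of each batch when num_rows rows are cut
    into consecutive batches of batch_size rows."""
    full_batches, remainder = divmod(num_rows, batch_size)
    sizes = [batch_size] * full_batches
    # The final batch is incomplete when rows are left over; it is
    # kept unless drop_last is True.
    if remainder and not drop_last:
        sizes.append(remainder)
    return sizes


print(expected_batch_sizes(150, 64))                  # [64, 64, 22]
print(expected_batch_sizes(150, 64, drop_last=True))  # [64, 64]
```

With the default batch_size of 256 and fewer than 256 rows, the same model gives a single short batch, or no batches at all when drop_last is True.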
An iterable over TensorFlow Tensor batches.
See also iter_batches(): call this method to manually convert your data to TensorFlow tensors.
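Manual conversion through iter_batches() might look like the following sketch. It assumes batches arrive as a dict of NumPy arrays keyed by column name, which is the shape iter_batches(batch_format="numpy") is expected to yield; the helper name numpy_batch_to_tf and the sample values are hypothetical, and a hand-built batch stands in for a real dataset so the snippet needs only NumPy and TensorFlow.

```python
from typing import Dict

import numpy as np
import tensorflow as tf


def numpy_batch_to_tf(batch: Dict[str, np.ndarray]) -> Dict[str, tf.Tensor]:
    """Convert a dict of NumPy arrays (one per column) to a dict of tf.Tensor."""
    return {name: tf.convert_to_tensor(column) for name, column in batch.items()}


# Stand-in for one batch from ds.iter_batches(batch_format="numpy").
example_batch = {"x": np.array([5.1, 4.9]), "y": np.array([0, 0])}
tf_batch = numpy_batch_to_tf(example_batch)
print(tf_batch["x"], tf_batch["y"])
```

With a Ray dataset ds, the same helper could be applied to each batch in a loop over ds.iter_batches(batch_size=2, batch_format="numpy"), which leaves room for custom per-column handling before the tensors are created.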