ray.data.Dataset.take_batch

Dataset.take_batch(batch_size: int = 20, *, batch_format: str | None = 'default') → pyarrow.Table | pandas.DataFrame | Dict[str, numpy.ndarray]

Return up to batch_size rows from the Dataset in a batch.

Ray Data represents batches as NumPy arrays or pandas DataFrames. You can configure the batch type by specifying batch_format.

This method is useful for inspecting inputs to map_batches().

Warning

take_batch() moves up to batch_size rows to the caller’s machine. If batch_size is large, this method can cause an OutOfMemory error on the caller.

Note

This operation will trigger execution of the lazy transformations performed on this dataset.

Examples

>>> import ray
>>> ds = ray.data.range(100)
>>> ds.take_batch(5)
{'id': array([0, 1, 2, 3, 4])}

Time complexity: O(batch_size)

Parameters:

batch_size – The maximum number of rows to return.
batch_format – If "default" or "numpy", batches are Dict[str, numpy.ndarray]. If "pandas", batches are pandas.DataFrame. If "pyarrow", batches are pyarrow.Table, matching the return annotation above.
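
Because the batch type depends on batch_format, code that inspects a batch has to handle more than one shape. The following is a minimal sketch, not part of Ray Data: summarize_batch is a hypothetical helper that reports the row count per column for either a dict of column arrays or a pandas.DataFrame-like object. It does not handle pyarrow.Table.

```python
from typing import Any, Dict


def summarize_batch(batch: Any) -> Dict[str, int]:
    """Map each column name in a batch to its row count.

    Hypothetical helper, not part of Ray Data. Handles a dict of column
    arrays or a pandas.DataFrame-like object exposing `.columns`.
    """
    if isinstance(batch, dict):
        # "default" / "numpy" format: column name -> array of values.
        return {name: len(values) for name, values in batch.items()}
    # "pandas" format: iterate over the DataFrame's column labels.
    return {str(name): len(batch[name]) for name in batch.columns}
```

With the dataset from the Examples section, summarize_batch(ds.take_batch(5)) and summarize_batch(ds.take_batch(5, batch_format="pandas")) should both give {'id': 5}.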

Returns:

A batch of up to batch_size rows from the dataset.

Raises:

ValueError – if the dataset is empty.
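
Since take_batch() raises ValueError on an empty dataset, exploratory code may want to treat that case as "nothing to show" rather than as a failure. The sketch below assumes a hypothetical wrapper named peek_batch (not a Ray Data API) that accepts a ray.data.Dataset, or any object with a compatible take_batch method, and returns None when the dataset is empty.

```python
from typing import Any, Optional


def peek_batch(ds: Any, batch_size: int = 5, batch_format: str = "default") -> Optional[Any]:
    """Return a small batch from ds, or None if take_batch() raises ValueError.

    Hypothetical helper, not part of Ray Data. The small default batch_size
    keeps the amount of data moved to the caller low.
    """
    try:
        return ds.take_batch(batch_size, batch_format=batch_format)
    except ValueError:
        # Documented above: take_batch raises ValueError if the dataset is empty.
        return None
```

Catching ValueError this broadly may also hide other argument errors, so a wrapper like this is better suited to interactive inspection than to production pipelines.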