Dataset.take_batch(batch_size: int = 20, *, batch_format: str | None = 'default') → pyarrow.Table | pandas.DataFrame | Dict[str, numpy.ndarray]

Return up to batch_size rows from the Dataset in a batch.

Ray Data represents batches as NumPy arrays or pandas DataFrames. You can configure the batch type by specifying batch_format.

This method is useful for inspecting inputs to map_batches().


take_batch() moves up to batch_size rows to the caller’s machine. If batch_size is large, this method can cause an OutOfMemory error on the caller.


This operation will trigger execution of the lazy transformations performed on this dataset.


>>> import ray
>>> ds = ray.data.range(100)
>>> ds.take_batch(5)
{'id': array([0, 1, 2, 3, 4])}

Time complexity: O(batch_size)

  • batch_size – The maximum number of rows to return.

  • batch_format – If "default" or "numpy", batches are Dict[str, numpy.ndarray]. If "pandas", batches are pandas.DataFrame. If "pyarrow", batches are pyarrow.Table.


Returns a batch of up to batch_size rows from the dataset.


Raises ValueError if the dataset is empty.