ray.data.Dataset.take_batch
- Dataset.take_batch(batch_size: int = 20, *, batch_format: str | None = 'default') → pyarrow.Table | pandas.DataFrame | Dict[str, numpy.ndarray]
Return up to batch_size rows from the Dataset in a batch.
Ray Data represents batches as NumPy arrays or pandas DataFrames. You can configure the batch type by specifying batch_format.
This method is useful for inspecting inputs to map_batches().
Warning
take_batch() moves up to batch_size rows to the caller’s machine. If batch_size is large, this method can cause an OutOfMemory error on the caller.
Note
This operation will trigger execution of the lazy transformations performed on this dataset.
Examples
>>> import ray
>>> ds = ray.data.range(100)
>>> ds.take_batch(5)
{'id': array([0, 1, 2, 3, 4])}
Time complexity: O(batch_size)
- Parameters:
batch_size – The maximum number of rows to return.
batch_format – If "default" or "numpy", batches are Dict[str, numpy.ndarray]. If "pandas", batches are pandas.DataFrame.
- Returns:
A batch of up to batch_size rows from the dataset.
- Raises:
ValueError – if the dataset is empty.