ray.data.Dataset.take_batch
- Dataset.take_batch(batch_size: int = 20, *, batch_format: str | None = 'default') → pyarrow.Table | pandas.DataFrame | Dict[str, numpy.ndarray]
Return up to batch_size rows from the Dataset in a batch.
Ray Data represents batches as NumPy arrays or pandas DataFrames. You can configure the batch type by specifying batch_format.
This method is useful for inspecting inputs to map_batches().
Warning
take_batch() moves up to batch_size rows to the caller's machine. If batch_size is large, this method can cause an OutOfMemory error on the caller.
Note
This operation will trigger execution of the lazy transformations performed on this dataset.
Examples
>>> import ray
>>> ds = ray.data.range(100)
>>> ds.take_batch(5)
{'id': array([0, 1, 2, 3, 4])}
Time complexity: O(batch_size specified)
- Parameters:
batch_size – The maximum number of rows to return.
batch_format – If "default" or "numpy", batches are Dict[str, numpy.ndarray]. If "pandas", batches are pandas.DataFrame.
- Returns:
A batch of up to batch_size rows from the dataset.
- Raises:
ValueError – if the dataset is empty.