ray.data.Dataset.to_pandas

Dataset.to_pandas(limit: int = None) → pandas.DataFrame

Convert this Dataset to a single pandas DataFrame.

This method raises a ValueError if the number of rows exceeds the provided limit. To truncate the dataset beforehand, call limit().

Examples

>>> import ray
>>> ds = ray.data.from_items([{"a": i} for i in range(3)])
>>> ds.to_pandas()
   a
0  0
1  1
2  2

Note

This operation triggers execution of the lazy transformations applied to this dataset.

Time complexity: O(dataset size)

Parameters:

limit – The maximum number of rows to return. An error is raised if the dataset has more rows than this limit. Defaults to None, which means no limit.

Returns:

A pandas DataFrame created from this dataset, containing at most limit rows when limit is set.

Raises:

ValueError – if the number of rows in the Dataset exceeds limit.