ray.data.Dataset.to_pandas_refs

Dataset.to_pandas_refs() → List[ObjectRef[pandas.DataFrame]]

Converts this Dataset into a distributed set of pandas DataFrames.

One DataFrame is created for each block in this Dataset.

This function copies the data. For zero-copy access to the underlying data, consider using Dataset.to_arrow_refs() or Dataset.get_internal_block_refs().

Examples

>>> import ray
>>> ds = ray.data.range(10, parallelism=2)
>>> refs = ds.to_pandas_refs()
>>> len(refs)
2

Note

This operation will trigger execution of the lazy transformations performed on this dataset.

Time complexity: O(dataset size / parallelism)

Returns:

A list of object references (ObjectRef), each resolving to a pandas DataFrame created from one block of this dataset.

DeveloperAPI: This API may change across minor Ray releases.