ray.data.Dataset.to_pandas_refs#
- Dataset.to_pandas_refs() List[ObjectRef[pandas.DataFrame]] [source]#
Converts this
Dataset
into a distributed set of Pandas dataframes.One DataFrame is created for each block in this Dataset.
This function induces a copy of the data. For zero-copy access to the underlying data, consider using
Dataset.to_arrow_refs()
orDataset.iter_internal_ref_bundles()
.Examples
>>> import ray >>> ds = ray.data.range(10, override_num_blocks=2) >>> refs = ds.to_pandas_refs() >>> len(refs) 2
Note
This operation will trigger execution of the lazy transformations performed on this dataset.
Time complexity: O(dataset size / parallelism)
- Returns:
A list of remote pandas DataFrames created from this dataset.
DeveloperAPI: This API may change across minor Ray releases.