ray.data.Dataset.to_modin
- Dataset.to_modin() → modin.DataFrame
Convert this dataset into a Modin dataframe.
This works by first converting this dataset into a distributed set of Pandas dataframes (using .to_pandas_refs()). See the caveats there. The individual dataframes are then used to create the Modin DataFrame using modin.distributed.dataframe.pandas.partitions.from_partitions().
This is only supported for datasets convertible to Arrow records. This function induces a copy of the data. For zero-copy access to the underlying data, consider using .to_arrow() or .get_internal_block_refs().
Note
This operation will trigger execution of the lazy transformations performed on this dataset.
Time complexity: O(dataset size / parallelism)
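The conversion described above has two steps: split the dataset into per-block Pandas dataframes, then assemble those blocks into a single dataframe. The sketch below is a single-process analogy using only pandas, not Ray's implementation. A plain list of dataframes stands in for the object references returned by .to_pandas_refs(), and pd.concat stands in for from_partitions(). The helper name combine_partitions is hypothetical. The sketch also illustrates the copy semantics: the assembled frame owns its data, so writing to it leaves the source blocks unchanged.

```python
import pandas as pd


def combine_partitions(partitions: list[pd.DataFrame]) -> pd.DataFrame:
    """Hypothetical helper: assemble per-block frames into one frame.

    pd.concat stands in for Modin's from_partitions(); unlike the real
    conversion, it runs in one process and builds one pandas DataFrame.
    """
    return pd.concat(partitions, ignore_index=True)


# Hypothetical stand-ins for the per-block pandas frames that
# Dataset.to_pandas_refs() would return as object references.
partitions = [
    pd.DataFrame({"id": [0, 1], "value": [0, 2]}),
    pd.DataFrame({"id": [2, 3], "value": [4, 6]}),
]

combined = combine_partitions(partitions)

# The combined frame holds its own copy of the data, so this write
# does not propagate back to the source partitions.
combined.loc[0, "value"] = 100
print(combined)
print(partitions[0].loc[0, "value"])
```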
- Returns
A Modin dataframe created from this dataset.