ray.data.Dataset.to_modin

Dataset.to_modin() → modin.pandas.dataframe.DataFrame

Convert this Dataset into a Modin DataFrame.

This works by first converting the dataset into a distributed set of pandas DataFrames with Dataset.to_pandas_refs() (see the caveats documented there). The resulting DataFrames are then assembled into a single Modin DataFrame with modin.distributed.dataframe.pandas.partitions.from_partitions().

This is only supported for datasets convertible to Arrow records. This function induces a copy of the data. For zero-copy access to the underlying data, consider using to_arrow_refs() or iter_internal_ref_bundles().

Note

This operation will trigger execution of the lazy transformations performed on this dataset.

Time complexity: O(dataset size / parallelism)

Returns: A Modin DataFrame created from this dataset.