ray.data.Dataset.to_dask
ray.data.Dataset.to_dask#
- Dataset.to_dask(meta: Optional[Union[pandas.DataFrame, pandas.Series, Dict[str, Any], Iterable[Any], Tuple[Any]]] = None) dask.DataFrame [source]#
Convert this dataset into a Dask DataFrame.
This is only supported for datasets convertible to Arrow records.
Note that this function will set the Dask scheduler to Dask-on-Ray globally, via the config.
Note
This operation will trigger execution of the lazy transformations performed on this dataset.
Time complexity: O(dataset size / parallelism)
- Parameters
meta – An empty pandas DataFrame or Series that matches the dtypes and column names of the stream. This metadata is necessary for many algorithms in dask dataframe to work. For ease of use, some alternative inputs are also available. Instead of a DataFrame, a dict of
{name: dtype}
or iterable of(name, dtype)
can be provided (note that the order of the names should match the order of the columns). Instead of a series, a tuple of(name, dtype)
can be used. By default, this will be inferred from the underlying Dataset schema, with this argument supplying an optional override.- Returns
A Dask DataFrame created from this dataset.