ray.data.Dataset.to_dask

Dataset.to_dask(meta: Optional[Union[pandas.DataFrame, pandas.Series, Dict[str, Any], Iterable[Any], Tuple[Any]]] = None) → dask.DataFrame

Convert this dataset into a Dask DataFrame.

This is only supported for datasets convertible to Arrow records.

Note that this function sets the Dask scheduler to Dask-on-Ray globally through the Dask config, so the setting persists for later Dask computations in the same process.

Time complexity: O(dataset size / parallelism)

Parameters

meta – An empty pandas DataFrame or Series that matches the dtypes and column names of the Dataset. Many Dask DataFrame algorithms need this metadata to work. For convenience, some alternative inputs are accepted: instead of a DataFrame, a dict of {name: dtype} or an iterable of (name, dtype) pairs, with the names in the same order as the columns; instead of a Series, a tuple of (name, dtype). By default, the metadata is inferred from the underlying Dataset schema, and this argument acts as an optional override.
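The accepted forms of meta can be sketched as follows. This is a minimal illustration, not taken from the Ray source: the two-column schema (`id`, `score`) and the variable names are hypothetical, and only pandas is used to build the values.

```python
import pandas as pd

# Hypothetical schema, for illustration only: column name -> dtype.
columns = {"id": "int64", "score": "float64"}

# Form 1: an empty pandas DataFrame with matching column names and dtypes.
meta_frame = pd.DataFrame(
    {name: pd.Series(dtype=dtype) for name, dtype in columns.items()}
)

# Form 2: a dict of {name: dtype}.
meta_dict = dict(columns)

# Form 3: an iterable of (name, dtype) pairs, in the same order as the columns.
meta_pairs = list(columns.items())

# For a single-column result, a (name, dtype) tuple stands in for an empty Series.
meta_series_like = ("score", "float64")
```

Any one of these values would then be passed as `ds.to_dask(meta=...)`, where `ds` is an existing Dataset whose columns match the declared schema.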

Returns

A Dask DataFrame created from this dataset.