Saving DatasetsΒΆ

Datasets can be written to local or remote storage using .write_csv(), .write_json(), and .write_parquet().

# Write to csv files in /tmp/output.
ray.data.range(10000).write_csv("/tmp/output")
# -> /tmp/output/data0.csv, /tmp/output/data1.csv, ...

# Use repartition to control the number of output files:
ray.data.range(10000).repartition(1).write_csv("/tmp/output2")
# -> /tmp/output2/data0.csv

You can also convert a Dataset to Ray-compatible distributed DataFrames:

# Convert a Ray Dataset into a Dask-on-Ray DataFrame.
dask_df = ds.to_dask()