Saving DatasetsΒΆ
Datasets can be written to local or remote storage using .write_csv()
, .write_json()
, and .write_parquet()
.
# Write to csv files in /tmp/output.
ray.data.range(10000).write_csv("/tmp/output")
# -> /tmp/output/data0.csv, /tmp/output/data1.csv, ...
# Use repartition to control the number of output files:
ray.data.range(10000).repartition(1).write_csv("/tmp/output2")
# -> /tmp/output2/data0.csv
You can also convert a Dataset
to Ray-compatible distributed DataFrames:
# Convert a Ray Dataset into a Dask-on-Ray DataFrame.
dask_df = ds.to_dask()