- Dataset.write_csv(path: str, *, filesystem: pyarrow.fs.FileSystem | None = None, try_create_dir: bool = True, arrow_open_stream_args: typing.Dict[str, typing.Any] | None = None, filename_provider: ray.data.datasource.filename_provider.FilenameProvider | None = None, block_path_provider: ray.data.datasource.block_path_provider.BlockWritePathProvider | None = None, arrow_csv_args_fn: typing.Callable[[], typing.Dict[str, typing.Any]] = <function Dataset.<lambda>>, ray_remote_args: typing.Dict[str, typing.Any] = None, **arrow_csv_args) -> None
Writes the Dataset to CSV files.
The number of files is determined by the number of blocks in the dataset. To control the number of blocks, call repartition().
This method is only supported for datasets with records that are convertible to pyarrow tables.
By default, the format of the output files is {uuid}_{block_idx}.csv, where uuid is a unique id for the dataset. To modify this behavior, implement a custom BlockWritePathProvider and pass it in as the block_path_provider argument.
This operation will trigger execution of the lazy transformations performed on this dataset.
Write the dataset as CSV files to a local directory.
>>> import ray
>>> ds = ray.data.range(100)
>>> ds.write_csv("local:///tmp/data")
Write the dataset as CSV files to S3.
>>> import ray
>>> ds = ray.data.range(100)
>>> ds.write_csv("s3://bucket/folder/")
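Because the number of output files follows the number of blocks, repartitioning just before the write is one way to control how many CSV files are produced. The helper below is a minimal sketch of that idea; the name `write_csv_with_file_count` is hypothetical and not part of Ray, and the resulting file count may still vary with the Ray version and dataset contents.

```python
def write_csv_with_file_count(ds, path, num_files, **write_kwargs):
    """Repartition ``ds`` into ``num_files`` blocks, then write it as CSV.

    ``ds`` is assumed to be a ray.data.Dataset, or any object that exposes
    the same ``repartition`` and ``write_csv`` methods.
    """
    if num_files < 1:
        raise ValueError("num_files must be at least 1")
    # The number of files is determined by the number of blocks, so the
    # block count is set first and the write follows immediately.
    ds.repartition(num_files).write_csv(path, **write_kwargs)
```

With Ray installed, `write_csv_with_file_count(ray.data.range(100), "local:///tmp/data", 4)` would repartition the dataset into 4 blocks before writing. Any extra keyword arguments are forwarded to `write_csv` unchanged.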
Time complexity: O(dataset size / parallelism)
path – The path to the destination root directory, where the CSV files are written.
filesystem – The pyarrow filesystem implementation to write to. These filesystems are specified in the pyarrow docs. Specify this if you need to provide specific configurations to the filesystem. By default, the filesystem is automatically selected based on the scheme of the paths. For example, if the path begins with s3://, the S3FileSystem is used.
try_create_dir – If True, attempts to create all directories in the destination path. Does nothing if all directories already exist. Defaults to True.
arrow_open_stream_args – kwargs passed to pyarrow.fs.FileSystem.open_output_stream, which is used when opening the file to write to.
filename_provider – A FilenameProvider implementation. Use this parameter to customize what your filenames look like.
arrow_csv_args_fn – Callable that returns a dictionary of write arguments that are provided to pyarrow.csv.write_csv when writing each block to a file. Overrides any duplicate keys from arrow_csv_args. Use this argument instead of arrow_csv_args if any of your write arguments cannot be pickled, or if you'd like to lazily resolve the write arguments for each dataset block.
ray_remote_args – kwargs passed to ray.remote() in the write tasks.
arrow_csv_args – Options to pass to pyarrow.csv.write_csv when writing each block to a file.
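To illustrate how arrow_csv_args_fn can resolve write arguments lazily, the sketch below builds the pyarrow write options inside the callable, so they are constructed only when a block is written. The helper name `csv_write_kwargs` is hypothetical, and the sketch assumes that Ray forwards a `write_options` entry to pyarrow's CSV writer; check this against the Ray version in use.

```python
def csv_write_kwargs(include_header=True, delimiter=","):
    """Build keyword arguments for Dataset.write_csv (hypothetical helper)."""

    def arrow_csv_args_fn():
        # Imported and constructed lazily, at the time a block is written,
        # instead of being pickled together with the write call.
        import pyarrow.csv as pacsv

        return {
            # Assumption: the "write_options" key is passed through to
            # pyarrow's CSV writer as a pyarrow.csv.WriteOptions object.
            "write_options": pacsv.WriteOptions(
                include_header=include_header,
                delimiter=delimiter,
            )
        }

    return {"arrow_csv_args_fn": arrow_csv_args_fn}
```

A call such as `ds.write_csv("local:///tmp/data", **csv_write_kwargs(include_header=False))` would then write each block without a header row, under the assumption stated above.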