ray.data.datasource.RowBasedFileDatasink.__init__

RowBasedFileDatasink.__init__(path: str, *, filesystem: pyarrow.fs.FileSystem | None = None, try_create_dir: bool = True, open_stream_args: Dict[str, Any] | None = None, filename_provider: FilenameProvider | None = None, block_path_provider: BlockWritePathProvider | None = None, dataset_uuid: str | None = None, file_format: str | None = None)

Initialize this datasink.

Parameters:
  • path – The folder to write files to.

  • filesystem – The filesystem to write files to. If not provided, the filesystem is inferred from the path.

  • try_create_dir – Whether to try to create the output directory before writing files.

  • open_stream_args – Keyword arguments to pass to filesystem.open_output_stream when opening each output file.

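  • filename_provider – A ray.data.datasource.FilenameProvider that generates filenames for each row or block.

  • block_path_provider – Deprecated. A BlockWritePathProvider that specifies the output path for each block; use filename_provider instead.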

  • dataset_uuid – The UUID of the dataset being written. If specified, it’s included in the filename.

  • file_format – The file extension. If specified, files are written with this extension.
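The dataset_uuid and file_format parameters only affect the names of the files this datasink writes. As a rough illustration, the hypothetical helper make_row_filename below composes a per-row filename from those optional pieces. It is a sketch under stated assumptions, not Ray's implementation: the zero-padded task/block/row ID is loosely modeled on Ray's default naming and may not match any particular Ray version, and it assumes no custom filename_provider is supplied.

```python
from typing import Optional


def make_row_filename(
    task_index: int,
    block_index: int,
    row_index: int,
    dataset_uuid: Optional[str] = None,
    file_format: Optional[str] = None,
) -> str:
    """Hypothetical helper that builds a filename for one row.

    Assumption: the "task_block_row" ID format is illustrative only and
    is not guaranteed to match Ray's default FilenameProvider.
    """
    # A row-based datasink writes one file per row, so the ID combines
    # the task, block, and row indices (zero-padded to six digits).
    file_id = f"{task_index:06}_{block_index:06}_{row_index:06}"
    # If dataset_uuid is given, it is included in the filename as a prefix.
    prefix = f"{dataset_uuid}_" if dataset_uuid is not None else ""
    # If file_format is given, it becomes the file extension.
    suffix = f".{file_format}" if file_format is not None else ""
    return prefix + file_id + suffix


print(make_row_filename(0, 1, 2))
# prints "000000_000001_000002"
print(make_row_filename(0, 1, 2, dataset_uuid="abc123", file_format="png"))
# prints "abc123_000000_000001_000002.png"
```

Leaving both optional arguments unset yields a bare ID with no extension, which is why passing file_format matters for formats that downstream tools recognize by extension.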