ray.data.datasource.BlockBasedFileDatasink
class ray.data.datasource.BlockBasedFileDatasink(path, *, num_rows_per_file: int | None = None, **file_datasink_kwargs)
Bases: _FileDatasink
A datasink that writes multiple rows to each file.
Subclasses must implement write_block_to_file and call the superclass constructor.
Examples
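from ray.data.block import BlockAccessor
from ray.data.datasource import BlockBasedFileDatasink

class CSVDatasink(BlockBasedFileDatasink):
    def __init__(self, path: str):
        # The superclass constructor must be called; it records the output path and format.
        super().__init__(path, file_format="csv")

    def write_block_to_file(self, block: BlockAccessor, file: "pyarrow.NativeFile"):
        # Convert the block to an Arrow table and write it to the open file as CSV.
        from pyarrow import csv
        csv.write_csv(block.to_arrow(), file)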
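The subclass contract above (implement write_block_to_file, call the superclass constructor) can be illustrated without a Ray installation. The following is a minimal standard-library sketch, not Ray's implementation: ToyBlockFileSink, ToyCSVSink, and demo are hypothetical names, a "block" is assumed to be a plain list of row dicts, and details such as remote filesystems, retries, and num_rows_per_file are ignored.

```python
import csv
import os
import tempfile
from typing import Dict, Iterable, List, TextIO

# Assumption: a "block" is a non-empty list of row dicts.
# Ray's real blocks are Arrow or pandas tables.
Block = List[Dict[str, object]]


class ToyBlockFileSink:
    """Hypothetical stand-in for the base class: owns the path and file handling."""

    def __init__(self, path: str, *, file_format: str):
        self.path = path
        self.file_format = file_format

    def on_write_start(self) -> None:
        # Mirrors "Create a directory to write files to."
        os.makedirs(self.path, exist_ok=True)

    def write_block_to_file(self, block: Block, file: TextIO) -> None:
        raise NotImplementedError("Subclasses must implement write_block_to_file")

    def write(self, blocks: Iterable[Block]) -> List[str]:
        # One output file per block, so each file holds multiple rows.
        written = []
        for index, block in enumerate(blocks):
            filename = os.path.join(self.path, f"{index:06}.{self.file_format}")
            with open(filename, "w", newline="") as file:
                self.write_block_to_file(block, file)
            written.append(filename)
        return written


class ToyCSVSink(ToyBlockFileSink):
    def __init__(self, path: str):
        # The subclass fixes the format and leaves the rest to the base class.
        super().__init__(path, file_format="csv")

    def write_block_to_file(self, block: Block, file: TextIO) -> None:
        # Only the serialization of one block lives in the subclass.
        writer = csv.DictWriter(file, fieldnames=list(block[0].keys()))
        writer.writeheader()
        writer.writerows(block)


def demo() -> List[str]:
    """Write two toy blocks to a fresh temporary directory and return the file paths."""
    sink = ToyCSVSink(tempfile.mkdtemp())
    sink.on_write_start()
    blocks = [[{"id": 0}, {"id": 1}], [{"id": 2}]]
    return sink.write(blocks)
```

The point of the split is that the base class decides where and when files are opened, while the subclass only decides how a single block is serialized into an already-open file.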
DeveloperAPI: This API may change across minor Ray releases.
Methods
get_name: Return a human-readable name for this datasink.
on_write_failed: Callback for when a write job fails.
on_write_start: Create a directory to write files to.
write_block_to_file: Write a block of data to a file.
Attributes
num_rows_per_write: The target number of rows to pass to each write() call.
supports_distributed_writes: If False, only launch write tasks on the driver's node.