ray.data.datasource.BlockBasedFileDatasink

class ray.data.datasource.BlockBasedFileDatasink(path, *, num_rows_per_file: int | None = None, **file_datasink_kwargs)

Bases: _FileDatasink

A datasink that writes multiple rows to each file.

Subclasses must implement write_block_to_file and call the superclass constructor.

Examples

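from ray.data.block import BlockAccessor
from ray.data.datasource import BlockBasedFileDatasink


class CSVDatasink(BlockBasedFileDatasink):
    def __init__(self, path: str):
        super().__init__(path, file_format="csv")

    def write_block_to_file(self, block: BlockAccessor, file: "pyarrow.NativeFile"):
        from pyarrow import csv

        # Convert the block to an Arrow table and write it to the open file as CSV.
        csv.write_csv(block.to_arrow(), file)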
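
As a further illustration of the same pattern, the sketch below writes each block as JSON Lines instead of CSV. The names rows_to_jsonl, make_jsonl_datasink, and JSONLinesDatasink are hypothetical and not part of Ray, and the "jsonl" value for file_format is an assumption chosen for this example. The Ray imports are deferred into the factory function so that the pure-Python encoding helper can be used and tested on its own.

```python
import json
from typing import Any, Dict, Iterable


def rows_to_jsonl(rows: Iterable[Dict[str, Any]]) -> bytes:
    """Encode rows as UTF-8 JSON Lines: one JSON object per line."""
    # default=str is a fallback for values json cannot encode natively.
    lines = (json.dumps(row, default=str) + "\n" for row in rows)
    return "".join(lines).encode("utf-8")


def make_jsonl_datasink(path: str):
    """Build a hypothetical JSON Lines datasink (requires Ray and PyArrow)."""
    # Deferred imports: only needed when a datasink is actually constructed.
    from ray.data.block import BlockAccessor
    from ray.data.datasource import BlockBasedFileDatasink

    class JSONLinesDatasink(BlockBasedFileDatasink):
        def __init__(self, path: str):
            # Call the superclass constructor, as subclasses are required to do.
            super().__init__(path, file_format="jsonl")

        def write_block_to_file(
            self, block: BlockAccessor, file: "pyarrow.NativeFile"
        ):
            # Convert the block to an Arrow table, then to a list of row dicts,
            # and write all rows of the block to the already-open file.
            file.write(rows_to_jsonl(block.to_arrow().to_pylist()))

    return JSONLinesDatasink(path)
```

Assuming a Ray version that provides Dataset.write_datasink, the datasink could then be used as ray.data.range(8).write_datasink(make_jsonl_datasink("/tmp/jsonl_out")), where the output path is a placeholder.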

DeveloperAPI: This API may change across minor Ray releases.

Methods

get_name

Return a human-readable name for this datasink.

on_write_failed

Callback for when a write job fails.

on_write_start

Create a directory to write files to.

write_block_to_file

Write a block of data to a file.

Attributes

num_rows_per_write

The target number of rows to pass to each write() call.

supports_distributed_writes

If False, only launch write tasks on the driver's node.