ray.data.datasource.RowBasedFileDatasink
class ray.data.datasource.RowBasedFileDatasink(path: str, *, filesystem: pyarrow.fs.FileSystem | None = None, try_create_dir: bool = True, open_stream_args: Dict[str, Any] | None = None, filename_provider: FilenameProvider | None = None, dataset_uuid: str | None = None, file_format: str | None = None)
Bases: _FileDatasink
A datasink that writes one row to each file.
Subclasses must implement write_row_to_file and call the superclass constructor.
Examples
    import io
    from typing import Any, Dict

    import pyarrow
    from PIL import Image

    from ray.data.datasource import RowBasedFileDatasink

    class ImageDatasink(RowBasedFileDatasink):
        def __init__(self, path: str, *, column: str, file_format: str = "png"):
            super().__init__(path, file_format=file_format)
            self._file_format = file_format
            self._column = column

        def write_row_to_file(self, row: Dict[str, Any], file: "pyarrow.NativeFile"):
            image = Image.fromarray(row[self._column])
            buffer = io.BytesIO()
            image.save(buffer, format=self._file_format)
            file.write(buffer.getvalue())
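The one-row-per-file contract can also be illustrated without Ray. The following is a minimal standard-library sketch, not Ray's implementation: the helper write_rows_to_files, the callback write_json_row, and the row_000000.json naming scheme are all hypothetical, introduced only to show how a write_row_to_file-style callback receives one row and one open binary file at a time. In Ray itself, file naming is delegated to the filename_provider argument and the file handle is opened by the datasink.

```python
import json
import os
import tempfile
from typing import Any, BinaryIO, Callable, Dict, Iterable, List


def write_rows_to_files(
    rows: Iterable[Dict[str, Any]],
    directory: str,
    write_row_to_file: Callable[[Dict[str, Any], BinaryIO], None],
    file_format: str,
) -> List[str]:
    """Write each row to its own file and return the paths written.

    Hypothetical helper for illustration only; it is not part of Ray.
    """
    os.makedirs(directory, exist_ok=True)
    paths = []
    for index, row in enumerate(rows):
        # Assumed naming scheme; Ray derives names from a FilenameProvider.
        path = os.path.join(directory, f"row_{index:06d}.{file_format}")
        # One file is opened per row, and the callback serializes that row.
        with open(path, "wb") as file:
            write_row_to_file(row, file)
        paths.append(path)
    return paths


def write_json_row(row: Dict[str, Any], file: BinaryIO) -> None:
    """Example callback: serialize a single row as UTF-8 encoded JSON."""
    file.write(json.dumps(row).encode("utf-8"))


if __name__ == "__main__":
    example_rows = [{"id": 0, "label": "cat"}, {"id": 1, "label": "dog"}]
    written = write_rows_to_files(
        example_rows, tempfile.mkdtemp(), write_json_row, "json"
    )
    print(written)
```

The callback only decides how a row is serialized; directory creation, file naming, and opening the stream stay outside it, which mirrors the division of work between a RowBasedFileDatasink subclass and its superclass.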
DeveloperAPI: This API may change across minor Ray releases.
Methods
Initialize this datasink.
Return a human-readable name for this datasink.
Callback for when a write job fails.
Create a directory to write files to.
Write a row to a file.
Attributes
The target number of rows to pass to each write() call.
If False, only launch write tasks on the driver's node.