ray.data.datasource.FilenameProvider

class ray.data.datasource.FilenameProvider(dataset_uuid: str | None = None, file_format: str | None = None)

Generates filenames when you write a Dataset.

Subclass this class to control how the files written by a Dataset are named.

Override get_filename_for_task() to customize filenames. For row-based writes (e.g., write_images()), row filenames are automatically derived by appending _{block_index:06}_{row_index:06} to the task filename.
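The `:06` in the suffix is a standard Python format specification that zero-pads each index to a width of six. The following is a minimal sketch in plain Python with no Ray dependency. The helper name `row_suffix` is hypothetical and illustrates only the suffix format described above, not how Ray combines the suffix with the task filename or its extension.

```python
def row_suffix(block_index: int, row_index: int) -> str:
    """Hypothetical helper (not part of Ray): build the row suffix."""
    # Each index is zero-padded to at least six digits.
    return f"_{block_index:06}_{row_index:06}"


print(row_suffix(0, 0))   # _000000_000000
print(row_suffix(3, 42))  # _000003_000042
```

Indices wider than six digits are not truncated; the width is a minimum, so `row_suffix(1234567, 1)` yields `_1234567_000001`.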

Example

This snippet shows you how to customize filenames with a prefix. For example, a file might be named images_abc123_000000.png.

import ray
from ray.data.datasource import FilenameProvider

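class ImageFilenameProvider(FilenameProvider):
    """Names files as <prefix>_<write_uuid>_<task_index>.<file_format>."""

    def __init__(self, prefix: str, file_format: str):
        # Forward the format to the base class; it is read back below
        # as self.file_format.
        super().__init__(file_format=file_format)
        self.prefix = prefix

    def get_filename_for_task(self, write_uuid, task_index):
        # Zero-pad the task index to six digits, e.g. 0 -> "000000".
        return f"{self.prefix}_{write_uuid}_{task_index:06}.{self.file_format}"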

ds = ray.data.read_parquet("s3://anonymous@ray-example-data/images.parquet")
ds.write_images(
    "/tmp/results",
    column="image",
    filename_provider=ImageFilenameProvider("images", "png")
)
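To preview the names this pattern produces without running a write, the same f-string can be evaluated in plain Python. This is a sketch only: `preview_task_filename` is a hypothetical stand-alone function, and `"abc123"` is a made-up stand-in for the `write_uuid` that would be supplied during a real write.

```python
def preview_task_filename(
    prefix: str, write_uuid: str, task_index: int, file_format: str
) -> str:
    """Hypothetical helper mirroring ImageFilenameProvider.get_filename_for_task."""
    return f"{prefix}_{write_uuid}_{task_index:06}.{file_format}"


# "abc123" is an assumed placeholder for the write UUID.
names = [preview_task_filename("images", "abc123", i, "png") for i in range(3)]
print(names)
# ['images_abc123_000000.png', 'images_abc123_000001.png', 'images_abc123_000002.png']
```

Because the task index is part of the name, each task in this sketch gets a distinct filename for the same write UUID.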

PublicAPI (alpha): This API is in alpha and may change before becoming stable.

Methods

__init__                  Create a FilenameProvider.
get_filename_for_block    Generate a filename for a block of data.
get_filename_for_row      Generate a filename for a row.
get_filename_for_task     Generate a filename for a write task.