ray.data.datasource.FilenameProvider

class ray.data.datasource.FilenameProvider

Generates filenames when you write a Dataset.

Use this class to customize the filenames used when writing a Dataset.

Some Dataset write methods write each row to a separate file, while others write each block to a separate file. For example, ray.data.Dataset.write_images() writes individual rows, and ray.data.Dataset.write_parquet() writes blocks of data. For more information about blocks, see Data internals.

If you’re writing each row to a separate file, implement get_filename_for_row(). Otherwise, implement get_filename_for_block().
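The example further down this page covers the per-row case. For the per-block case, a minimal sketch might look like the following. The class name `PrefixedBlockFilenameProvider` and its `prefix` argument are hypothetical, and the `get_filename_for_block()` signature is assumed to match `get_filename_for_row()` as used on this page, minus `row_index`. Check the signature against your installed Ray version. The fallback base class exists only so the naming logic can be tried without Ray installed.

```python
try:
    from ray.data.datasource import FilenameProvider
except ImportError:
    # Fallback so the naming logic can be exercised without Ray installed.
    class FilenameProvider:
        pass


class PrefixedBlockFilenameProvider(FilenameProvider):
    """Hypothetical provider: names blocks <prefix>_<task>_<block>.<ext>."""

    def __init__(self, prefix: str, file_format: str):
        self.prefix = prefix
        self.file_format = file_format

    # Assumed signature: like get_filename_for_row() on this page,
    # but without row_index. Verify against your Ray version.
    def get_filename_for_block(self, block, task_index, block_index):
        return (
            f"{self.prefix}_{task_index:06}_{block_index:06}"
            f".{self.file_format}"
        )


provider = PrefixedBlockFilenameProvider("events", "parquet")

# Call the method directly with a placeholder block to preview a name.
block_filename = provider.get_filename_for_block(None, 0, 3)
print(block_filename)  # events_000000_000003.parquet
```

An instance like this would then be passed as `filename_provider=` to a block-oriented writer such as `ray.data.Dataset.write_parquet()`, in the same way the row-based example passes its provider to `write_images()`.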

Example

This snippet shows you how to encode labels in written files. For example, if "cat" is a label, you might write a file named cat_000000_000000_000000.png.

import ray
from ray.data.datasource import FilenameProvider

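class ImageFilenameProvider(FilenameProvider):
    """Names each image file after its row's label plus zero-padded indices."""

    def __init__(self, file_format: str):
        # File extension without the leading dot, for example "png".
        self.file_format = file_format

    def get_filename_for_row(self, row, task_index, block_index, row_index):
        # The three zero-padded indices keep names distinct for rows
        # that share the same label.
        return (
            f"{row['label']}_{task_index:06}_{block_index:06}"
            f"_{row_index:06}.{self.file_format}"
        )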

ds = ray.data.read_parquet("s3://anonymous@ray-example-data/images.parquet")
ds.write_images(
    "/tmp/results",
    column="image",
    filename_provider=ImageFilenameProvider("png")
)
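To preview the names a provider produces before running a full write, one option is to call its method directly with sample values. The sketch below repeats the provider from the example so that it runs on its own. The sample rows and index values are made up for illustration, and the fallback base class is only there so the check works without Ray installed.

```python
try:
    from ray.data.datasource import FilenameProvider
except ImportError:
    # Fallback so the naming logic can be checked without Ray installed.
    class FilenameProvider:
        pass


class ImageFilenameProvider(FilenameProvider):
    def __init__(self, file_format: str):
        self.file_format = file_format

    def get_filename_for_row(self, row, task_index, block_index, row_index):
        return (
            f"{row['label']}_{task_index:06}_{block_index:06}"
            f"_{row_index:06}.{self.file_format}"
        )


provider = ImageFilenameProvider("png")

# Illustrative rows; only the "label" key is read by the provider.
first = provider.get_filename_for_row({"label": "cat"}, 0, 0, 0)
later = provider.get_filename_for_row({"label": "dog"}, 2, 1, 15)

print(first)  # cat_000000_000000_000000.png
print(later)  # dog_000002_000001_000015.png
```

The `:06` format specifier pads each index to six digits, which is what yields the `cat_000000_000000_000000.png` pattern described above.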

PublicAPI (alpha): This API is in alpha and may change before becoming stable.

Methods

__init__

get_filename_for_block

Generate a filename for a block of data.

get_filename_for_row

Generate a filename for a row.