ray.data.datasource.FilenameProvider
- class ray.data.datasource.FilenameProvider(dataset_uuid: str | None = None, file_format: str | None = None)
Generates filenames when you write a Dataset.
Use this class to customize the filenames used when writing a Dataset.
Override get_filename_for_task() to customize filenames. For row-based writes (e.g., write_images()), row filenames are automatically derived by appending _{block_index:06}_{row_index:06} to the task filename.
Example
This snippet shows you how to customize filenames with a prefix. For example, a file might be named images_abc123_000000.png.

    import ray
    from ray.data.datasource import FilenameProvider

    class ImageFilenameProvider(FilenameProvider):
        def __init__(self, prefix: str, file_format: str):
            super().__init__(file_format=file_format)
            self.prefix = prefix

        def get_filename_for_task(self, write_uuid, task_index):
            return f"{self.prefix}_{write_uuid}_{task_index:06}.{self.file_format}"

    ds = ray.data.read_parquet("s3://anonymous@ray-example-data/images.parquet")
    ds.write_images(
        "/tmp/results",
        column="image",
        filename_provider=ImageFilenameProvider("images", "png")
    )
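The naming pattern described above can be checked without running Ray. The following is a minimal sketch in plain Python that reproduces only the string formatting. The helpers task_filename and row_suffix are hypothetical names introduced for illustration, not part of Ray's API, and the sketch does not show where Ray places the row suffix relative to the file extension.

```python
# Hypothetical helpers that mirror only the string formatting described
# in the text; they are NOT part of Ray's API.

def task_filename(prefix: str, write_uuid: str, task_index: int, file_format: str) -> str:
    # Same f-string as get_filename_for_task() in the example:
    # the task index is zero-padded to six digits by the ":06" format spec.
    return f"{prefix}_{write_uuid}_{task_index:06}.{file_format}"


def row_suffix(block_index: int, row_index: int) -> str:
    # The suffix pattern the text gives for row-based writes:
    # _{block_index:06}_{row_index:06}
    return f"_{block_index:06}_{row_index:06}"


print(task_filename("images", "abc123", 0, "png"))  # images_abc123_000000.png
print(row_suffix(3, 42))                            # _000003_000042
```

Zero-padding keeps filenames in lexicographic order by task, block, and row index, as long as each index stays below one million.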
PublicAPI (alpha): This API is in alpha and may change before becoming stable.
Methods
- __init__(): Create a FilenameProvider.
- get_filename_for_block(): Generate a filename for a block of data.
- get_filename_for_row(): Generate a filename for a row.
- get_filename_for_task(): Generate a filename for a write task.