ray.data.datasource.FilenameProvider
- class ray.data.datasource.FilenameProvider
Generates filenames when you write a Dataset.
Use this class to customize the filenames used when writing a Dataset.
Some methods write each row to a separate file, while others write each block to a separate file. For example, ray.data.Dataset.write_images() writes individual rows, and ray.data.Dataset.write_parquet() writes blocks of data. For more information about blocks, see Data internals.
If you’re writing each row to a separate file, implement get_filename_for_row(). Otherwise, implement get_filename_for_block().
Example
This snippet shows you how to encode labels in written files. For example, if "cat" is a label, you might write a file named like cat_<write_uuid>_000000_000000_000000.png.

    import ray
    from ray.data.datasource import FilenameProvider

    class ImageFilenameProvider(FilenameProvider):

        def __init__(self, file_format: str):
            self.file_format = file_format

        def get_filename_for_row(self, row, write_uuid, task_index, block_index, row_index):
            # Prefix the filename with the row's label; zero-pad each index to six digits.
            return (
                f"{row['label']}_{write_uuid}_{task_index:06}_{block_index:06}"
                f"_{row_index:06}.{self.file_format}"
            )

    ds = ray.data.read_parquet("s3://anonymous@ray-example-data/images.parquet")
    ds.write_images(
        "/tmp/results",
        column="image",
        filename_provider=ImageFilenameProvider("png")
    )
PublicAPI (alpha): This API is in alpha and may change before becoming stable.
Methods
get_filename_for_block(): Generate a filename for a block of data.
get_filename_for_row(): Generate a filename for a row.
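The example on this page covers the per-row case. For block-based writes such as ray.data.Dataset.write_parquet(), a provider might override get_filename_for_block() instead. The following is a minimal sketch, not a definitive implementation: the class name PrefixedBlockFilenameProvider and its prefix argument are hypothetical, and the parameter list (block, write_uuid, task_index, block_index) is assumed to mirror get_filename_for_row() without row_index. The import fallback exists only so the sketch can run where Ray isn't installed.

```python
try:
    from ray.data.datasource import FilenameProvider
except ImportError:
    # Fallback so this sketch runs without Ray installed (illustration only).
    FilenameProvider = object


class PrefixedBlockFilenameProvider(FilenameProvider):
    """Hypothetical provider that names each written block with a fixed prefix."""

    def __init__(self, prefix: str, file_format: str):
        self.prefix = prefix
        self.file_format = file_format

    # Assumed signature: same arguments as get_filename_for_row(), minus row_index.
    def get_filename_for_block(self, block, write_uuid, task_index, block_index):
        # Including write_uuid and the zero-padded indices keeps names unique
        # across write operations, tasks, and blocks.
        return (
            f"{self.prefix}_{write_uuid}_{task_index:06}_{block_index:06}"
            f".{self.file_format}"
        )


# Call the method directly to inspect the generated name; the block is unused here.
provider = PrefixedBlockFilenameProvider("events", "parquet")
print(provider.get_filename_for_block(None, "abc123", 0, 3))
# prints "events_abc123_000000_000003.parquet"
```

An instance of such a class would be passed through the filename_provider argument of a block-based write method, in the same way that ImageFilenameProvider is passed to write_images().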