ray.data.datasource.FilenameProvider
class ray.data.datasource.FilenameProvider
Generates filenames when you write a Dataset.

Use this class to customize the filenames used when writing a Dataset.
Some methods write each row to a separate file, while others write each block to a separate file. For example, ray.data.Dataset.write_images() writes individual rows, and ray.data.Dataset.write_parquet() writes blocks of data. For more information about blocks, see Data internals.

If you’re writing each row to a separate file, implement get_filename_for_row(). Otherwise, implement get_filename_for_block().

Example
This snippet shows how to encode labels in the names of written files. For example, if "cat" is a label, you might write a file named cat_000000_000000_000000.png.

import ray
from ray.data.datasource import FilenameProvider

class ImageFilenameProvider(FilenameProvider):

    def __init__(self, file_format: str):
        self.file_format = file_format

    def get_filename_for_row(self, row, task_index, block_index, row_index):
        # Prefix each filename with the row's label, then append the
        # zero-padded task, block, and row indices.
        return (
            f"{row['label']}_{task_index:06}_{block_index:06}"
            f"_{row_index:06}.{self.file_format}"
        )

ds = ray.data.read_parquet("s3://anonymous@ray-example-data/images.parquet")
# write_images() writes one file per row, so the provider implements get_filename_for_row().
ds.write_images(
    "/tmp/results",
    column="image",
    filename_provider=ImageFilenameProvider("png")
)
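Because get_filename_for_row() is an ordinary method, you can check a naming scheme without running a write job by calling it directly. The sketch below repeats the ImageFilenameProvider class from the example above, calls it on a hypothetical row labeled "cat", and confirms that distinct (task_index, block_index, row_index) combinations yield distinct filenames. The sample row and index ranges are assumptions for illustration. The guarded import lets the check run without Ray installed; the fallback class is only a placeholder, not Ray's FilenameProvider.

```python
from itertools import product

try:
    from ray.data.datasource import FilenameProvider
except ImportError:
    # Hypothetical stand-in so the check runs without Ray; NOT Ray's class.
    class FilenameProvider:
        pass


class ImageFilenameProvider(FilenameProvider):
    def __init__(self, file_format: str):
        self.file_format = file_format

    def get_filename_for_row(self, row, task_index, block_index, row_index):
        return (
            f"{row['label']}_{task_index:06}_{block_index:06}"
            f"_{row_index:06}.{self.file_format}"
        )


provider = ImageFilenameProvider("png")

# A hypothetical row; this provider only reads the "label" column.
first = provider.get_filename_for_row({"label": "cat"}, 0, 0, 0)
print(first)  # → cat_000000_000000_000000.png

# Rows that share a label must still get different names; otherwise one
# written file could overwrite another in the same output directory.
names = {
    provider.get_filename_for_row({"label": "cat"}, t, b, r)
    for t, b, r in product(range(2), range(3), range(4))
}
print(len(names))  # → 24
```

Including all three indices is what keeps the names unique: the label alone would collide for every "cat" row.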
PublicAPI (alpha): This API is in alpha and may change before becoming stable.
Methods
- get_filename_for_block(): Generate a filename for a block of data.
- get_filename_for_row(): Generate a filename for a row.
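For writers that produce one file per block, such as ray.data.Dataset.write_parquet(), you implement get_filename_for_block() instead of get_filename_for_row(). The following is a minimal sketch under stated assumptions, not a reference implementation: the class PrefixedBlockFilenameProvider, its prefix parameter, and the "<prefix>_<task>_<block>.<ext>" scheme are hypothetical, and the (block, task_index, block_index) signature is assumed from this version of the API, so check it against your installed Ray release. The guarded import lets the snippet run without Ray; the fallback class is only a placeholder, not Ray's FilenameProvider.

```python
try:
    from ray.data.datasource import FilenameProvider
except ImportError:
    # Hypothetical stand-in so the sketch runs without Ray; NOT Ray's class.
    class FilenameProvider:
        pass


class PrefixedBlockFilenameProvider(FilenameProvider):
    """Names each written block '<prefix>_<task>_<block>.<ext>' (hypothetical scheme)."""

    def __init__(self, prefix: str, file_format: str):
        self.prefix = prefix
        self.file_format = file_format

    def get_filename_for_block(self, block, task_index, block_index):
        # task_index identifies the write task; block_index is the block's
        # position within that task. Zero-padding keeps names sortable.
        return f"{self.prefix}_{task_index:06}_{block_index:06}.{self.file_format}"


provider = PrefixedBlockFilenameProvider("sales", "parquet")
# This scheme ignores the block's contents, so None is enough for a quick check.
print(provider.get_filename_for_block(None, task_index=2, block_index=1))
# → sales_000002_000001.parquet
```

With Ray installed, and assuming your version's write_parquet() accepts filename_provider as write_images() does in the example above, you would pass an instance such as PrefixedBlockFilenameProvider("sales", "parquet") through that argument. Combining task_index and block_index keeps names from colliding when several write tasks each produce multiple blocks.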