ray.data.datasource.FilenameProvider.get_filename_for_row

FilenameProvider.get_filename_for_row(row: Dict[str, Any], write_uuid: str, task_index: int, block_index: int, row_index: int) → str

Generate a filename for a row.

Note

Filenames must be unique and deterministic for a given combination of write UUID, task index, block index, and row index.

A block consists of multiple rows, and each row corresponds to a single output file. Each task might produce a different number of blocks, and each block might contain a different number of rows.
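
To see why all four values matter, the following minimal sketch simulates a write in plain Python, without Ray. The `layout` structure and the `make_filename` helper are hypothetical and exist only for illustration; the naming scheme is an assumption, not Ray's default. Because block and row indices restart within each task and block, a name built from `row_index` alone collides, while a name built from the write UUID plus all three indices does not.

```python
import uuid

# Hypothetical layout: layout[task][block] is the number of rows in that block.
# Tasks produce different numbers of blocks, and blocks hold different
# numbers of rows.
layout = [[3, 1], [2], [4, 2, 1]]


def make_filename(write_uuid: str, task_index: int, block_index: int, row_index: int) -> str:
    """Illustrative naming scheme (an assumption, not Ray's default)."""
    return f"{write_uuid}_{task_index:06}_{block_index:06}_{row_index:06}.bin"


write_uuid = uuid.uuid4().hex

full_names = []      # names built from the UUID and all three indices
row_only_names = []  # names built from the UUID and row_index only
for task_index, blocks in enumerate(layout):
    for block_index, num_rows in enumerate(blocks):
        for row_index in range(num_rows):
            full_names.append(
                make_filename(write_uuid, task_index, block_index, row_index)
            )
            row_only_names.append(f"{write_uuid}_{row_index:06}.bin")

total_rows = sum(sum(blocks) for blocks in layout)

# Every row gets its own file when all indices are used.
assert len(set(full_names)) == total_rows
# row_index restarts in every block, so using it alone produces duplicates.
assert len(set(row_only_names)) < total_rows
```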

Tip

If you require a contiguous row index into the global dataset, use iter_rows(). Note that iter_rows() is single-threaded and isn’t recommended for large datasets.

Parameters:

row – The row that will be written to a file.
write_uuid – The UUID of the write operation.
task_index – The index of the write task.
block_index – The index of the block within the write task.
row_index – The index of the row within the block.
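
As an illustration of the signature above, here is a minimal sketch of a subclass that overrides `get_filename_for_row`. The class name `LabeledFilenameProvider`, the `file_format` argument, and the assumption that each row has a `"label"` column are all hypothetical. The guarded import is only there so the sketch also runs where Ray isn't installed.

```python
from typing import Any, Dict

try:
    from ray.data.datasource import FilenameProvider
except ImportError:
    # Stand-in base class so the sketch runs without Ray installed.
    class FilenameProvider:
        pass


class LabeledFilenameProvider(FilenameProvider):
    """Hypothetical provider that prefixes each filename with the row's label."""

    def __init__(self, file_format: str):
        self.file_format = file_format

    def get_filename_for_row(
        self,
        row: Dict[str, Any],
        write_uuid: str,
        task_index: int,
        block_index: int,
        row_index: int,
    ) -> str:
        # Assumes every row has a "label" column. The UUID and the three
        # indices keep the name unique and deterministic for each row.
        return (
            f"{row['label']}_{write_uuid}_{task_index:06}_"
            f"{block_index:06}_{row_index:06}.{self.file_format}"
        )


provider = LabeledFilenameProvider("png")
name = provider.get_filename_for_row(
    {"label": "cat"},
    write_uuid="abc123",
    task_index=0,
    block_index=2,
    row_index=5,
)
print(name)  # cat_abc123_000000_000002_000005.png
```

An instance of such a class would typically be passed through the `filename_provider` argument of a row-based write, for example `Dataset.write_images`.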