ray.data.ReadTask

class ray.data.ReadTask(read_fn: Callable[[], Iterable[pyarrow.Table | pandas.DataFrame]], metadata: BlockMetadata)

Bases: Callable[[], Iterable[Union[pyarrow.Table, pandas.DataFrame]]]

A function used to read blocks from the Dataset.

Read tasks are generated by get_read_tasks() and return an iterable of ray.data.Block when called. Initial metadata about the read operation can be retrieved via get_metadata() before the read executes. Final metadata is returned after the read, along with the blocks.

Ray executes read tasks in remote functions to parallelize execution. Note that the number of blocks returned can vary at runtime. For example, a task reading a single large file can return multiple blocks to avoid running out of memory during the read.
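The following is a minimal sketch of a read function that returns several blocks for one logical read. It uses only the standard library: plain lists of row dicts stand in for pyarrow.Table or pandas.DataFrame blocks, and make_read_fn and max_rows_per_block are hypothetical names chosen for illustration, not part of Ray's API.

```python
from typing import Callable, Dict, Iterable, Iterator, List

# Stand-in for a block: a list of row dicts. Real Ray blocks are
# pyarrow.Table or pandas.DataFrame objects (assumption for this sketch).
Block = List[Dict[str, int]]


def make_read_fn(
    num_rows: int, max_rows_per_block: int
) -> Callable[[], Iterable[Block]]:
    """Build a zero-argument read function, the shape read_fn is typed as."""

    def read_fn() -> Iterator[Block]:
        # Yield one bounded chunk at a time instead of materializing
        # every row in memory at once.
        for start in range(0, num_rows, max_rows_per_block):
            stop = min(start + max_rows_per_block, num_rows)
            yield [{"id": i} for i in range(start, stop)]

    return read_fn


read_fn = make_read_fn(num_rows=1000, max_rows_per_block=300)
blocks = list(read_fn())
block_sizes = [len(block) for block in blocks]
```

Here one read produces four blocks whose sizes depend on the chunk limit. With a real datasource, the split would more likely depend on file size or a target block size rather than a fixed row count.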

The initial metadata should reflect all the blocks returned by the read. For example, if the metadata says num_rows=1000, the read can return a single block of 1000 rows or multiple blocks totaling 1000 rows.

The final metadata (returned with the actual block) reflects the exact contents of the block itself.
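The relationship between initial and final metadata can be sketched as follows. This is an illustration under stated assumptions, not Ray's implementation: SketchMetadata is a hypothetical stand-in that models only the num_rows field, and read_with_final_metadata is a made-up helper that pairs each block with metadata computed from the block itself.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, List, Tuple

# Stand-in for a block (assumption): a list of row dicts.
Block = List[Dict[str, int]]


@dataclass
class SketchMetadata:
    """Hypothetical stand-in for BlockMetadata; only num_rows is modeled."""

    num_rows: int


def read_with_final_metadata(
    total_rows: int, rows_per_block: int
) -> Iterator[Tuple[Block, SketchMetadata]]:
    """Yield each block together with metadata describing that exact block."""
    for start in range(0, total_rows, rows_per_block):
        stop = min(start + rows_per_block, total_rows)
        block = [{"id": i} for i in range(start, stop)]
        # Final metadata is derived from the block actually produced.
        yield block, SketchMetadata(num_rows=len(block))


# Initial metadata: known before the read and covering the whole read.
initial = SketchMetadata(num_rows=1000)

results = list(read_with_final_metadata(initial.num_rows, rows_per_block=400))
final_counts = [meta.num_rows for _, meta in results]
```

The per-block counts differ from the initial figure, but their sum matches it, which is the consistency described above.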

DeveloperAPI: This API may change across minor Ray releases.

Methods