ray.data.ReadTask
- class ray.data.ReadTask(read_fn: Callable[[], Iterable[pyarrow.Table | pandas.DataFrame]], metadata: BlockMetadata)
Bases: Callable[[], Iterable[pyarrow.Table | pandas.DataFrame]]
A function used to read blocks from the Dataset.
Read tasks are generated by get_read_tasks(), and return a list of ray.data.Block when called. Initial metadata about the read operation can be retrieved via the metadata attribute prior to executing the read. Final metadata is returned after the read along with the blocks.
Ray will execute read tasks in remote functions to parallelize execution. Note that the number of blocks returned can vary at runtime. For example, if a task is reading a single large file it can return multiple blocks to avoid running out of memory during the read.
The initial metadata should reflect all the blocks returned by the read, e.g., if the metadata says num_rows=1000, the read can return a single block of 1000 rows, or multiple blocks with 1000 rows altogether.
The final metadata (returned with the actual block) reflects the exact contents of the block itself.
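The relationship between initial and final metadata can be illustrated with a minimal sketch. It deliberately does not import Ray: SketchMetadata is a hypothetical stand-in for BlockMetadata that models only num_rows, make_read_fn is an illustrative helper, and plain lists of row dicts stand in for pyarrow.Table or pandas.DataFrame blocks. Treat it as a model of the contract described above under those assumptions, not as the library's implementation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Iterator, List

Row = Dict[str, int]
Block = List[Row]  # stand-in for a pyarrow.Table or pandas.DataFrame block


@dataclass
class SketchMetadata:
    """Hypothetical stand-in for BlockMetadata; only num_rows is modeled."""
    num_rows: int


def make_read_fn(total_rows: int, rows_per_block: int) -> Callable[[], Iterable[Block]]:
    """Build a zero-argument read function that yields blocks lazily."""
    def read_fn() -> Iterator[Block]:
        for start in range(0, total_rows, rows_per_block):
            stop = min(start + rows_per_block, total_rows)
            yield [{"id": i} for i in range(start, stop)]
    return read_fn


# Initial metadata, known before the read executes.
initial = SketchMetadata(num_rows=1000)

# The same initial metadata is consistent with one block or with several.
single = list(make_read_fn(1000, 1000)())
multiple = list(make_read_fn(1000, 300)())

# Final metadata is computed per block and describes that block exactly.
final_single = [SketchMetadata(num_rows=len(b)) for b in single]
final_multiple = [SketchMetadata(num_rows=len(b)) for b in multiple]

# The per-block row counts add up to the initial estimate in both cases.
assert sum(m.num_rows for m in final_single) == initial.num_rows
assert sum(m.num_rows for m in final_multiple) == initial.num_rows
```

In a real datasource, a zero-argument function shaped like read_fn here would be the first argument to ReadTask, with a real BlockMetadata as the second. The fields BlockMetadata requires may differ between Ray releases, so check the reference for the installed version.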
DeveloperAPI: This API may change across minor Ray releases.
Methods
Attributes