class ray.data.datasource.Reader(*args, **kwds)

Bases: Generic[ray.data.block.T]

A bound read operation for a datasource.

This is a stateful class so that reads can be prepared in multiple stages. For example, it is useful for Datasets to know the in-memory size of the read prior to executing it.

PublicAPI: This API is stable across Ray releases.
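The class description notes that knowing the in-memory size of a read before executing it is useful. As an illustration only, a planner could turn the value returned by `estimate_inmemory_data_size()` into a read parallelism. The `choose_parallelism` function, the `TARGET_BLOCK_BYTES` constant, and the fallback of 8 are hypothetical. This sketch is not how Ray's own planner is implemented.

```python
import math
from typing import Optional

# Hypothetical target size for one output block; not a Ray setting.
TARGET_BLOCK_BYTES = 128 * 1024 * 1024


def choose_parallelism(estimated_bytes: Optional[int], default: int = 8) -> int:
    """Pick a read parallelism from a reader's size estimate (illustrative heuristic)."""
    if estimated_bytes is None:
        # The reader could not estimate its size: fall back to a fixed default.
        return default
    # Aim for roughly one task per TARGET_BLOCK_BYTES of in-memory data, and at least one task.
    return max(1, math.ceil(estimated_bytes / TARGET_BLOCK_BYTES))
```

Because the estimate is available before any data is read, the choice of parallelism can be made up front rather than after the read has already started.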

estimate_inmemory_data_size() → Optional[int]

Return an estimate of the in-memory data size, or None if unknown.

Note that the in-memory data size may be larger than the on-disk data size, for example when the files on disk are compressed.
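A minimal sketch of a reader that estimates its in-memory size from file sizes recorded at construction time, which also shows the staged, stateful pattern described for the class. `FileListReader` and `EXPANSION_FACTOR` are hypothetical: this is a plain Python class, not a `Reader` subclass, and the fixed 3x expansion ratio is an assumption standing in for whatever format-specific knowledge a real datasource would use.

```python
from typing import List, Optional


class FileListReader:
    """Hypothetical reader; mirrors only the shape of Reader.estimate_inmemory_data_size."""

    # Assumed ratio of decoded in-memory size to on-disk size (e.g. for compressed files).
    EXPANSION_FACTOR = 3.0

    def __init__(self, file_sizes: List[Optional[int]]):
        # Stage 1: bind the read arguments. Nothing is read yet.
        self._file_sizes = file_sizes

    def estimate_inmemory_data_size(self) -> Optional[int]:
        # If any file's size is unknown, the total is unknown too.
        if any(size is None for size in self._file_sizes):
            return None
        on_disk = sum(self._file_sizes)
        # In-memory size can exceed on-disk size, so scale the on-disk total up.
        return int(on_disk * self.EXPANSION_FACTOR)
```

Returning `None` rather than a guess when information is missing matches the documented contract ("or None if unknown") and lets the caller fall back to a default.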

get_read_tasks(parallelism: int) → List[ray.data.datasource.datasource.ReadTask[ray.data.block.T]]

Prepare the read and return the read tasks that perform it.

Parameters:

  • parallelism – The requested read parallelism. The number of read tasks should equal this value if possible.

  • read_args – Additional kwargs to pass to the datasource impl.

Returns:

A list of read tasks that can be executed to read blocks from the datasource in parallel.
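The sketch below shows one way to honor the "if possible" in the `parallelism` contract: split an in-memory list into at most `parallelism` contiguous slices and never produce an empty task. `InMemoryRowReader` and `ReadTaskLike` are hypothetical plain-Python stand-ins. A real implementation would subclass `Reader` and return Ray `ReadTask` objects, which also carry block metadata.

```python
from typing import Callable, List

# Hypothetical stand-in for Ray's ReadTask: a zero-argument callable that returns blocks.
# (Ray's real ReadTask also carries block metadata; that is omitted here.)
ReadTaskLike = Callable[[], List[list]]


class InMemoryRowReader:
    """Hypothetical reader over a Python list; not a Ray class."""

    def __init__(self, rows: list):
        self._rows = rows

    def get_read_tasks(self, parallelism: int) -> List[ReadTaskLike]:
        if parallelism < 1:
            raise ValueError("parallelism must be >= 1")
        # There cannot be more non-empty tasks than rows.
        n = min(parallelism, len(self._rows))
        tasks: List[ReadTaskLike] = []
        for i in range(n):
            # Contiguous slices whose sizes differ by at most one row.
            start = i * len(self._rows) // n
            end = (i + 1) * len(self._rows) // n
            chunk = self._rows[start:end]
            # Bind chunk as a default argument so each task returns its own slice.
            tasks.append(lambda chunk=chunk: [chunk])
        return tasks
```

For example, ten rows with `parallelism=3` yield three tasks returning slices of 3, 3, and 4 rows, while two rows with `parallelism=5` yield only two tasks. Capping the task count at the number of rows avoids scheduling tasks with nothing to read, which is why the returned list may be shorter than the requested parallelism.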