ray.data.datasource.Reader
- class ray.data.datasource.Reader(*args, **kwds)
Bases: Generic[ray.data.block.T]
A bound read operation for a datasource.
This is a stateful class so that reads can be prepared in multiple stages. For example, it is useful for Datasets to know the in-memory size of the read prior to executing it.
PublicAPI: This API is stable across Ray releases.
- estimate_inmemory_data_size() -> Optional[int]
Return an estimate of the in-memory data size, or None if unknown.
Note that the in-memory data size may be larger than the on-disk data size.
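As a rough illustration of how such an estimate might be produced, the sketch below uses a hypothetical `RangeReaderSketch` class. It is not part of Ray and does not subclass `Reader`; it only mirrors the method name and return type, and it assumes each row occupies 8 bytes once loaded.

```python
from typing import Optional


class RangeReaderSketch:
    """Hypothetical stand-in for a reader over the integers [0, n)."""

    # Assumption: each row is held as a 64-bit integer in memory.
    BYTES_PER_ROW = 8

    def __init__(self, n: Optional[int]):
        # n is the number of rows, or None if it is not known up front.
        self._n = n

    def estimate_inmemory_data_size(self) -> Optional[int]:
        # Return None when the size cannot be estimated before reading.
        if self._n is None:
            return None
        return self._n * self.BYTES_PER_ROW
```

A caller can inspect the estimate before any data is read, for example to decide how many blocks to request, and should be prepared for `None`.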
- get_read_tasks(parallelism: int) -> List[ray.data.datasource.datasource.ReadTask[ray.data.block.T]]
Execute the read and return read tasks.
- Parameters
parallelism – The requested read parallelism. The number of read tasks should equal this value if possible.
read_args – Additional kwargs to pass to the datasource impl.
- Returns
A list of read tasks that can be executed to read blocks from the datasource in parallel.
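The relationship between `parallelism` and the returned list can be sketched as follows. This is a minimal illustration with a hypothetical `RangeReadPlannerSketch` class, not Ray's implementation: plain zero-argument callables stand in for `ReadTask` objects, and the policy of never creating more tasks than rows is an assumption made for the example.

```python
from typing import Callable, List


class RangeReadPlannerSketch:
    """Hypothetical stand-in that splits the integers [0, n) into read tasks."""

    def __init__(self, n: int):
        self._n = n

    def get_read_tasks(self, parallelism: int) -> List[Callable[[], List[int]]]:
        # Aim for `parallelism` tasks, but never more tasks than rows
        # and always at least one task (assumed policy for this sketch).
        num_tasks = max(1, min(parallelism, self._n))
        base, extra = divmod(self._n, num_tasks)

        tasks: List[Callable[[], List[int]]] = []
        start = 0
        for i in range(num_tasks):
            # The first `extra` tasks take one additional row each.
            end = start + base + (1 if i < extra else 0)
            # Bind the bounds as default arguments so that each task
            # keeps its own range; nothing is read until it is called.
            tasks.append(lambda s=start, e=end: list(range(s, e)))
            start = end
        return tasks
```

Each returned task is independent of the others, so the tasks can be run in parallel and their outputs concatenated in order to recover the full range.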