ray.data.Datasource.get_read_tasks

Datasource.get_read_tasks(parallelism: int, per_task_row_limit: int | None = None) → List[ReadTask]

Execute the read and return read tasks.

Parameters:
  • parallelism – The requested read parallelism. The number of read tasks should equal this value if possible.

  • per_task_row_limit – The row limit applied to each read task. Defaults to None.

Returns:

A list of read tasks that can be executed to read blocks from the datasource in parallel.
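
The planning step behind an implementation of this method can be sketched in plain Python. This is a minimal illustration, not Ray's own code: plan_row_ranges is a hypothetical helper, the datasource is assumed to expose a known row count, and per_task_row_limit is assumed to cap how many rows each task reads. An implementation of get_read_tasks could build one ReadTask per returned range.

```python
from typing import List, Optional, Tuple


def plan_row_ranges(
    num_rows: int,
    parallelism: int,
    per_task_row_limit: Optional[int] = None,
) -> List[Tuple[int, int]]:
    """Hypothetical helper: plan one (start, stop) row range per read task."""
    if parallelism < 1:
        raise ValueError("parallelism must be at least 1")

    # Honor the requested parallelism "if possible": never plan more
    # tasks than there are rows, so that no task is empty.
    num_tasks = min(parallelism, num_rows)

    ranges: List[Tuple[int, int]] = []
    start = 0
    for i in range(num_tasks):
        # Spread the remainder over the first (num_rows % num_tasks) tasks.
        size = num_rows // num_tasks + (1 if i < num_rows % num_tasks else 0)
        end = start + size
        # Assumed interpretation: the limit caps the rows read by each task.
        stop = end if per_task_row_limit is None else min(end, start + per_task_row_limit)
        ranges.append((start, stop))
        start = end
    return ranges


print(plan_row_ranges(10, 4))     # four ranges, one per requested task
print(plan_row_ranges(2, 4))      # only two ranges: fewer rows than tasks
print(plan_row_ranges(10, 4, 2))  # each range capped at two rows
```

When the source has fewer rows than the requested parallelism, the sketch returns fewer ranges, which is one reading of the "if possible" qualifier on the parallelism parameter.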