ray.data.read_datasource
ray.data.read_datasource(datasource: ray.data.datasource.datasource.Datasource, *, parallelism: int = -1, ray_remote_args: Dict[str, Any] = None, **read_args) -> ray.data.dataset.Dataset
Read a stream from a custom Datasource.
- Parameters
datasource – The Datasource to read data from.
parallelism – The requested parallelism of the read. Parallelism might be limited by the available partitioning of the datasource. If set to -1, parallelism is automatically chosen based on the available cluster resources and estimated in-memory data size.
read_args – Additional kwargs to pass to the Datasource implementation.
ray_remote_args – kwargs passed to ray.remote() in the read tasks.
- Returns
Dataset that reads data from the Datasource.
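Example

The following is a minimal sketch of how the parameters above fit together, not a definitive implementation. It assumes a Ray 2.x release in which a custom Datasource implements create_reader(**read_args) and returns a Reader whose get_read_tasks(parallelism) produces ReadTask objects; other releases may expose a different interface. The names split_range, SquaresReader, SquaresDatasource, read_squares, and the read argument n are hypothetical and exist only for illustration.

```python
from typing import List, Tuple


def split_range(n: int, parallelism: int) -> List[Tuple[int, int]]:
    """Split [0, n) into contiguous (start, stop) chunks.

    Hypothetical helper: the number of chunks is capped at n, which mirrors
    the note that parallelism can be limited by the datasource's partitioning.
    """
    parallelism = max(1, min(parallelism, n))
    size, extra = divmod(n, parallelism)
    chunks, start = [], 0
    for i in range(parallelism):
        # The first `extra` chunks take one additional row each.
        stop = start + size + (1 if i < extra else 0)
        chunks.append((start, stop))
        start = stop
    return chunks


def read_squares(n: int, parallelism: int = -1):
    """Build a Dataset of squares 0^2 .. (n-1)^2 through a custom Datasource."""
    # Ray and pyarrow are imported lazily so split_range works without them.
    import pyarrow as pa
    import ray
    from ray.data.block import BlockMetadata
    from ray.data.datasource import Datasource, Reader, ReadTask

    class SquaresReader(Reader):
        def __init__(self, n: int):
            self._n = n

        def estimate_inmemory_data_size(self):
            # No size estimate is provided in this sketch.
            return None

        def get_read_tasks(self, parallelism: int):
            tasks = []
            for start, stop in split_range(self._n, parallelism):
                metadata = BlockMetadata(
                    num_rows=stop - start,
                    size_bytes=None,
                    schema=None,
                    input_files=None,
                    exec_stats=None,
                )
                # Default arguments bind this chunk's bounds to the closure.
                tasks.append(
                    ReadTask(
                        lambda s=start, e=stop: [
                            pa.table({"value": [i * i for i in range(s, e)]})
                        ],
                        metadata,
                    )
                )
            return tasks

    class SquaresDatasource(Datasource):
        # Assumption: extra keyword arguments given to read_datasource
        # (here `n`) arrive as read_args.
        def create_reader(self, n: int):
            return SquaresReader(n)

    return ray.data.read_datasource(
        SquaresDatasource(),
        parallelism=parallelism,
        ray_remote_args={"num_cpus": 1},  # forwarded to ray.remote() for read tasks
        n=n,
    )
```

Under these assumptions, calling read_squares(100, parallelism=4) would request four read tasks, while read_squares(2, parallelism=8) could produce at most two, because the hypothetical source has only two rows to partition.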