ray.data.Dataset.write_datasource

Dataset.write_datasource(datasource: ray.data.datasource.datasource.Datasource[ray.data.block.T], *, ray_remote_args: Optional[Dict[str, Any]] = None, **write_args) → None

Write the dataset to a custom datasource.

Examples

>>> import ray
>>> from ray.data.datasource import Datasource
>>> ds = ray.data.range(100) 
>>> class CustomDatasource(Datasource): 
...     # define custom data source
...     pass 
>>> ds.write_datasource(CustomDatasource(...)) 

Time complexity: O(dataset size / parallelism)

Parameters
  • datasource – The datasource to write to.

  • ray_remote_args – Keyword arguments passed to ray.remote for the write tasks.

  • write_args – Additional write args to pass to the datasource.
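
The signature separates two kinds of keyword arguments: ray_remote_args is a single keyword-only dict, while any other keyword is collected into write_args. The sketch below mirrors only that parameter shape in plain Python so the routing is visible without a Ray cluster. It is not Ray's implementation: RecordingDatasource, write_datasource_sketch, and the write arguments path and overwrite are hypothetical names chosen for illustration.

```python
from typing import Any, Dict, List, Optional, Tuple


class RecordingDatasource:
    """Hypothetical stand-in for a Datasource; it only records what it receives."""

    def __init__(self) -> None:
        self.calls: List[Tuple[Dict[str, Any], Dict[str, Any]]] = []

    def record(self, ray_remote_args: Dict[str, Any], write_args: Dict[str, Any]) -> None:
        self.calls.append((ray_remote_args, write_args))


def write_datasource_sketch(
    datasource: RecordingDatasource,
    *,
    ray_remote_args: Optional[Dict[str, Any]] = None,
    **write_args: Any,
) -> None:
    # Same parameter shape as the documented signature; the body is illustrative only.
    # A missing ray_remote_args is treated as an empty dict in this sketch.
    datasource.record(ray_remote_args or {}, write_args)


sink = RecordingDatasource()
write_datasource_sketch(
    sink,
    ray_remote_args={"num_cpus": 2},  # would be forwarded to ray.remote
    path="/tmp/out",                  # hypothetical datasource-specific write arg
    overwrite=True,                   # hypothetical datasource-specific write arg
)
remote_args, extra_args = sink.calls[0]
print(remote_args)
print(extra_args)
```

Because of the bare * in the signature, everything after datasource must be passed by keyword; passing a second positional argument raises a TypeError.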