class ray.data.Datasource

Bases: object

Interface for defining a custom Dataset datasource.

To read a datasource into a dataset, use read_datasource(). To write to a writable datasource, use write_datasource().

See RangeDatasource and DummyOutputDatasource for examples of how to implement readable and writable datasources.

For an example of subclassing Datasource, read Implementing a Custom Datasource.


Datasource instances must be serializable, since create_reader() and write() are called in remote tasks.
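Because instances are shipped to remote tasks, a practical rule is to keep only plain, picklable configuration (paths, URIs, options) on the object and open live clients or connections inside the task itself. The sketch below checks this with a plain `pickle` round trip. Ray serializes with cloudpickle, so this is only a rough proxy: plain `pickle` also rejects some objects that cloudpickle accepts, such as lambdas. The classes and the `is_serializable` helper are hypothetical.

```python
import pickle
import threading


class PathOnlySource:
    """Hypothetical datasource-like object holding only plain configuration."""

    def __init__(self, path: str):
        self.path = path  # a string pickles fine


class LockHoldingSource:
    """Hypothetical counter-example: holds a live, unpicklable resource."""

    def __init__(self):
        # A lock stands in for a live client or connection object.
        self.lock = threading.Lock()


def is_serializable(obj) -> bool:
    """Return True if obj survives a plain pickle round trip."""
    try:
        pickle.loads(pickle.dumps(obj))
        return True
    except (pickle.PicklingError, TypeError):
        return False


print(is_serializable(PathOnlySource("/data/input")))  # True
print(is_serializable(LockHoldingSource()))            # False
```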

PublicAPI: This API is stable across Ray releases.
Methods

create_reader(**read_args)

Return a Reader for the given read arguments.
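The read path separates "what to read" (the read arguments passed to create_reader()) from "how to split it" (the reader handing out independent units of work that can run in parallel). The sketch below imitates that split without importing Ray. `RangeSourceSketch`, `RangeReaderSketch`, and its `get_read_tasks` method are hypothetical stand-ins, not Ray's classes, and a "block" here is just a Python list.

```python
import functools
from typing import Callable, List

Block = List[int]  # assumption: a "block" is just a list of rows here


def _read_range(start: int, end: int) -> Block:
    """Produce one block; runs wherever the read task is executed."""
    return list(range(start, end))


class RangeReaderSketch:
    """Hypothetical reader that splits range(n) into independent read tasks."""

    def __init__(self, n: int):
        self.n = n

    def get_read_tasks(self, parallelism: int) -> List[Callable[[], Block]]:
        # At least one task, and never more tasks than rows.
        num_tasks = max(1, min(parallelism, self.n))
        tasks = []
        for i in range(num_tasks):
            start = i * self.n // num_tasks
            end = (i + 1) * self.n // num_tasks
            # functools.partial binds the bounds now and stays picklable.
            tasks.append(functools.partial(_read_range, start, end))
        return tasks


class RangeSourceSketch:
    """Hypothetical datasource: read arguments flow into create_reader()."""

    def create_reader(self, n: int) -> RangeReaderSketch:
        return RangeReaderSketch(n)


tasks = RangeSourceSketch().create_reader(n=10).get_read_tasks(parallelism=3)
print([task() for task in tasks])  # [[0, 1, 2], [3, 4, 5], [6, 7, 8, 9]]
```

Using a module-level function with `functools.partial`, rather than a closure, keeps each task picklable, which matters for the same reason the datasource itself must be serializable.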

do_write(blocks, metadata, ray_remote_args, ...)

Launch Ray tasks for writing blocks out to the datasource.

get_name()

Return a human-readable name for this datasource.
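A subclass can override the name it reports. The sketch below shows one reasonable override that derives the name from the class name and drops a trailing "Datasource"; the mixin is hypothetical, and whether Ray's default behaves exactly this way is not stated on this page.

```python
class NameFromClassMixin:
    """Hypothetical mixin: derive a display name from the class name."""

    def get_name(self) -> str:
        name = type(self).__name__
        suffix = "Datasource"
        # Drop a trailing "Datasource" unless that is the whole name.
        if name.endswith(suffix) and name != suffix:
            name = name[: -len(suffix)]
        return name


class ParquetLikeDatasource(NameFromClassMixin):
    pass


class CsvExport(NameFromClassMixin):
    pass


print(ParquetLikeDatasource().get_name())  # ParquetLike
print(CsvExport().get_name())              # CsvExport
```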

on_write_complete(write_results, **kwargs)

Callback for when a write job completes.

on_write_failed(write_results, error, **kwargs)

Callback for when a write job fails.

on_write_start(**write_args)

Callback for when a write job starts.

prepare_read(parallelism, **read_args)

Deprecated: implement create_reader() instead.

write(blocks, ctx, **write_args)

Write blocks out to the datasource.
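The write-side methods fit together as a lifecycle: a start callback, one write() call per task, then either the completion or the failure callback with the per-task results. The sketch below runs that sequence locally and sequentially so the ordering is easy to see. It is an assumption-laden illustration: in Ray the write() calls execute in remote tasks, `ctx` is a task context rather than the plain integer used here, and `RecordingSink`, `run_write_job`, and the integer "rows written" results are hypothetical.

```python
from typing import Any, Iterable, List


class RecordingSink:
    """Hypothetical writable datasource that records lifecycle events."""

    def __init__(self):
        self.events: List[str] = []
        self.rows: List[Any] = []

    def on_write_start(self, **write_args) -> None:
        self.events.append("start")

    def write(self, blocks: Iterable[List[Any]], ctx: Any, **write_args) -> int:
        # Per-task result: number of rows this call wrote.
        count = 0
        for block in blocks:
            if block is None:
                raise ValueError("invalid block")
            self.rows.extend(block)
            count += len(block)
        return count

    def on_write_complete(self, write_results: List[int], **kwargs) -> None:
        self.events.append(f"complete:{sum(write_results)}")

    def on_write_failed(self, write_results: List[int], error: Exception, **kwargs) -> None:
        self.events.append(f"failed:{type(error).__name__}")


def run_write_job(sink: RecordingSink, task_inputs: List[List[List[Any]]]) -> None:
    """Hypothetical local driver: one write() call per task, run in order."""
    sink.on_write_start()
    results: List[int] = []
    try:
        for task_id, blocks in enumerate(task_inputs):
            results.append(sink.write(blocks, ctx=task_id))
    except Exception as err:
        # Report the results gathered before the failure, then re-raise.
        sink.on_write_failed(results, err)
        raise
    sink.on_write_complete(results)


sink = RecordingSink()
run_write_job(sink, [[[1, 2]], [[3], [4, 5]]])
print(sink.events)  # ['start', 'complete:5']
```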