ray.data.Datasource

class ray.data.Datasource(*args, **kwds)

Interface for defining a custom ray.data.Dataset datasource.

To read a datasource into a dataset, use ray.data.read_datasource(). To write to a writable datasource, use Dataset.write_datasource().

See RangeDatasource and DummyOutputDatasource for examples of how to implement readable and writable datasources.

Datasource instances must be serializable, since create_reader() and do_write() are called in remote tasks.
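The serializability requirement above can be checked before a job is submitted. The following is a minimal sketch, not part of the Ray API: `ConfigOnlySource`, `LockHoldingSource`, and `unpicklable_fields` are hypothetical names, and the standard-library `pickle` module is used as a conservative stand-in for the serializer Ray uses internally. It reports which instance attributes cannot be pickled, which is usually where a live resource such as a lock or an open connection is hiding.

```python
import pickle
import threading


class ConfigOnlySource:
    """Hypothetical datasource-like class holding only plain configuration."""

    def __init__(self, path, batch_size=1000):
        self.path = path
        self.batch_size = batch_size


class LockHoldingSource:
    """Hypothetical class holding a live resource created in __init__."""

    def __init__(self, path):
        self.path = path
        self.lock = threading.Lock()  # thread locks cannot be pickled


def unpicklable_fields(obj):
    """Return the names of instance attributes that fail to pickle.

    Assumption: standard pickle is stricter than the serializer Ray uses,
    so an empty result here is a conservative signal, not a guarantee.
    """
    bad = []
    for name, value in vars(obj).items():
        try:
            pickle.dumps(value)
        except Exception:
            bad.append(name)
    return bad


print(unpicklable_fields(ConfigOnlySource("/tmp/data")))
print(unpicklable_fields(LockHoldingSource("/tmp/data")))
```

A common way to satisfy the requirement is to store only configuration (paths, credentials, options) on the instance and to create connections or handles inside the code that runs in the remote task.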

PublicAPI: This API is stable across Ray releases.

__init__()

Methods

__init__()

create_reader(**read_args)

Return a Reader for the given read arguments.

do_write(blocks, metadata, ray_remote_args, ...)

Launch Ray tasks for writing blocks out to the datasource.

on_write_complete(write_results, **kwargs)

Callback for when a write job completes.

on_write_failed(write_results, error, **kwargs)

Callback for when a write job fails.

prepare_read(parallelism, **read_args)

Deprecated: Please implement create_reader() instead.
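The create_reader() pattern listed above can be illustrated without Ray installed. The following is a dependency-free sketch under stated assumptions: `SketchReadTask`, `SketchRangeReader`, and `SketchRangeDatasource` are hypothetical stand-ins, not Ray classes, and the method name `get_read_tasks` is assumed to mirror the Reader interface; verify it against the Reader reference for your Ray version. The sketch shows a datasource returning a reader, and the reader splitting a range of integers into at most `parallelism` independent read tasks.

```python
class SketchReadTask:
    """Stand-in for a read task: a zero-argument function plus a row count."""

    def __init__(self, read_fn, num_rows):
        self.read_fn = read_fn
        self.num_rows = num_rows


class SketchRangeReader:
    """Stand-in reader that splits range(n) into at most `parallelism` tasks."""

    def __init__(self, n):
        self.n = n

    def get_read_tasks(self, parallelism):
        tasks = []
        if self.n <= 0:
            return tasks
        # Never create more tasks than there are rows.
        parallelism = max(1, min(parallelism, self.n))
        base, extra = divmod(self.n, parallelism)
        start = 0
        for i in range(parallelism):
            # The first `extra` tasks take one additional row each.
            size = base + (1 if i < extra else 0)
            end = start + size
            # Bind start/end as defaults so each task keeps its own bounds.
            read_fn = lambda s=start, e=end: list(range(s, e))
            tasks.append(SketchReadTask(read_fn, size))
            start = end
        return tasks


class SketchRangeDatasource:
    """Stand-in datasource: create_reader() only captures the read arguments."""

    def create_reader(self, n):
        return SketchRangeReader(n)


reader = SketchRangeDatasource().create_reader(n=10)
tasks = reader.get_read_tasks(parallelism=3)
blocks = [task.read_fn() for task in tasks]
print(blocks)  # [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]
```

In this sketch each task is self-contained and carries its own bounds, so the tasks could be executed in any order or on different workers. This is also why the datasource and everything the tasks capture need to be serializable.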