ray.data.Datasource

class ray.data.Datasource

Bases: object

Interface for defining a custom ray.data.Dataset datasource.

To read a datasource into a dataset, use ray.data.read_datasource(). To write to a writable datasource, use Dataset.write_datasource().

See RangeDatasource and DummyOutputDatasource for examples of how to implement a readable and a writable datasource, respectively.

Datasource instances must be serializable, since create_reader() and write() are called in remote tasks.
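The serializability requirement mostly matters when a datasource holds a live resource such as a database connection. The following is a minimal sketch of one way to keep such an object serializable. LazySqliteSource is a hypothetical stand-in class that does not subclass ray.data.Datasource, and the standard-library pickle module is used here only as an approximation of whatever serializer Ray applies when shipping the instance to remote tasks.

```python
import pickle
import sqlite3


class LazySqliteSource:
    """Hypothetical datasource-like class; not part of Ray."""

    def __init__(self, db_path: str):
        # Keep only plain, picklable configuration on the instance.
        self.db_path = db_path
        self._conn = None  # opened lazily, never serialized

    def _connection(self) -> sqlite3.Connection:
        # Open the connection on first use in whichever process needs it.
        if self._conn is None:
            self._conn = sqlite3.connect(self.db_path)
        return self._conn

    def __getstate__(self):
        # Drop the live connection; sqlite3.Connection cannot be pickled.
        state = self.__dict__.copy()
        state["_conn"] = None
        return state


source = LazySqliteSource(":memory:")
source._connection()  # a connection now exists on the original instance

# Round-trip through pickle, roughly what happens when an object is sent to a task.
restored = pickle.loads(pickle.dumps(source))
```

The restored copy carries only the path and reopens its own connection on demand, so nothing process-specific crosses the task boundary.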

For an example of subclassing Datasource, read Implementing a Custom Datasource.

PublicAPI: This API is stable across Ray releases.

Methods

__init__()

create_reader(**read_args)

Return a Reader for the given read arguments.

do_write(blocks, metadata, ray_remote_args, ...)

Launch Ray tasks for writing blocks out to the datasource.

get_name()

Return a human-readable name for this datasource.

on_write_complete(write_results, **kwargs)

Callback for when a write job completes.

on_write_failed(write_results, error, **kwargs)

Callback for when a write job fails.

prepare_read(parallelism, **read_args)

Deprecated: Please implement create_reader() instead.

write(blocks, **write_args)

Write blocks out to the datasource.
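The relationship between write(), on_write_complete(), and on_write_failed() can be illustrated with a plain-Python sketch. Everything here is an assumption made for illustration: InMemorySink only mirrors the method names listed above and does not subclass ray.data.Datasource, and run_write is a hypothetical local driver that stands in for the remote tasks Ray would launch. The real call order and arguments inside Ray may differ.

```python
from typing import Any, List


class InMemorySink:
    """Hypothetical writable sink that mirrors the method names above."""

    def __init__(self):
        self.rows: List[Any] = []
        self.events: List[str] = []

    def write(self, blocks: List[List[Any]], **write_args) -> str:
        # "fail" is a made-up write argument used to simulate an error.
        if write_args.get("fail"):
            raise RuntimeError("simulated write failure")
        for block in blocks:
            self.rows.extend(block)
        return "ok"

    def on_write_complete(self, write_results, **kwargs):
        self.events.append(f"complete:{len(write_results)}")

    def on_write_failed(self, write_results, error, **kwargs):
        self.events.append(f"failed:{type(error).__name__}")


def run_write(sink: InMemorySink, blocks: List[List[Any]], **write_args) -> List[str]:
    """Hypothetical driver: one write() call per block, then a callback."""
    results: List[str] = []
    try:
        for block in blocks:
            results.append(sink.write([block], **write_args))
    except Exception as err:
        # Report the partial results and the error, then re-raise.
        sink.on_write_failed(results, err, **write_args)
        raise
    sink.on_write_complete(results, **write_args)
    return results


sink = InMemorySink()
run_write(sink, [[1, 2], [3]])  # both blocks succeed

try:
    run_write(sink, [[4]], fail=True)  # the simulated failure path
except RuntimeError:
    pass
```

In this sketch the completion callback is a natural place to commit or finalize output, and the failure callback is a natural place to clean up partial writes.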