ray.data.Datasource
ray.data.Datasource#
- class ray.data.Datasource(*args, **kwds)[source]#
Bases:
Generic
[ray.data.block.T
]Interface for defining a custom
ray.data.Dataset
datasource.To read a datasource into a dataset, use
ray.data.read_datasource()
. To write to a writable datasource, useDataset.write_datasource()
.See
RangeDatasource
andDummyOutputDatasource
for examples of how to implement readable and writable datasources.Datasource instances must be serializable, since
create_reader()
anddo_write()
are called in remote tasks.PublicAPI: This API is stable across Ray releases.
- create_reader(**read_args) ray.data.datasource.datasource.Reader[ray.data.block.T] [source]#
Return a Reader for the given read arguments.
The reader object will be responsible for querying the read metadata, and generating the actual read tasks to retrieve the data blocks upon request.
- Parameters
read_args – Additional kwargs to pass to the datasource impl.
- prepare_read(parallelism: int, **read_args) List[ray.data.datasource.datasource.ReadTask[ray.data.block.T]] [source]#
Deprecated: Please implement create_reader() instead.
Warning
DEPRECATED: This API is deprecated and may be removed in future Ray releases.
- do_write(blocks: List[ray.types.ObjectRef[Union[List[ray.data.block.T], pyarrow.Table, pandas.DataFrame, bytes]]], metadata: List[ray.data.block.BlockMetadata], ray_remote_args: Dict[str, Any], **write_args) List[ray.types.ObjectRef[Any]] [source]#
Launch Ray tasks for writing blocks out to the datasource.
- Parameters
blocks – List of data block references. It is recommended that one write task be generated per block.
metadata – List of block metadata.
ray_remote_args – Kwargs passed to ray.remote in the write tasks.
write_args – Additional kwargs to pass to the datasource impl.
- Returns
A list of the output of the write tasks.
- on_write_complete(write_results: List[Any], **kwargs) None [source]#
Callback for when a write job completes.
This can be used to “commit” a write output. This method must succeed prior to
write_datasource()
returning to the user. If this method fails, thenon_write_failed()
will be called.- Parameters
write_results – The list of the write task results.
kwargs – Forward-compatibility placeholder.
- on_write_failed(write_results: List[ray.types.ObjectRef[Any]], error: Exception, **kwargs) None [source]#
Callback for when a write job fails.
This is called on a best-effort basis on write failures.
- Parameters
write_results – The list of the write task result futures.
error – The first error encountered.
kwargs – Forward-compatibility placeholder.