ray.data.datasource.FileBasedDatasource

class ray.data.datasource.FileBasedDatasource(*args, **kwds)

File-based datasource, for reading and writing files.

This class should not be used directly, and should instead be subclassed and tailored to particular file formats. Classes deriving from this class must implement _read_file().

If _FILE_EXTENSION is defined, only files with that extension are read by default. If it is None, no default filter is applied.
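The subclassing contract described above can be sketched in plain Python. This is a minimal illustration, not Ray's implementation: SketchFileDatasource, SketchTextDatasource, read_paths(), _default_filter(), and demo() are hypothetical names introduced here, and the simplified _read_file(f, path) signature and the case-insensitive extension matching are assumptions. Only the two ideas taken from the text are modeled: subclasses must implement _read_file(), and _FILE_EXTENSION acts as a default filter unless it is None.

```python
import os
import tempfile
from typing import BinaryIO, List, Optional


class SketchFileDatasource:
    """Hypothetical stand-in for the base class; NOT Ray's implementation."""

    # Subclasses may set this to restrict reads to a single extension.
    # None means no default filter is applied.
    _FILE_EXTENSION: Optional[str] = None

    def _read_file(self, f: BinaryIO, path: str) -> List[str]:
        # Assumed, simplified signature: the real method may differ.
        raise NotImplementedError("Subclasses must implement _read_file().")

    def _default_filter(self, paths: List[str]) -> List[str]:
        # Keep every path when no extension is set; otherwise keep only
        # paths ending in ".<extension>" (case-insensitive by assumption).
        if self._FILE_EXTENSION is None:
            return list(paths)
        suffix = "." + self._FILE_EXTENSION.lower()
        return [p for p in paths if p.lower().endswith(suffix)]

    def read_paths(self, paths: List[str]) -> List[str]:
        # Hypothetical driver: open each accepted file and delegate parsing
        # to the format-specific _read_file() of the subclass.
        rows: List[str] = []
        for path in self._default_filter(paths):
            with open(path, "rb") as f:
                rows.extend(self._read_file(f, path))
        return rows


class SketchTextDatasource(SketchFileDatasource):
    """Format-specific subclass: one row per line of a .txt file."""

    _FILE_EXTENSION = "txt"

    def _read_file(self, f: BinaryIO, path: str) -> List[str]:
        return f.read().decode("utf-8").splitlines()


def demo() -> List[str]:
    # Write one matching and one non-matching file, then read both paths.
    with tempfile.TemporaryDirectory() as tmp:
        txt_path = os.path.join(tmp, "a.txt")
        csv_path = os.path.join(tmp, "b.csv")
        with open(txt_path, "w", encoding="utf-8") as out:
            out.write("hello\nworld\n")
        with open(csv_path, "w", encoding="utf-8") as out:
            out.write("x,y\n1,2\n")
        return SketchTextDatasource().read_paths([txt_path, csv_path])
```

In this sketch, demo() reads only a.txt because b.csv does not match the subclass's extension. Calling read_paths() on the base class itself raises NotImplementedError, which mirrors the rule that the base class is not meant to be used directly.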

Current subclasses:

JSONDatasource, CSVDatasource, NumpyDatasource, BinaryDatasource

DeveloperAPI: This API may change across minor Ray releases.

__init__()

Methods

__init__()

create_reader(**kwargs)

Return a Reader for the given read arguments.

do_write(blocks, metadata, path, dataset_uuid)

Creates and returns write tasks for a file-based datasource.

file_extension_filter()

on_write_complete(write_results, **kwargs)

Callback for when a write job completes.

on_write_failed(write_results, error, **kwargs)

Callback for when a write job fails.

prepare_read(parallelism, **read_args)

Deprecated: Please implement create_reader() instead.