class ray.data.datasource.MongoDatasource(*args, **kwds)

Bases: ray.data.datasource.datasource.Datasource

Datasource for reading from and writing to MongoDB.


>>> import ray
>>> from ray.data.datasource import MongoDatasource
>>> import pyarrow as pa
>>> from pymongoarrow.api import Schema
>>> ds = ray.data.read_datasource( 
...     MongoDatasource(), 
...     uri="mongodb://username:password@mongodb0.example.com:27017/?authSource=admin", # noqa: E501
...     database="my_db", 
...     collection="my_collection", 
...     schema=Schema({"col1": pa.string(), "col2": pa.int64()}), 
... ) 

PublicAPI (alpha): This API is in alpha and may change before becoming stable.

create_reader(**kwargs) → ray.data.datasource.datasource.Reader

Return a Reader for the given read arguments.

The reader object queries the read metadata and generates the read tasks that retrieve the data blocks on request.


Parameters:
read_args – Additional keyword arguments to pass to the datasource implementation.
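
The split between a reader that answers metadata queries and read tasks that produce blocks only when executed can be illustrated with a minimal, pure-Python sketch. It is not Ray's implementation: `ToyReader`, `num_rows`, and the `rows` read argument are hypothetical names chosen for illustration, and the data lives in memory rather than in MongoDB.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]
ReadTask = Callable[[], List[Row]]  # calling a task returns one block of rows


class ToyReader:
    """Hypothetical stand-in for a Reader; not Ray's implementation."""

    def __init__(self, rows: List[Row]):
        # A real datasource would hold connection details here, not the data.
        self._rows = rows

    def num_rows(self) -> int:
        # Metadata query: answered without producing any blocks.
        return len(self._rows)

    def get_read_tasks(self, parallelism: int) -> List[ReadTask]:
        # Split the rows into at most `parallelism` contiguous chunks and
        # return one deferred task per chunk.
        n = len(self._rows)
        if n == 0:
            return []
        parallelism = max(1, min(parallelism, n))
        size, extra = divmod(n, parallelism)
        tasks: List[ReadTask] = []
        start = 0
        for i in range(parallelism):
            end = start + size + (1 if i < extra else 0)
            # Bind start/end as defaults so each task keeps its own range.
            tasks.append(lambda s=start, e=end: self._rows[s:e])
            start = end
        return tasks


def create_reader(**read_args) -> ToyReader:
    # `rows` is an assumed read argument used only by this sketch.
    return ToyReader(read_args["rows"])


source_rows = [{"col1": str(i), "col2": i} for i in range(10)]
reader = create_reader(rows=source_rows)
read_tasks = reader.get_read_tasks(parallelism=3)
read_blocks = [task() for task in read_tasks]  # blocks exist only after tasks run
```

In this sketch the chunk boundaries are fixed when the tasks are created, while the rows themselves are only sliced when a task is called, which mirrors the "upon request" behavior described above.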

do_write(blocks: List[ray.types.ObjectRef[Union[List[ray.data.block.T], pyarrow.Table, pandas.DataFrame, bytes]]], metadata: List[ray.data.block.BlockMetadata], ray_remote_args: Optional[Dict[str, Any]], uri: str, database: str, collection: str) → List[ray.types.ObjectRef[Any]]

Launch Ray tasks for writing blocks out to the datasource.

Parameters:

  • blocks – List of data block references. It is recommended that one write task be generated per block.

  • metadata – List of block metadata.

  • ray_remote_args – Kwargs passed to ray.remote in the write tasks.

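  • write_args – Additional keyword arguments to pass to the datasource implementation. For this datasource, the signature lists them as uri, database, and collection.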


Returns:
A list of the outputs of the write tasks.
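
The recommended one-write-task-per-block pattern can be sketched without Ray or MongoDB. The example below is an illustration under stated assumptions, not the library's code: a `ThreadPoolExecutor` stands in for Ray tasks, each `Future` stands in for an `ObjectRef`, and `fake_collection`, `write_block`, and `toy_do_write` are hypothetical names.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from threading import Lock
from typing import Dict, List

Row = Dict[str, object]
Block = List[Row]

# Hypothetical in-memory stand-in for a MongoDB collection.
fake_collection: List[Row] = []
_collection_lock = Lock()


def write_block(block: Block) -> int:
    """Write one block and return the number of rows written."""
    with _collection_lock:
        fake_collection.extend(block)
    return len(block)


def toy_do_write(blocks: List[Block], executor: ThreadPoolExecutor) -> List[Future]:
    # Launch one write task per block; each Future plays the role of the
    # ObjectRef that a real write task would return.
    return [executor.submit(write_block, block) for block in blocks]


write_blocks = [[{"col2": 0}, {"col2": 1}], [{"col2": 2}], []]
with ThreadPoolExecutor(max_workers=2) as pool:
    write_results = toy_do_write(write_blocks, pool)
    # Resolving the futures corresponds to fetching the write task outputs.
    rows_written = [future.result() for future in write_results]
```

The returned list has one entry per input block, in the same order, so a caller can match each write result back to the block that produced it.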