ray.data.Dataset.write_sql
- Dataset.write_sql(sql: str, connection_factory: Callable[[], Any], ray_remote_args: Dict[str, Any] | None = None, concurrency: int | None = None) -> None
Write to a database that provides a Python DB API2-compliant connector.
Note
This method writes data in parallel using the DB API2 executemany method. To learn more about this method, see PEP 249.

Note
This operation will trigger execution of the lazy transformations performed on this dataset.
Examples
import sqlite3

import ray

connection = sqlite3.connect("example.db")
connection.cursor().execute("CREATE TABLE movie(title, year, score)")

dataset = ray.data.from_items([
    {"title": "Monty Python and the Holy Grail", "year": 1975, "score": 8.2},
    {"title": "And Now for Something Completely Different", "year": 1971, "score": 7.5},
])

dataset.write_sql(
    "INSERT INTO movie VALUES(?, ?, ?)",
    lambda: sqlite3.connect("example.db"),
)

result = connection.cursor().execute("SELECT * FROM movie ORDER BY year")
print(result.fetchall())
[('And Now for Something Completely Different', 1971, 7.5), ('Monty Python and the Holy Grail', 1975, 8.2)]
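Under the hood, each write task issues a single DB API2 executemany call (see PEP 249). The following standalone sqlite3 sketch mirrors what one write task does with its batch of rows; the table and rows here simply reuse the example above:

```python
import sqlite3

# Sketch of the DB API2 executemany call issued per write task;
# an in-memory database stands in for the real target here.
connection = sqlite3.connect(":memory:")
cursor = connection.cursor()
cursor.execute("CREATE TABLE movie(title, year, score)")

rows = [
    ("And Now for Something Completely Different", 1971, 7.5),
    ("Monty Python and the Holy Grail", 1975, 8.2),
]

# One round trip inserts the whole batch; the parameter count in the
# statement must match the number of values per row.
cursor.executemany("INSERT INTO movie VALUES(?, ?, ?)", rows)
connection.commit()
```

Because each task batches its block into one executemany call, the per-row overhead is paid once per batch rather than once per row.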
- Parameters:
  - sql – An INSERT INTO statement that specifies the table to write to. The number of parameters must match the number of columns in the table.
  - connection_factory – A function that takes no arguments and returns a Python DB API2 Connection object.
  - ray_remote_args – Keyword arguments passed to remote() in the write tasks.
  - concurrency – The maximum number of Ray tasks to run concurrently. This doesn't change the total number of tasks run. By default, concurrency is decided dynamically based on the available resources.