ray.data.Dataset.write_bigquery

Dataset.write_bigquery(project_id: str, dataset: str, max_retry_cnt: int = 10, ray_remote_args: Dict[str, Any] = None) → None[source]

Write the dataset to a BigQuery dataset table.

To control the number of parallel write tasks, use .repartition() before calling this method.
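
For example, a minimal sketch (assuming ds is an existing Dataset; the project and table names below are placeholders) that caps the write at four parallel tasks:

ds = ds.repartition(4)  # four blocks -> at most four parallel write tasks
ds.write_bigquery(
    project_id="my_project_id",     # placeholder project
    dataset="my_dataset.my_table",  # placeholder dataset_id.table_id
)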

Note

This operation will trigger execution of the lazy transformations performed on this dataset.

Examples

import ray
import pandas as pd

docs = [{"title": "BigQuery Datasource test"} for key in range(4)]
ds = ray.data.from_pandas(pd.DataFrame(docs))
ds.write_bigquery(
    project_id="my_project_id",
    dataset="my_dataset_table",
)
Parameters:
  • project_id – The name of the associated Google Cloud Project that hosts the dataset to write to. For more information, see Creating and managing projects.

  • dataset – The name of the dataset in the format dataset_id.table_id. The dataset is created if it doesn’t already exist, and the table is overwritten if it already exists.

  • max_retry_cnt – The maximum number of times an individual block write is retried due to BigQuery rate-limiting errors. This isn’t related to Ray fault tolerance retries. The default is 10.

  • ray_remote_args – Kwargs passed to ray.remote in the write tasks (see the sketch after this list).
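
As a rough sketch of how these parameters fit together (the project and table names are placeholders and the resource values are illustrative), a call that raises the retry budget and reserves memory for each write task could look like:

ds.write_bigquery(
    project_id="my_project_id",      # placeholder project
    dataset="my_dataset.my_table",   # placeholder dataset_id.table_id
    max_retry_cnt=20,                # retry rate-limited block writes up to 20 times
    ray_remote_args={"memory": 2 * 1024**3},  # request ~2 GiB per write task
)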