Saving Data API#

Parquet#

Dataset.write_parquet

Writes the Dataset to parquet files under the provided path.

CSV#

Dataset.write_csv

Writes the Dataset to CSV files.

JSON#

Dataset.write_json

Writes the Dataset to JSON and JSONL files.

Images#

Dataset.write_images

Writes the images in a column of the Dataset to image files.

TFRecords#

Dataset.write_tfrecords

Writes the Dataset to TFRecord files.

Pandas#

Dataset.to_pandas

Converts this Dataset to a single pandas DataFrame.

Dataset.to_pandas_refs

Converts this Dataset into a distributed set of pandas DataFrames.

NumPy#

Dataset.write_numpy

Writes a column of the Dataset to .npy files.

Dataset.to_numpy_refs

Converts this Dataset into a distributed set of NumPy ndarrays or dictionaries of NumPy ndarrays.

Arrow#

Dataset.to_arrow_refs

Converts this Dataset into a distributed set of PyArrow tables.

MongoDB#

Dataset.write_mongo

Writes the Dataset to a MongoDB database.

BigQuery#

Dataset.write_bigquery

Writes the Dataset to a table in a BigQuery dataset.

SQL Databases#

Dataset.write_sql

Writes the Dataset to a database that provides a Python DB API2-compliant connector.

Snowflake#

Dataset.write_snowflake

Writes this Dataset to a Snowflake table.

Iceberg#

Dataset.write_iceberg

Writes the Dataset to an Iceberg table.

Lance#

Dataset.write_lance

Writes the Dataset to a Lance dataset.

ClickHouse#

Dataset.write_clickhouse

Writes the Dataset to a ClickHouse table.

Daft#

Dataset.to_daft

Converts this Dataset into a Daft DataFrame.

Dask#

Dataset.to_dask

Converts this Dataset into a Dask DataFrame.

Spark#

Dataset.to_spark

Converts this Dataset into a Spark DataFrame.

Modin#

Dataset.to_modin

Converts this Dataset into a Modin DataFrame.

Mars#

Dataset.to_mars

Converts this Dataset into a Mars DataFrame.

Datasink API#

Dataset.write_datasink

Writes the Dataset to a custom Datasink.

Datasink

Interface for defining write-related logic.

datasource.RowBasedFileDatasink

A datasink that writes one row to each file.

datasource.BlockBasedFileDatasink

A datasink that writes multiple rows to each file.

datasource.FileBasedDatasource

File-based datasource for reading files.

datasource.WriteResult

Aggregated result of the Datasink write operations.

datasource.WriteReturnType

Type variable for the value returned by a Datasink's write method.