Input/Output
Synthetic Data
Creates a |
|
Creates a |
Python Objects
Create a |
Parquet
Creates a |
|
Create |
|
Writes the |
CSV
Creates a |
|
Writes the |
JSON
Creates a |
|
Writes the |
Text
Create a |
Images
Creates a |
|
Writes the |
Binary
Create a |
TFRecords
Create a |
|
Write the |
Pandas
Create a |
|
Create a |
|
Convert this |
|
Converts this |
NumPy
Create an Arrow dataset from numpy files. |
|
Creates a |
|
Creates a |
|
Writes a column of the |
|
Converts this |
Arrow
Create a |
|
Create a |
|
Convert this |
MongoDB
Create a |
|
Writes the |
BigQuery
Create a dataset from BigQuery. |
|
Write the dataset to a BigQuery dataset table. |
SQL Databases
Read from a database that provides a Python DB API2-compliant connector. |
|
Write to a database that provides a Python DB API2-compliant connector. |
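Both entries above depend on a connector that follows the Python DB API2 specification (PEP 249). As a rough illustration of what that requirement means, the sketch below uses the standard library's `sqlite3` module, which is one such connector. The `create_connection` factory, the `movie` table, and the temporary database path are hypothetical names chosen for this example. The sketch does not call Ray itself; it only shows the connect, cursor, execute, and fetch pattern that a DB API2-compliant connector exposes.

```python
import os
import sqlite3
import tempfile

# Hypothetical database file, created in a temporary directory for illustration only.
DB_PATH = os.path.join(tempfile.mkdtemp(), "example.db")


def create_connection() -> sqlite3.Connection:
    """Zero-argument factory that returns a DB API2 (PEP 249) connection."""
    return sqlite3.connect(DB_PATH)


def write_rows(rows):
    """Insert (title, year) tuples using the standard cursor/execute/commit flow."""
    conn = create_connection()
    try:
        cur = conn.cursor()
        cur.execute("CREATE TABLE IF NOT EXISTS movie(title TEXT, year INTEGER)")
        cur.executemany("INSERT INTO movie VALUES (?, ?)", rows)
        conn.commit()
    finally:
        conn.close()


def read_rows(sql):
    """Run a query and return every result row as a list of tuples."""
    conn = create_connection()
    try:
        cur = conn.cursor()
        cur.execute(sql)
        return cur.fetchall()
    finally:
        conn.close()


write_rows([("Film A", 1975), ("Film B", 1971)])
result = read_rows("SELECT title, year FROM movie ORDER BY year")
print(result)  # [('Film B', 1971), ('Film A', 1975)]
```

A factory function is used here, rather than a shared connection object, because connection objects are usually not safe to serialize and pass between worker processes. Whether a given Ray release expects exactly this shape should be checked against its own reference.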
Databricks
Read a Databricks unity catalog table or Databricks SQL execution result. |
Dask
Create a |
|
Convert this |
Spark
Create a |
|
Convert this |
Modin
Create a |
|
Convert this |
Mars
Create a |
|
Convert this |
Torch
Create a |
Hugging Face
Create a |
TensorFlow
Create a |
WebDataset
Create a |
Datasource API
Read a stream from a custom |
|
Interface for defining a custom |
|
A function used to read blocks from the |
|
A bound read operation for a |
|
Generates filenames when you write a |
Datasink API
Writes the dataset to a custom |
|
Interface for defining write-related logic. |
|
DeveloperAPI: This API may change across minor Ray releases. |
|
DeveloperAPI: This API may change across minor Ray releases. |
Partitioning API
Partition scheme used to describe path-based partitions. |
|
Supported dataset partition styles. |
|
Partition parser for path-based partition formats. |
|
Partition filter for path-based partition formats. |
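To make the idea of path-based partitions concrete, here is a minimal standard-library sketch, not Ray's implementation. It assumes two common styles: a Hive-like style where each directory is `key=value`, and a plain directory style where field names are supplied separately. All function names (`parse_hive_partitions`, `parse_directory_partitions`, `filter_paths`) and the example paths are hypothetical.

```python
from typing import Callable, Dict, List


def _partition_dirs(path: str, base_dir: str = "") -> List[str]:
    """Return the directory segments between base_dir and the file name."""
    relative = path[len(base_dir):] if base_dir and path.startswith(base_dir) else path
    return relative.strip("/").split("/")[:-1]  # drop the trailing file name


def parse_hive_partitions(path: str, base_dir: str = "") -> Dict[str, str]:
    """Parse 'key=value' directory segments into a dict."""
    partitions = {}
    for segment in _partition_dirs(path, base_dir):
        if "=" in segment:
            key, value = segment.split("=", 1)
            partitions[key] = value
    return partitions


def parse_directory_partitions(
    path: str, field_names: List[str], base_dir: str = ""
) -> Dict[str, str]:
    """Map positional directory segments onto the given field names."""
    dirs = _partition_dirs(path, base_dir)
    if len(dirs) != len(field_names):
        raise ValueError(f"expected {len(field_names)} partition dirs, got {len(dirs)}")
    return dict(zip(field_names, dirs))


def filter_paths(
    paths: List[str],
    parser: Callable[[str], Dict[str, str]],
    predicate: Callable[[Dict[str, str]], bool],
) -> List[str]:
    """Keep only the paths whose parsed partition values satisfy the predicate."""
    return [p for p in paths if predicate(parser(p))]


base = "s3://bucket/data"
hive = parse_hive_partitions(f"{base}/year=2024/month=01/part-0.parquet", base)
plain = parse_directory_partitions(f"{base}/2024/01/part-0.parquet", ["year", "month"], base)
print(hive)   # {'year': '2024', 'month': '01'}
print(plain)  # {'year': '2024', 'month': '01'}

paths = [
    f"{base}/year=2023/month=12/part-0.parquet",
    f"{base}/year=2024/month=01/part-0.parquet",
]
kept = filter_paths(
    paths,
    lambda p: parse_hive_partitions(p, base),
    lambda parts: parts.get("year") == "2024",
)
print(kept)  # ['s3://bucket/data/year=2024/month=01/part-0.parquet']
```

Parsed values are kept as strings in this sketch; converting them to typed values is left out to keep the example short.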
MetadataProvider API
Abstract callable that provides metadata for the files of a single dataset block. |
|
Abstract callable that provides metadata for |
|
Abstract callable that provides block metadata for Arrow Parquet file fragments. |
|
Default metadata provider for |
|
The default file metadata provider for ParquetDatasource. |
|
Fast metadata provider for |
BlockWritePathProvider API
Abstract callable that provides concrete output paths when writing dataset blocks. |
|
Default block write path provider implementation that writes each dataset block out to a file of the form: {base_path}/{dataset_uuid}_{task_index}_{block_index}.{file_format} |
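The file naming pattern described above can be sketched in a few lines of plain Python. This is an illustration of the stated format only, not the library's code: the function name `default_block_write_path` and the sample values are hypothetical, and the text does not say whether the indices are zero-padded, so none is applied here.

```python
import posixpath


def default_block_write_path(
    base_path: str,
    dataset_uuid: str,
    task_index: int,
    block_index: int,
    file_format: str,
) -> str:
    """Build {base_path}/{dataset_uuid}_{task_index}_{block_index}.{file_format}."""
    filename = f"{dataset_uuid}_{task_index}_{block_index}.{file_format}"
    # posixpath.join keeps forward slashes, which also suits URI-style base paths.
    return posixpath.join(base_path, filename)


local_path = default_block_write_path("/tmp/out", "abc123", 0, 1, "parquet")
remote_path = default_block_write_path("s3://bucket/out", "abc123", 2, 0, "csv")
print(local_path)   # /tmp/out/abc123_0_1.parquet
print(remote_path)  # s3://bucket/out/abc123_2_0.csv
```

Because the dataset UUID, task index, and block index together identify a block, each block maps to a distinct file under the same base path.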