Input/Output
Synthetic Data

ray.data.range: Create a dataset from a range of integers [0..n).
ray.data.range_table: Create a tabular dataset from a range of integers [0..n).
ray.data.range_tensor: Create a Tensor dataset from a range of integers [0..n).
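
For orientation, a minimal sketch of how these constructors might be called (the row counts and tensor shape are arbitrary):

    import ray

    ds = ray.data.range(1000)                               # rows are the integers 0..999
    table_ds = ray.data.range_table(1000)                   # tabular rows with a "value" column
    tensor_ds = ray.data.range_tensor(1000, shape=(2, 2))   # one (2, 2) tensor per row
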
Python Objects

ray.data.from_items: Create a dataset from a list of local Python objects.
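
A small sketch; the record contents are invented for illustration:

    import ray

    # Each list element becomes one row of the dataset.
    ds = ray.data.from_items([{"id": i, "value": i ** 2} for i in range(5)])
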
Parquet

ray.data.read_parquet: Create an Arrow dataset from Parquet files.
ray.data.read_parquet_bulk: Create an Arrow dataset from a large number (such as >1K) of Parquet files quickly.
Dataset.write_parquet: Write the dataset to Parquet files.
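
A rough round-trip sketch; all paths here are placeholders:

    import ray

    ds = ray.data.read_parquet("/tmp/input")   # placeholder directory of Parquet files
    ds.write_parquet("/tmp/output")            # writes one file per block

    # read_parquet_bulk is intended for very large numbers of files and skips
    # the per-file metadata resolution that read_parquet performs.
    many = ray.data.read_parquet_bulk(
        ["/tmp/input/a.parquet", "/tmp/input/b.parquet"]
    )
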
CSV

ray.data.read_csv: Create an Arrow dataset from CSV files.
Dataset.write_csv: Write the dataset to CSV files.
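
A short sketch with placeholder paths:

    import ray

    ds = ray.data.read_csv("/tmp/input.csv")   # placeholder input file
    ds.write_csv("/tmp/csv_out")               # placeholder output directory
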
JSON

ray.data.read_json: Create an Arrow dataset from JSON files.
Dataset.write_json: Write the dataset to JSON files.
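
A short sketch with placeholder paths:

    import ray

    ds = ray.data.read_json("/tmp/input.json")   # placeholder input file
    ds.write_json("/tmp/json_out")               # placeholder output directory
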
Text

ray.data.read_text: Create a dataset from lines stored in text files.
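
A sketch; the path is a placeholder:

    import ray

    # One row per line of text.
    ds = ray.data.read_text("/tmp/input.txt")
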
Images

ray.data.read_images: Read images from the specified paths.
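
A sketch; the directory of image files is a placeholder:

    import ray

    ds = ray.data.read_images("/tmp/images")
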
Binary

ray.data.read_binary_files: Create a dataset from binary files of arbitrary contents.
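
A sketch; the path is a placeholder:

    import ray

    # Rows hold the raw bytes of each file; include_paths also records
    # the source path of each file.
    ds = ray.data.read_binary_files("/tmp/files", include_paths=True)
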
TFRecords

ray.data.read_tfrecords: Create a dataset from TFRecord files that contain tf.train.Example messages.
Dataset.write_tfrecords: Write the dataset to TFRecord files.
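
A round-trip sketch with placeholder paths:

    import ray

    ds = ray.data.read_tfrecords("/tmp/records")   # placeholder TFRecord files
    ds.write_tfrecords("/tmp/tfrecords_out")       # placeholder output directory
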
Pandas

ray.data.from_pandas: Create a dataset from a list of Pandas dataframes.
ray.data.from_pandas_refs: Create a dataset from a list of Ray object references to Pandas dataframes.
Dataset.to_pandas: Convert this dataset into a single Pandas DataFrame.
Dataset.to_pandas_refs: Convert this dataset into a distributed set of Pandas dataframes.
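
A round-trip sketch with a tiny in-memory dataframe:

    import pandas as pd
    import ray

    df = pd.DataFrame({"a": [1, 2, 3]})
    ds = ray.data.from_pandas([df])   # a list of dataframes, one per block
    round_trip = ds.to_pandas()       # gathers all blocks onto the driver
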
NumPy

ray.data.read_numpy: Create an Arrow dataset from NumPy (.npy) files.
ray.data.from_numpy: Create a dataset from a list of NumPy ndarrays.
ray.data.from_numpy_refs: Create a dataset from a list of NumPy ndarray futures (Ray object references).
Dataset.write_numpy: Write a tensor column of the dataset to .npy files.
Dataset.to_numpy_refs: Convert this dataset into a distributed set of NumPy ndarrays.
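
A sketch; the output path is a placeholder:

    import numpy as np
    import ray

    ds = ray.data.from_numpy([np.arange(8)])
    # Writes the default tensor column; depending on the Ray version you may
    # need to pass the column name explicitly.
    ds.write_numpy("/tmp/npy_out")
    arrays = ray.get(ds.to_numpy_refs())   # fetch the distributed ndarrays
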
Arrow

ray.data.from_arrow: Create a dataset from a list of Arrow tables.
ray.data.from_arrow_refs: Create a dataset from a set of Ray object references to Arrow tables.
Dataset.to_arrow_refs: Convert this dataset into a distributed set of Arrow tables.
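
A round-trip sketch with a tiny in-memory table:

    import pyarrow as pa
    import ray

    table = pa.table({"a": [1, 2, 3]})
    ds = ray.data.from_arrow([table])
    tables = ray.get(ds.to_arrow_refs())   # fetch the distributed Arrow tables
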
MongoDB

ray.data.read_mongo: Create an Arrow dataset from MongoDB.
Dataset.write_mongo: Write the dataset to a MongoDB datasource.
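
A hedged sketch; the URI, database, and collection names are placeholders for a running MongoDB deployment:

    import ray

    ds = ray.data.read_mongo(
        uri="mongodb://localhost:27017",
        database="my_db",
        collection="my_collection",
    )
    ds.write_mongo(
        uri="mongodb://localhost:27017",
        database="my_db",
        collection="my_output_collection",
    )
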
SQL Databases

ray.data.read_sql: Read from a database that provides a Python DB API2-compliant connector.
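
A sketch using the standard-library sqlite3 connector; the database file and table name are placeholders:

    import sqlite3

    import ray

    def create_connection():
        # Any DB API2-compliant connection factory works here.
        return sqlite3.connect("example.db")

    ds = ray.data.read_sql("SELECT * FROM my_table", create_connection)
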
Dask

ray.data.from_dask: Create a dataset from a Dask DataFrame.
Dataset.to_dask: Convert this dataset into a Dask DataFrame.
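
A round-trip sketch with a tiny Dask DataFrame:

    import dask.dataframe as dd
    import pandas as pd
    import ray

    ddf = dd.from_pandas(pd.DataFrame({"a": range(6)}), npartitions=2)
    ds = ray.data.from_dask(ddf)
    back = ds.to_dask()
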
Spark

ray.data.from_spark: Create a dataset from a Spark dataframe.
Dataset.to_spark: Convert this dataset into a Spark dataframe.
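
A sketch assuming Spark-on-Ray is provided by RayDP; the app name and executor settings are illustrative:

    import ray
    import raydp

    spark = raydp.init_spark(app_name="demo", num_executors=1,
                             executor_cores=1, executor_memory="1GB")
    spark_df = spark.range(100)
    ds = ray.data.from_spark(spark_df)
    back = ds.to_spark(spark)
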
Modin

ray.data.from_modin: Create a dataset from a Modin dataframe.
Dataset.to_modin: Convert this dataset into a Modin dataframe.
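
A round-trip sketch with a tiny Modin dataframe:

    import modin.pandas as mpd
    import ray

    mdf = mpd.DataFrame({"a": [1, 2, 3]})
    ds = ray.data.from_modin(mdf)
    back = ds.to_modin()
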
Mars

ray.data.from_mars: Create a dataset from a Mars dataframe.
Dataset.to_mars: Convert this dataset into a Mars dataframe.
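
A sketch assuming a Mars-on-Ray session is already running; session setup is omitted:

    import mars.dataframe as md
    import ray

    # Requires a Mars-on-Ray session to have been created beforehand.
    mdf = md.DataFrame({"a": [1, 2, 3]})
    ds = ray.data.from_mars(mdf)
    back = ds.to_mars()
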
Torch

ray.data.from_torch: Create a dataset from a Torch dataset.
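
A sketch with a tiny in-memory Torch dataset:

    import torch
    from torch.utils.data import TensorDataset

    import ray

    torch_ds = TensorDataset(torch.arange(10))
    ds = ray.data.from_torch(torch_ds)
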
Hugging Face

ray.data.from_huggingface: Create a dataset from a Hugging Face Datasets Dataset.
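
A sketch with a tiny in-memory Hugging Face dataset (avoiding any network download):

    import datasets

    import ray

    hf_ds = datasets.Dataset.from_dict({"text": ["hello", "world"]})
    ds = ray.data.from_huggingface(hf_ds)
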
TensorFlow

ray.data.from_tf: Create a dataset from a TensorFlow dataset.
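
A sketch with a small in-memory TensorFlow dataset (the source dataset is materialized during conversion, so this suits small data):

    import tensorflow as tf

    import ray

    tf_ds = tf.data.Dataset.from_tensor_slices([1, 2, 3])
    ds = ray.data.from_tf(tf_ds)
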
WebDataset

ray.data.read_webdataset: Create a dataset from WebDataset files.
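
A sketch; the path to a WebDataset .tar shard is a placeholder:

    import ray

    ds = ray.data.read_webdataset("/tmp/shard-000000.tar")
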
Datasource API

ray.data.read_datasource: Read a dataset from a custom data source.
Dataset.write_datasource: Write the dataset to a custom datasource.
Datasource: Interface for defining a custom datasource.
ReadTask: A function used to read blocks from the dataset.
Reader: A bound read operation for a datasource.
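
A sketch of the generic read entry point, using one of the built-in datasources listed below; the path is a placeholder:

    import ray
    from ray.data.datasource import CSVDatasource

    # Equivalent in spirit to ray.data.read_csv, but routed through the
    # generic datasource interface.
    ds = ray.data.read_datasource(CSVDatasource(), paths="/tmp/input.csv")
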
Built-in Datasources

BinaryDatasource: Binary datasource, for reading and writing binary files.
CSVDatasource: CSV datasource, for reading and writing CSV files.
FileBasedDatasource: File-based datasource, for reading and writing files.
ImageDatasource: A datasource that lets you read images.
JSONDatasource: JSON datasource, for reading and writing JSON files.
NumpyDatasource: NumPy datasource, for reading and writing NumPy files.
ParquetDatasource: Parquet datasource, for reading and writing Parquet files.
RangeDatasource: An example datasource that generates ranges of numbers from [0..n).
TFRecordDatasource: TFRecord datasource, for reading and writing TFRecord files.
MongoDatasource: Datasource for reading from and writing to MongoDB.
WebDatasetDatasource: A datasource for WebDataset datasets (tar format with naming conventions).
Partitioning API

Partitioning: Partition scheme used to describe path-based partitions.
PartitionStyle: Supported dataset partition styles.
PathPartitionEncoder: Callable that generates directory path strings for path-based partition formats.
PathPartitionParser: Partition parser for path-based partition formats.
PathPartitionFilter: Partition filter for path-based partition formats.
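
A sketch of attaching a partition scheme to a read; a hive-style layout (for example ".../year=2024/country=fr/data.csv") is assumed, and the path is a placeholder:

    import ray
    from ray.data.datasource import Partitioning

    ds = ray.data.read_csv("/tmp/partitioned", partitioning=Partitioning("hive"))
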
MetadataProvider API

FileMetadataProvider: Abstract callable that provides metadata for the files of a single dataset block.
BaseFileMetadataProvider: Abstract callable that provides metadata for FileBasedDatasource implementations.
ParquetMetadataProvider: Abstract callable that provides block metadata for Arrow Parquet file fragments.
DefaultFileMetadataProvider: Default metadata provider for FileBasedDatasource implementations that reuse the base prepare_read method.
DefaultParquetMetadataProvider: The default file metadata provider for ParquetDatasource.
FastFileMetadataProvider: Fast metadata provider for FileBasedDatasource implementations.
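
A sketch of swapping in the fast provider to skip per-file metadata fetching when reading many files; the path is a placeholder:

    import ray
    from ray.data.datasource import FastFileMetadataProvider

    # Trades accurate size estimation for faster listing over many files.
    ds = ray.data.read_csv("/tmp/many_files",
                           meta_provider=FastFileMetadataProvider())
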