ray.data.Dataset.zip

Dataset.zip(other: Dataset) → Dataset

Materialize and zip the columns of this dataset with the columns of another.

The datasets must have the same number of rows. Their column sets are merged, and any duplicate column names are disambiguated with suffixes like "_1".

Note

The smaller of the two datasets is repartitioned to align the number of rows per block with the larger dataset.

Note

Zipped datasets aren’t lineage-serializable. As a result, they can’t be used as a tunable hyperparameter in Ray Tune.

Examples

>>> import ray
>>> ds1 = ray.data.range(5)
>>> ds2 = ray.data.range(5)
>>> ds1.zip(ds2).take_batch()
{'id': array([0, 1, 2, 3, 4]), 'id_1': array([0, 1, 2, 3, 4])}
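The merge rule described above (equal row counts, columns combined side by side, duplicate names suffixed) can be modelled in plain Python without Ray. This is only an illustrative sketch, not Ray's implementation: the helper name zip_columns, the dict-of-lists table representation, and the behaviour when a suffixed name such as "id_1" is itself already taken are all assumptions made for the example.

```python
def zip_columns(left, right):
    """Horizontally merge two column-oriented tables (dict: name -> list).

    Hypothetical helper that mimics the documented semantics of
    Dataset.zip on small in-memory data; it does not use Ray.
    """
    # Both tables must have the same number of rows.
    n_left = len(next(iter(left.values()), []))
    n_right = len(next(iter(right.values()), []))
    if n_left != n_right:
        raise ValueError(
            f"Cannot zip tables with different row counts: {n_left} vs {n_right}"
        )

    merged = dict(left)  # columns of the left table keep their names
    for name, values in right.items():
        new_name = name
        suffix = 1
        # Assumption: keep incrementing the suffix until the name is free.
        while new_name in merged:
            new_name = f"{name}_{suffix}"
            suffix += 1
        merged[new_name] = values
    return merged


result = zip_columns(
    {"id": [0, 1, 2]},
    {"id": [0, 1, 2], "value": [10, 20, 30]},
)
print(result)
```

In this sketch the left table's columns keep their names and only the clashing column from the right table is renamed, which matches the 'id' / 'id_1' pair in the example output above.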

Time complexity: O(dataset size / parallelism)

Parameters:

other – The dataset to zip with, placed on the right-hand side.

Returns:

A Dataset containing the columns of this dataset followed horizontally by the columns of other, with duplicate column names disambiguated with suffixes like "_1".