Dataset.zip(other: ray.data.dataset.Dataset) → ray.data.dataset.Dataset

Materialize and zip this dataset with the elements of another.

The datasets must have the same number of rows. Their column sets are merged, and any duplicate column names are disambiguated with _1, _2, etc. suffixes.
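The merge rule can be sketched in plain Python by treating each dataset as a dict that maps column names to equal-length lists of values. This is an illustration under that assumption, not Ray's implementation. The hypothetical zip_columns helper and its exact suffixing scheme (try _1, then _2, and so on until a free name is found) are assumptions made for the example.

```python
from typing import Dict, List

# Hypothetical stand-in for a dataset: column name -> list of values.
Columns = Dict[str, List[object]]


def zip_columns(left: Columns, right: Columns) -> Columns:
    """Hypothetical helper: merge two column dicts side by side.

    Assumes every list within a dict has the same length (the row count).
    """
    left_rows = len(next(iter(left.values()), []))
    right_rows = len(next(iter(right.values()), []))
    # Both sides must contain the same number of rows.
    if left_rows != right_rows:
        raise ValueError(
            f"Cannot zip datasets of different lengths: {left_rows} != {right_rows}"
        )

    result: Columns = dict(left)  # left-hand columns keep their original names
    for name, values in right.items():
        new_name = name
        suffix = 1
        # Assumed scheme: append _1, _2, ... until the name no longer collides.
        while new_name in result:
            new_name = f"{name}_{suffix}"
            suffix += 1
        result[new_name] = values
    return result


print(zip_columns({"id": [0, 1, 2]}, {"id": [0, 1, 2], "x": ["a", "b", "c"]}))
# {'id': [0, 1, 2], 'id_1': [0, 1, 2], 'x': ['a', 'b', 'c']}
```

Columns from the left-hand dataset keep their names, and only colliding names from the right-hand side are renamed, which matches the id / id_1 output in the example further down.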


The smaller of the two datasets will be repartitioned to align the number of rows per block with the larger dataset.
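Block alignment can be pictured with a small pure-Python sketch, assuming blocks are plain lists of rows. The hypothetical align_blocks helper re-slices the rows of the dataset being repartitioned so that its block boundaries match the per-block row counts of the other dataset. Ray's actual repartitioning logic is not shown here.

```python
from itertools import chain
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def align_blocks(blocks: Sequence[Sequence[T]], target_sizes: Sequence[int]) -> List[List[T]]:
    """Hypothetical helper: re-split `blocks` so block i holds target_sizes[i] rows."""
    rows = list(chain.from_iterable(blocks))  # flatten into one row sequence
    if len(rows) != sum(target_sizes):
        raise ValueError("Total row counts must match before blocks can be aligned")
    aligned: List[List[T]] = []
    start = 0
    for size in target_sizes:
        aligned.append(rows[start:start + size])
        start += size
    return aligned


# The dataset kept as-is has blocks of 3 and 2 rows;
# the dataset being repartitioned has blocks of 1, 1, and 3 rows.
print(align_blocks([[0], [1], [2, 3, 4]], [3, 2]))
# [[0, 1, 2], [3, 4]]
```

Once both sides share the same block boundaries, block i of one dataset can presumably be paired with block i of the other without moving rows between blocks.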


Zipped datasets are not lineage-serializable, i.e., they cannot be used as a tunable hyperparameter in Ray Tune.


>>> import ray
>>> ds1 = ray.data.range(5)
>>> ds2 = ray.data.range(5)
>>> ds1.zip(ds2).take_batch()
{'id': array([0, 1, 2, 3, 4]), 'id_1': array([0, 1, 2, 3, 4])}

Time complexity: O(dataset size / parallelism)


Parameters:
other – The dataset to zip with on the right-hand side.


Returns:
A Dataset containing the columns of this dataset followed horizontally by the columns of other, with duplicate column names disambiguated with _1, _2, etc. suffixes.