ray.data.Dataset.zip

Dataset.zip(other: ray.data.dataset.Dataset) -> ray.data.dataset.Dataset

Materialize and zip this dataset with the elements of another.

The datasets must have the same number of rows. Their column sets are merged, and any duplicate column names are disambiguated with numeric suffixes such as _1 and _2.
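
The merge rule above can be illustrated with a small in-memory sketch. This is not Ray's implementation: `zip_columns` and the `Columns` alias are hypothetical names introduced here, and the sketch assumes that a clashing right-hand column takes the first unused numeric suffix.

```python
from typing import Dict, List

# Hypothetical column-oriented table: column name -> list of values.
Columns = Dict[str, List[object]]


def num_rows(table: Columns) -> int:
    """Return the row count of a table, assuming all columns have equal length."""
    return len(next(iter(table.values()), []))


def zip_columns(left: Columns, right: Columns) -> Columns:
    """Horizontally merge two tables (illustrative helper, not part of Ray)."""
    if num_rows(left) != num_rows(right):
        raise ValueError("Cannot zip tables with different row counts")
    merged = dict(left)
    for name, values in right.items():
        new_name = name
        suffix = 1
        # Assumption: try _1, _2, ... until the name no longer clashes.
        while new_name in merged:
            new_name = f"{name}_{suffix}"
            suffix += 1
        merged[new_name] = values
    return merged


left = {"id": [0, 1, 2], "id_1": [9, 9, 9]}
right = {"id": [0, 1, 2], "label": ["a", "b", "c"]}
merged = zip_columns(left, right)
print(list(merged))  # ['id', 'id_1', 'id_2', 'label']
```

In this sketch the left-hand columns keep their names and only the right-hand duplicates are renamed, which is consistent with the `id` / `id_1` output in the example on this page.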

Note

The smaller of the two datasets will be repartitioned to align the number of rows per block with the larger dataset.
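
As a rough picture of what aligning rows per block means, the sketch below re-slices one dataset's blocks so that block i has as many rows as block i of the other. It is a simplified, in-memory illustration under the assumption that blocks are plain Python lists; `realign_blocks`, `reference_blocks`, and `other_blocks` are hypothetical names, and Ray's actual repartitioning is distributed and more involved.

```python
from typing import List


def realign_blocks(blocks: List[list], target_sizes: List[int]) -> List[list]:
    """Re-slice `blocks` so that block i holds target_sizes[i] rows.

    Illustrative helper only; not part of the Ray API.
    """
    rows = [row for block in blocks for row in block]
    if len(rows) != sum(target_sizes):
        raise ValueError("Row counts differ; the datasets cannot be zipped")
    aligned, start = [], 0
    for size in target_sizes:
        aligned.append(rows[start:start + size])
        start += size
    return aligned


reference_blocks = [[0, 1, 2], [3, 4]]          # block sizes 3 and 2
other_blocks = [["a"], ["b", "c", "d", "e"]]    # block sizes 1 and 4

aligned = realign_blocks(other_blocks, [len(b) for b in reference_blocks])
print(aligned)  # [['a', 'b', 'c'], ['d', 'e']]

# Once the block boundaries match, each pair of blocks can be zipped independently.
zipped = [list(zip(l, r)) for l, r in zip(reference_blocks, aligned)]
```

Matching block boundaries is what allows the per-block zips to run in parallel, which fits the stated time complexity of O(dataset size / parallelism).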

Note

Zipped datasets are not lineage-serializable, i.e., they cannot be used as a tunable hyperparameter in Ray Tune.

Examples

>>> import ray
>>> ds1 = ray.data.range(5)
>>> ds2 = ray.data.range(5)
>>> ds1.zip(ds2).take_batch()
{'id': array([0, 1, 2, 3, 4]), 'id_1': array([0, 1, 2, 3, 4])}

Time complexity: O(dataset size / parallelism)

Parameters

other – The dataset to zip with on the right-hand side.

Returns

A Dataset containing the columns of the second dataset concatenated horizontally with the columns of the first dataset, with duplicate column names disambiguated with _1, _2, etc. suffixes.