ray.data.Dataset.zip
- Dataset.zip(*other: List[Dataset]) -> Dataset
Zip the columns of this dataset with the columns of one or more other datasets.
The datasets must have the same number of rows. Their column sets are merged, and any duplicate column names are disambiguated with suffixes like "_1".
Note
The smaller of the two datasets is repartitioned to align the number of rows per block with the larger dataset.
Note
Zipped datasets aren’t lineage-serializable. As a result, they can’t be used as a tunable hyperparameter in Ray Tune.
Examples
>>> import ray
>>> ds1 = ray.data.range(5)
>>> ds2 = ray.data.range(5)
>>> ds3 = ray.data.range(5)
>>> ds1.zip(ds2, ds3).take_batch()
{'id': array([0, 1, 2, 3, 4]), 'id_1': array([0, 1, 2, 3, 4]), 'id_2': array([0, 1, 2, 3, 4])}
- Parameters:
*other – List of datasets to combine with this one. The datasets must have the same row count as this dataset; otherwise, a ValueError is raised.
- Returns:
A Dataset containing the columns of the second dataset concatenated horizontally with the columns of the first dataset, with duplicate column names disambiguated with suffixes like "_1".
- Raises:
ValueError – If the datasets have different row counts.
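The column-merging and row-count rules described above can be illustrated with a small pure-Python model. This is only a sketch and not Ray's implementation: the `zip_columns` helper and the dict-of-lists representation are hypothetical, and the suffix scheme for collisions beyond the documented `id`, `id_1`, `id_2` case is an assumption.

```python
from typing import Dict, List

# Hypothetical stand-in for a dataset: column name -> list of values.
Columns = Dict[str, List[object]]


def zip_columns(first: Columns, *others: Columns) -> Columns:
    """Toy model of a column-wise zip. Not Ray's implementation."""

    def num_rows(cols: Columns) -> int:
        # All columns in one table are assumed to have the same length.
        return len(next(iter(cols.values()))) if cols else 0

    expected = num_rows(first)
    merged = dict(first)
    for other in others:
        # Mirror the documented contract: row counts must match.
        if num_rows(other) != expected:
            raise ValueError("datasets have different row counts")
        for name, values in other.items():
            # Assumed rule: append "_1", "_2", ... until the name is unused.
            new_name, suffix = name, 0
            while new_name in merged:
                suffix += 1
                new_name = f"{name}_{suffix}"
            merged[new_name] = values
    return merged


ds = {"id": [0, 1, 2, 3, 4]}
result = zip_columns(ds, ds, ds)
print(list(result))  # ['id', 'id_1', 'id_2']
```

Passing a table with a different number of rows, for example `zip_columns(ds, {"x": [1, 2]})`, raises `ValueError` in this model, which matches the behavior documented under Raises.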