- Dataset.zip(other: Dataset[U]) -> Dataset[T, U]
Zip this dataset with the elements of another.
The datasets must have the same number of rows, the same block types, and the same block sizes (e.g., one was produced from a
.map() of another). For Arrow blocks, the schemas will be concatenated, and any duplicate column names will be disambiguated with _1, _2, etc. suffixes.
NOTE: Zipped datasets are not lineage-serializable, i.e., they cannot be used as a tunable hyperparameter in Ray Tune.
Time complexity: O(dataset size / parallelism)
other – The dataset to zip with, on the right-hand side.
>>> import ray
>>> ds = ray.data.range(5)
>>> ds.zip(ds).take()
[(0, 0), (1, 1), (2, 2), (3, 3), (4, 4)]
A Dataset with (k, v) pairs (or concatenated Arrow schema) where k comes from the first dataset and v comes from the second.
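The column-name disambiguation described above for Arrow blocks can be illustrated with a small pure-Python sketch. This is not Ray's implementation: `zip_columns` is a hypothetical helper, tables are modeled as plain dicts of column lists, and the exact rule for choosing a suffix (the first unused `_1`, `_2`, ...) is an assumption made for illustration.

```python
from typing import Dict, List


def zip_columns(left: Dict[str, List], right: Dict[str, List]) -> Dict[str, List]:
    """Hypothetical helper: column-wise zip of two tables with equal row counts."""
    # Collect the distinct column lengths seen across both tables.
    row_counts = {len(col) for col in left.values()} | {len(col) for col in right.values()}
    if len(row_counts) > 1:
        raise ValueError("both tables must have the same number of rows")

    result = dict(left)  # shallow copy; left-hand columns keep their names
    for name, values in right.items():
        new_name = name
        suffix = 1
        # Assumed rule: append _1, _2, ... until the name is unused.
        while new_name in result:
            new_name = f"{name}_{suffix}"
            suffix += 1
        result[new_name] = values
    return result


left = {"id": [0, 1, 2], "x": [10, 20, 30]}
right = {"id": [0, 1, 2], "y": [7, 8, 9]}
zipped = zip_columns(left, right)
print(list(zipped))  # ['id', 'x', 'id_1', 'y']
```

In this sketch the right-hand `id` column is renamed to `id_1` because `id` already exists on the left, and a row-count mismatch raises an error, mirroring the requirement that both datasets have the same number of rows.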