ray.data.Dataset.zip

Dataset.zip(other: Dataset[U]) → Dataset[T, U]

Zip this dataset with the elements of another.

The two datasets must have the same number of rows, the same block types, and the same block sizes, for example when one was produced by calling map() on the other. For Arrow blocks, the schemas are concatenated, and duplicate column names are disambiguated with _1, _2, etc. suffixes.
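The column-name disambiguation described above can be pictured with a small pure-Python sketch. It is an illustration only, not Ray's implementation: the `zip_columns` helper and the dict-of-lists stand-in for an Arrow table are hypothetical, and the sketch assumes that columns from the left dataset keep their names while clashing columns from the right dataset receive the first free `_1`, `_2`, ... suffix. The exact rule may differ between Ray versions.

```python
from typing import Dict, List

# Hypothetical stand-in for an Arrow table: column name -> column values.
Table = Dict[str, List[int]]


def zip_columns(left: Table, right: Table) -> Table:
    """Append the columns of `right` to `left`, renaming duplicates.

    Assumption: left-hand columns keep their names; a right-hand column
    whose name is already taken gets the first unused _1, _2, ... suffix.
    """
    result = dict(left)
    for name, values in right.items():
        new_name = name
        suffix = 1
        # Keep trying suffixes until the name no longer clashes.
        while new_name in result:
            new_name = f"{name}_{suffix}"
            suffix += 1
        result[new_name] = values
    return result


left = {"value": [0, 1, 2]}
right = {"value": [0, 2, 4], "label": [1, 0, 1]}
zipped = zip_columns(left, right)
print(list(zipped))  # ['value', 'value_1', 'label']
```

Under this assumption, the right-hand "value" column ends up as "value_1", while "label" is appended unchanged because it does not clash.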

Note

Zipped datasets are not lineage-serializable, i.e., they cannot be used as a tunable hyperparameter in Ray Tune.

Time complexity: O(dataset size / parallelism)

Parameters

other – The dataset to zip with, on the right-hand side.

Examples

>>> import ray
>>> ds = ray.data.range(5)
>>> ds.zip(ds).take()
[(0, 0), (1, 1), (2, 2), (3, 3), (4, 4)]

Returns

A Dataset with (k, v) pairs (or concatenated Arrow schema) where k comes from the first dataset and v comes from the second.
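As a rough mental model of the (k, v) pairing and the equal-row-count requirement, the following sketch uses plain Python lists in place of datasets. The `zip_rows` helper is hypothetical and does not use Ray; it only mirrors the documented contract that both inputs must have the same number of rows.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")
U = TypeVar("U")


def zip_rows(first: Sequence[T], second: Sequence[U]) -> List[Tuple[T, U]]:
    """Pair rows positionally; k comes from `first`, v from `second`."""
    # Mirror the documented precondition: row counts must match.
    if len(first) != len(second):
        raise ValueError(
            f"cannot zip {len(first)} rows with {len(second)} rows"
        )
    return list(zip(first, second))


first = list(range(5))
second = [x * 2 for x in first]  # e.g. the result of mapping over `first`
pairs = zip_rows(first, second)
print(pairs)  # [(0, 0), (1, 2), (2, 4), (3, 6), (4, 8)]
```

Raising on mismatched lengths, rather than silently truncating as Python's built-in zip() does by default, reflects the requirement that the two inputs have the same number of rows.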