ray.data.Dataset.zip
- Dataset.zip(other: Dataset[U]) -> Dataset[T, U]
Zip this dataset with the elements of another.
The datasets must have the same number of rows, the same block types, and the same block sizes, for example when one was produced by calling map() on the other. For Arrow blocks, the schemas are concatenated, and any duplicate column names are disambiguated with _1, _2, etc. suffixes.
Note
Zipped datasets are not lineage-serializable, i.e. they cannot be used as a tunable hyperparameter in Ray Tune.
Time complexity: O(dataset size / parallelism)
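The column-name rule for Arrow blocks can be illustrated in plain Python. This is only a sketch, not Ray's implementation: the helper `zip_column_names` is hypothetical, and it assumes that a colliding column from the right-hand dataset is the one that receives the suffix, which the description above does not state.

```python
from typing import List


def zip_column_names(left: List[str], right: List[str]) -> List[str]:
    """Hypothetical helper: merge two column-name lists, keeping names unique.

    Assumption: a right-hand name that collides with an existing name is
    renamed with the first free suffix _1, _2, ... (which side Ray renames
    is not stated in the description).
    """
    merged = list(left)
    for name in right:
        new_name = name
        i = 1
        # Try name_1, name_2, ... until the name no longer collides.
        while new_name in merged:
            new_name = f"{name}_{i}"
            i += 1
        merged.append(new_name)
    return merged


names = zip_column_names(["id", "value"], ["value", "label"])
print(names)  # ['id', 'value', 'value_1', 'label']
```

Columns that do not collide keep their original names, so only the duplicated `value` column is renamed here.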
- Parameters
other – The dataset to zip with on the right hand side.
Examples
>>> import ray
>>> ds = ray.data.range(5)
>>> ds.zip(ds).take()
[(0, 0), (1, 1), (2, 2), (3, 3), (4, 4)]
- Returns
A Dataset with (k, v) pairs (or concatenated Arrow schema) where k comes from the first dataset and v comes from the second.
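The row-wise pairing described under Returns resembles Python's built-in `zip` applied to two equally sized sequences. Below is a minimal sketch of that analogy using plain lists rather than Ray datasets. The helper `zip_rows` is hypothetical, and raising `ValueError` on a length mismatch is an assumption of this sketch, not a statement about the error Ray raises.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")
U = TypeVar("U")


def zip_rows(left: Sequence[T], right: Sequence[U]) -> List[Tuple[T, U]]:
    """Hypothetical helper: pair rows positionally, like a zipped dataset."""
    # Mirror the requirement that both inputs have the same number of rows.
    if len(left) != len(right):
        raise ValueError(
            f"cannot zip {len(left)} rows with {len(right)} rows"
        )
    # k comes from the left input, v from the right input.
    return list(zip(left, right))


rows = list(range(5))
doubled = [x * 2 for x in rows]  # stands in for a map() of the first input
pairs = zip_rows(rows, doubled)
print(pairs)  # [(0, 0), (1, 2), (2, 4), (3, 6), (4, 8)]
```

Unlike the built-in `zip`, which silently truncates to the shorter input, this sketch rejects mismatched lengths to reflect the equal-row-count requirement.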