Dataset.split_proportionately(proportions: List[float]) -> List[ray.data.dataset.MaterializedDataset]

Materialize and split the dataset using proportions.

A common use case is splitting the dataset into train and test sets (equivalent to, e.g., scikit-learn's train_test_split). See also Dataset.train_test_split for a higher-level abstraction.

The split indices are calculated so that every split contains at least one element. If that is not possible, an exception is raised.

This is equivalent to calculating the indices manually and calling Dataset.split_at_indices.
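For illustration, the index computation could be sketched as below. This is a hypothetical helper, not Ray's actual implementation: each proportion maps to a cumulative row index that could then be passed to Dataset.split_at_indices.

```python
def proportions_to_indices(num_rows: int, proportions: list[float]) -> list[int]:
    """Hypothetical sketch: convert proportions into split indices."""
    assert all(p > 0 for p in proportions), "each proportion must be > 0"
    assert sum(proportions) < 1, "proportions must sum to less than 1"
    indices = []
    cumulative = 0.0
    for p in proportions:
        cumulative += p
        # Each split boundary is the cumulative fraction of the row count.
        indices.append(int(num_rows * cumulative))
    return indices

# For a 10-row dataset and proportions [0.2, 0.5], the boundaries are [2, 7],
# yielding splits over rows [0, 2), [2, 7), and [7, 10) as in the example below.
print(proportions_to_indices(10, [0.2, 0.5]))  # → [2, 7]
```

Note that this sketch omits the adjustment the real method performs to guarantee every split is non-empty.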


This operation will trigger execution of the lazy transformations performed on this dataset.


>>> import ray
>>> ds = ray.data.range(10)
>>> d1, d2, d3 = ds.split_proportionately([0.2, 0.5])
>>> d1.take_batch()
{'id': array([0, 1])}
>>> d2.take_batch()
{'id': array([2, 3, 4, 5, 6])}
>>> d3.take_batch()
{'id': array([7, 8, 9])}

Time complexity: O(num splits)

See also: Dataset.split, Dataset.split_at_indices, Dataset.train_test_split


proportions – List of proportions to split the dataset according to. The proportions must sum to less than 1, and each proportion must be greater than 0.


The dataset splits.