- Dataset.split_proportionately(proportions: List[float]) List[ray.data.dataset.Dataset[ray.data.block.T]] [source]#
Split the dataset using proportions.
A common use case for this is splitting the dataset into train and test sets (equivalent to, e.g., scikit-learn's
train_test_split). See also
Dataset.train_test_split for a higher-level abstraction.
The indices to split at are calculated so that every split contains at least one element. If that is not possible, an exception is raised.
This is equivalent to calculating the indices manually and calling Dataset.split_at_indices.
This operation will trigger execution of the lazy transformations performed on this dataset, and will block until execution completes.
>>> import ray
>>> ds = ray.data.range(10)
>>> d1, d2, d3 = ds.split_proportionately([0.2, 0.5])
>>> d1.take()
[0, 1]
>>> d2.take()
[2, 3, 4, 5, 6]
>>> d3.take()
[7, 8, 9]
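The index calculation described above can be sketched in plain Python. `proportions_to_indices` below is a hypothetical helper, not part of the Ray API; the actual implementation may differ, for example in rounding and in how it guarantees every split is non-empty:

```python
def proportions_to_indices(n_rows, proportions):
    """Turn split proportions into absolute row indices.

    Mirrors the example above: 10 rows with proportions [0.2, 0.5]
    yield indices [2, 7], producing splits of sizes 2, 5, and 3.
    """
    # Same constraints as split_proportionately's `proportions` argument.
    assert sum(proportions) < 1 and all(p > 0 for p in proportions)
    indices = []
    cumulative = 0.0
    for p in proportions:
        cumulative += p
        indices.append(int(n_rows * cumulative))
    return indices
```

With these indices in hand, the splits could then be produced by a call like `ds.split_at_indices(indices)`.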
Time complexity: O(num splits)
proportions – List of proportions to split the dataset according to. The proportions must sum to less than 1, and each proportion must be greater than 0.
The dataset splits.