ray.data.Dataset.split_proportionately
- Dataset.split_proportionately(proportions: List[float]) → List[ray.data.dataset.Dataset[ray.data.block.T]]
Split the dataset using proportions.
A common use case is splitting the dataset into train and test sets (equivalent to e.g. scikit-learn’s train_test_split). See also Dataset.train_test_split for a higher-level abstraction.
The split indices are calculated so that every split contains at least one element. If that is not possible, an exception is raised.
This is equivalent to calculating the indices manually and calling Dataset.split_at_indices.
Note
This operation will trigger execution of the lazy transformations performed on this dataset, and will block until execution completes.
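The equivalence with Dataset.split_at_indices can be illustrated with a minimal sketch. It runs on a plain Python list so that it needs no Ray installation. The helpers proportions_to_indices and split_list are hypothetical names introduced here, not part of Ray, and the sketch assumes the indices come from flooring the cumulative proportions times the dataset length. Ray's actual index adjustment for guaranteeing non-empty splits may differ; this version simply raises an error if a split would be empty.

```python
from itertools import accumulate
from typing import List, Sequence


def proportions_to_indices(length: int, proportions: List[float]) -> List[int]:
    """Hypothetical helper (not a Ray API): turn proportions into split indices."""
    if not proportions:
        raise ValueError("proportions must not be empty")
    if any(p <= 0 for p in proportions):
        raise ValueError("each proportion must be greater than 0")
    if sum(proportions) >= 1:
        raise ValueError("proportions must sum to less than 1")

    # Each cumulative proportion marks where one split ends.
    indices = [int(length * c) for c in accumulate(proportions)]

    # Simplification (assumption): reject empty splits instead of adjusting indices.
    bounds = [0] + indices + [length]
    if any(lo >= hi for lo, hi in zip(bounds, bounds[1:])):
        raise ValueError("every split must contain at least one element")
    return indices


def split_list(items: Sequence, indices: List[int]) -> List[list]:
    """Hypothetical stand-in for splitting at indices, applied to a plain sequence."""
    bounds = [0] + list(indices) + [len(items)]
    return [list(items[lo:hi]) for lo, hi in zip(bounds, bounds[1:])]


indices = proportions_to_indices(10, [0.2, 0.5])
parts = split_list(range(10), indices)

# With a Ray dataset, the same indices would be passed on as:
#   d1, d2, d3 = ds.split_at_indices(indices)
```

Two proportions yield three parts here: the remainder after the last index forms the final split.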
Examples
>>> import ray
>>> ds = ray.data.range(10)
>>> d1, d2, d3 = ds.split_proportionately([0.2, 0.5])
>>> d1.take()
[0, 1]
>>> d2.take()
[2, 3, 4, 5, 6]
>>> d3.take()
[7, 8, 9]
Time complexity: O(num splits)
See also: Dataset.split, Dataset.split_at_indices, Dataset.train_test_split
- Parameters
proportions – List of proportions to split the dataset by. The proportions must sum to less than 1, and each must be greater than 0. The remainder of the dataset forms the final split.
- Returns
The dataset splits.