ray.data.Dataset.split_at_indices
- Dataset.split_at_indices(indices: List[int]) → List[MaterializedDataset]
Materialize and split the dataset at the given indices (like np.split).
Note
This operation will trigger execution of the lazy transformations performed on this dataset.
Examples
>>> import ray
>>> ds = ray.data.range(10)
>>> d1, d2, d3 = ds.split_at_indices([2, 5])
>>> d1.take_batch()
{'id': array([0, 1])}
>>> d2.take_batch()
{'id': array([2, 3, 4])}
>>> d3.take_batch()
{'id': array([5, 6, 7, 8, 9])}
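The split boundaries in this example follow the np.split convention: each index marks the row where a new split begins, so two indices produce three splits. A minimal sketch of that convention on a plain Python list is shown below, without Ray. The helper split_list_at_indices is hypothetical and only models how the boundaries are chosen, not how Ray materializes blocks.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def split_list_at_indices(rows: Sequence[T], indices: List[int]) -> List[List[T]]:
    """Hypothetical helper: split `rows` at `indices`, np.split-style.

    Each index marks where a new split begins, so k indices yield k + 1 splits.
    """
    # Boundaries are 0, each given index, then the end of the data.
    bounds = [0] + list(indices) + [len(rows)]
    # Slice between each pair of consecutive boundaries.
    return [list(rows[start:end]) for start, end in zip(bounds, bounds[1:])]


d1, d2, d3 = split_list_at_indices(list(range(10)), [2, 5])
print(d1, d2, d3)  # [0, 1] [2, 3, 4] [5, 6, 7, 8, 9]
```

The three lists hold the same rows as the three Ray splits in the example.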
Time complexity: O(num splits)
- Parameters:
indices – List of sorted integers that indicate where the dataset is split. If an index exceeds the length of the dataset, an empty dataset is returned for the corresponding split.
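To make the indices contract concrete, the following sketch models it on a plain list: an index past the end of the data produces an empty split. The helper split_list_at_indices is hypothetical. Its choice to raise ValueError for unsorted indices is an assumption of this sketch and is not taken from the documentation above.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def split_list_at_indices(rows: Sequence[T], indices: List[int]) -> List[List[T]]:
    """Hypothetical helper mirroring the documented `indices` contract."""
    # `indices` is documented as sorted; this sketch rejects anything else.
    # (Raising ValueError here is the sketch's own choice, not documented Ray behavior.)
    if list(indices) != sorted(indices):
        raise ValueError("indices must be sorted")
    bounds = [0] + list(indices) + [len(rows)]
    # Python slicing clamps out-of-range bounds, so an index past the end
    # yields an empty list for the split that would start there.
    return [list(rows[start:end]) for start, end in zip(bounds, bounds[1:])]


rows = list(range(10))
print(split_list_at_indices(rows, [2, 20]))  # [[0, 1], [2, 3, 4, 5, 6, 7, 8, 9], []]
```

Only the split that begins at the out-of-range index is empty. The splits before it still contain their rows, which matches np.split.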
- Returns:
The dataset splits.
See also
- Dataset.split(): Unlike split_at_indices(), which lets you split a dataset into different sizes, Dataset.split() splits a dataset into approximately equal splits.
- Dataset.split_proportionately(): This method is equivalent to Dataset.split_at_indices() if you compute indices manually.
- Dataset.streaming_split(): Unlike split(), streaming_split() doesn’t materialize the dataset in memory.
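The relationship between split_proportionately() and split_at_indices() comes down to turning proportions into row indices. The sketch below shows one way to compute those indices by hand. The helper proportions_to_indices is hypothetical, and truncating with int() is an assumed rounding rule that may not match Ray's own.

```python
from typing import List


def proportions_to_indices(num_rows: int, proportions: List[float]) -> List[int]:
    """Hypothetical helper: turn split proportions into split_at_indices()-style indices.

    Assumes each proportion is the fraction of rows in one split and that the
    final split receives whatever is left over. Truncation via int() is this
    sketch's rounding choice; Ray's own rounding may differ.
    """
    indices = []
    cumulative = 0.0
    for proportion in proportions:
        cumulative += proportion
        # Each index is the row position where the next split begins.
        indices.append(int(num_rows * cumulative))
    return indices


print(proportions_to_indices(10, [0.2, 0.3]))  # [2, 5]
```

For a 10-row dataset, these indices match the [2, 5] used in the example earlier on this page: two rows, then three rows, then the remaining five. Under the rounding assumption above, split_proportionately([0.2, 0.3]) should produce the same three splits.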