ray.data.Dataset.split_at_indices

Dataset.split_at_indices(indices: List[int]) -> List[ray.data.dataset.MaterializedDataset]

Materialize and split the dataset at the given indices (like np.split).

Note

This operation will trigger execution of the lazy transformations performed on this dataset.

Examples

>>> import ray
>>> ds = ray.data.range(10)
>>> d1, d2, d3 = ds.split_at_indices([2, 5])
>>> d1.take_batch()
{'id': array([0, 1])}
>>> d2.take_batch()
{'id': array([2, 3, 4])}
>>> d3.take_batch()
{'id': array([5, 6, 7, 8, 9])}

Time complexity: O(num splits)

See also: Dataset.split, Dataset.split_proportionately, and Dataset.streaming_split.

Parameters

indices – Sorted list of integers indicating where to split the dataset. A list of n indices produces n + 1 splits. If an index exceeds the length of the dataset, the corresponding split is an empty dataset.
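
The index semantics described above can be illustrated without Ray. The following is a minimal pure-Python sketch, not Ray's implementation: `split_list_at_indices` is a hypothetical helper that applies the same boundaries to an ordinary sequence, and its sortedness check is a choice made for this sketch. It shows that two indices yield three splits and that an index past the end of the data yields an empty split.

```python
from typing import List, Sequence


def split_list_at_indices(items: Sequence[int], indices: List[int]) -> List[List[int]]:
    """Hypothetical helper: split a plain sequence at the given sorted indices."""
    # This sketch insists on sorted indices, mirroring the "sorted" requirement.
    if indices != sorted(indices):
        raise ValueError("indices must be sorted")
    # Pair each boundary with the next one: [0, i1), [i1, i2), ..., [ik, len).
    starts = [0] + indices
    ends = indices + [len(items)]
    # Slicing past the end of a sequence yields nothing, which gives empty splits.
    return [list(items[start:end]) for start, end in zip(starts, ends)]


in_range = split_list_at_indices(range(10), [2, 5])
past_end = split_list_at_indices(range(3), [2, 5])
print(in_range)  # [[0, 1], [2, 3, 4], [5, 6, 7, 8, 9]]
print(past_end)  # [[0, 1], [2], []]
```

The first call reproduces the row layout of the Ray example above; the second shows the empty trailing split when index 5 lies beyond a three-row input.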

Returns

The dataset splits, as a list of MaterializedDataset objects.