ray.data.Dataset.split_at_indices

Dataset.split_at_indices(indices: List[int]) -> List[ray.data.dataset.MaterializedDataset]

Materialize and split the dataset at the given indices (like np.split).

Note

This operation will trigger execution of the lazy transformations performed on this dataset.

Examples

>>> import ray
>>> ds = ray.data.range(10)
>>> d1, d2, d3 = ds.split_at_indices([2, 5])
>>> d1.take_batch()
{'id': array([0, 1])}
>>> d2.take_batch()
{'id': array([2, 3, 4])}
>>> d3.take_batch()
{'id': array([5, 6, 7, 8, 9])}

Time complexity: O(num splits)

Parameters

indices – List of sorted integers that indicate where the dataset is split. If an index exceeds the length of the dataset, an empty dataset is returned.

Returns

The dataset splits.
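
The index semantics described under Parameters can be sketched on a plain Python list. This is an illustration only, not Ray's implementation: split_list_at_indices is a hypothetical helper, and its input validation is an assumption made for the sketch.

```python
from typing import List, Sequence


def split_list_at_indices(items: Sequence[int], indices: List[int]) -> List[List[int]]:
    """Hypothetical helper: split a plain sequence at sorted, non-negative indices."""
    # Validation is an assumption of this sketch, not documented Ray behavior.
    if any(i < 0 for i in indices):
        raise ValueError("indices must be non-negative")
    if list(indices) != sorted(indices):
        raise ValueError("indices must be sorted")
    # N indices produce N + 1 splits: [0, i1), [i1, i2), ..., [iN, len).
    bounds = [0, *indices, len(items)]
    # Slicing clamps out-of-range bounds, so an index past the end
    # leaves the following split empty.
    return [list(items[start:end]) for start, end in zip(bounds, bounds[1:])]


# Mirrors the example above: indices [2, 5] on ten rows give three splits.
print(split_list_at_indices(list(range(10)), [2, 5]))
# An index beyond the length (20 > 5) produces an empty trailing split.
print(split_list_at_indices(list(range(5)), [3, 20]))
```

As with np.split, a list of N indices yields N + 1 splits, which is why the example above unpacks three datasets from two indices.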

See also

Dataset.split()

Unlike split_at_indices(), which lets you split a dataset into different sizes, Dataset.split() splits a dataset into approximately equal splits.

Dataset.split_proportionately()

This method is equivalent to Dataset.split_at_indices() if you compute indices manually.
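
Computing the indices manually might look like the following sketch. proportions_to_indices is a hypothetical helper, and it assumes the proportions sum to less than 1 so that the remaining rows form the final split. Ray's own rounding and minimum-size handling may differ.

```python
from itertools import accumulate
from typing import List


def proportions_to_indices(num_rows: int, proportions: List[float]) -> List[int]:
    """Hypothetical helper: turn split proportions into cumulative row indices."""
    # Assumption of this sketch: proportions are positive and sum to less than 1.
    if not proportions or any(p <= 0 for p in proportions) or sum(proportions) >= 1:
        raise ValueError("proportions must be positive and sum to less than 1")
    # Each index is the running total of proportions scaled by the row count,
    # truncated to an integer.
    return [int(num_rows * cumulative) for cumulative in accumulate(proportions)]


indices = proportions_to_indices(10, [0.2, 0.3])
print(indices)
```

The resulting list could then be passed as ds.split_at_indices(indices), with num_rows taken from ds.count().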

Dataset.streaming_split()

Unlike split(), streaming_split() doesn’t materialize the dataset in memory.