ray.data.Dataset.split_at_indices

Dataset.split_at_indices(indices: List[int]) → List[MaterializedDataset]

Materialize and split the dataset at the given indices (like np.split).

Note

This operation will trigger execution of the lazy transformations performed on this dataset.

Examples

>>> import ray
>>> ds = ray.data.range(10)
>>> d1, d2, d3 = ds.split_at_indices([2, 5])
>>> d1.take_batch()
{'id': array([0, 1])}
>>> d2.take_batch()
{'id': array([2, 3, 4])}
>>> d3.take_batch()
{'id': array([5, 6, 7, 8, 9])}

Time complexity: O(num splits)

Parameters:

indices – List of sorted integers that indicate where the dataset is split. If an index exceeds the length of the dataset, an empty dataset is returned.
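The splitting rule can be modeled in plain Python, in the spirit of the np.split comparison above. This is only a sketch of the semantics, not Ray's implementation: split_rows_at_indices is a hypothetical helper, and rejecting negative or unsorted indices is an assumption made for the sketch.

```python
from typing import List, Sequence


def split_rows_at_indices(rows: Sequence[int], indices: List[int]) -> List[List[int]]:
    """Hypothetical pure-Python model of the splitting rule (not Ray code)."""
    # Assumption of this sketch: indices must be non-negative and sorted.
    if any(i < 0 for i in indices):
        raise ValueError("indices must be non-negative")
    if list(indices) != sorted(indices):
        raise ValueError("indices must be sorted")
    bounds = [0, *indices, len(rows)]
    # Each consecutive pair of bounds delimits one split. Slicing clamps
    # out-of-range bounds, so an index past the end yields an empty split.
    return [list(rows[start:stop]) for start, stop in zip(bounds, bounds[1:])]


rows = list(range(10))
print(split_rows_at_indices(rows, [2, 5]))   # three splits of sizes 2, 3, 5
print(split_rows_at_indices(rows, [2, 15]))  # the last split is empty
```

In this model, N indices always produce N + 1 splits, which matches the three datasets returned for two indices in the example above.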

Returns:

The dataset splits.

See also

Dataset.split()

Unlike split_at_indices(), which lets you split a dataset into different sizes, Dataset.split() splits a dataset into approximately equal splits.

Dataset.split_proportionately()

This method is equivalent to Dataset.split_at_indices() if you compute indices manually.
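As a rough illustration of computing indices manually, the sketch below converts proportions into cumulative row indices. indices_from_proportions is a hypothetical helper, not part of Ray. Truncating with int() and requiring the proportions to sum to less than 1 (so the remainder forms the last split) are assumptions of this sketch, and Ray's own rounding may differ.

```python
from itertools import accumulate
from typing import List


def indices_from_proportions(num_rows: int, proportions: List[float]) -> List[int]:
    """Hypothetical helper: turn proportions into cumulative split indices."""
    # Assumption of this sketch: the leftover rows form the final split.
    if any(p <= 0 for p in proportions) or sum(proportions) >= 1:
        raise ValueError("proportions must be positive and sum to less than 1")
    # Running totals give the fraction of rows that precede each cut point.
    return [int(num_rows * c) for c in accumulate(proportions)]


indices = indices_from_proportions(10, [0.2, 0.3])
print(indices)  # [2, 5]

# With Ray installed, the indices could then be passed on, for example:
# d1, d2, d3 = ray.data.range(10).split_at_indices(indices)
```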

Dataset.streaming_split()

Unlike split(), streaming_split() doesn’t materialize the dataset in memory.