ray.data.Dataset.train_test_split
- Dataset.train_test_split(test_size: Union[int, float], *, shuffle: bool = False, seed: Optional[int] = None) -> Tuple[ray.data.dataset.Dataset[ray.data.block.T], ray.data.dataset.Dataset[ray.data.block.T]]
Split the dataset into train and test subsets.
Examples
>>> import ray
>>> ds = ray.data.range(8)
>>> train, test = ds.train_test_split(test_size=0.25)
>>> train.take()
[0, 1, 2, 3, 4, 5]
>>> test.take()
[6, 7]
- Parameters
test_size – If a float, must be between 0.0 and 1.0 and gives the proportion of the dataset to include in the test split. If an int, gives the absolute number of test samples. The train split is always the complement of the test split.
shuffle – Whether or not to globally shuffle the dataset before splitting. Defaults to False. This may be a very expensive operation with large datasets.
seed – Fix the random seed to use for shuffle, otherwise one will be chosen based on system randomness. Ignored if shuffle=False.
- Returns
Train and test subsets as two Datasets.
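The parameter semantics described above can be illustrated with a small pure-Python sketch. It is not Ray's implementation: `sketch_train_test_split` is a hypothetical helper that works on an in-memory sequence, and the exclusive bounds checks and the truncating rounding for a float `test_size` are assumptions made for illustration only.

```python
import random
from typing import List, Optional, Sequence, Tuple, TypeVar, Union

T = TypeVar("T")


def sketch_train_test_split(
    rows: Sequence[T],
    test_size: Union[int, float],
    *,
    shuffle: bool = False,
    seed: Optional[int] = None,
) -> Tuple[List[T], List[T]]:
    """Hypothetical stand-in that mimics the documented parameter semantics."""
    items = list(rows)

    # A global shuffle happens before the split; the seed only matters here.
    if shuffle:
        random.Random(seed).shuffle(items)

    if isinstance(test_size, float):
        # Float: proportion of rows that go to the test split.
        # Exclusive bounds and truncation are assumptions of this sketch.
        if not 0.0 < test_size < 1.0:
            raise ValueError("a float test_size must be between 0.0 and 1.0")
        n_test = int(len(items) * test_size)
    else:
        # Int: absolute number of rows that go to the test split.
        if not 0 < test_size < len(items):
            raise ValueError("an int test_size must be between 0 and the row count")
        n_test = test_size

    # The train split is the complement of the test split.
    split_at = len(items) - n_test
    return items[:split_at], items[split_at:]


# Without shuffling, the test split is taken from the end of the data.
train, test = sketch_train_test_split(range(8), test_size=0.25)
# train == [0, 1, 2, 3, 4, 5], test == [6, 7]

# With shuffle=True, fixing the seed makes the split reproducible.
first = sketch_train_test_split(range(8), test_size=2, shuffle=True, seed=42)
second = sketch_train_test_split(range(8), test_size=2, shuffle=True, seed=42)
# first == second
```

With `shuffle=False` the split is purely positional, so ordered data (for example, rows sorted by label or timestamp) yields a test split drawn entirely from the tail; passing `shuffle=True` avoids that at the cost of the global shuffle.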