ray.data.Dataset.random_sample

Dataset.random_sample(fraction: float, *, seed: int | None = None) → Dataset

Returns a new Dataset containing a random fraction of the rows.

Note

This method returns roughly fraction * total_rows rows. An exact number of rows isn’t guaranteed.

Examples

>>> import ray
>>> ds1 = ray.data.range(100)
>>> ds1.random_sample(0.1).count()  
10
>>> ds2 = ray.data.range(1000)
>>> ds2.random_sample(0.123, seed=42).take(2)  
[{'id': 2}, {'id': 9}]
>>> ds2.random_sample(0.123, seed=42).take(2)  
[{'id': 2}, {'id': 9}]
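
The note above says the row count is only approximate. One common way to get that behavior is to keep each row independently with probability fraction. The sketch below illustrates the idea in plain Python. It is not Ray's implementation: `bernoulli_sample` is a hypothetical helper, and per-row independent sampling is an assumption made for illustration.

```python
import random


def bernoulli_sample(rows, fraction, seed=None):
    """Keep each row independently with probability `fraction`.

    Hypothetical helper for illustration only; not part of Ray.
    """
    if not 0 <= fraction <= 1:
        raise ValueError("fraction must be between 0 and 1")
    # Use a local PRNG so the global random state is left untouched.
    rng = random.Random(seed)
    return [row for row in rows if rng.random() < fraction]


rows = [{"id": i} for i in range(1000)]

# The same seed gives the same sample on every run.
sample_a = bernoulli_sample(rows, 0.1, seed=42)
sample_b = bernoulli_sample(rows, 0.1, seed=42)

# The size is close to fraction * len(rows), but not exactly that.
sample_size = len(sample_a)
```

Under this assumption the sample size is a binomial random variable with mean fraction * total_rows, which is why the count varies from run to run unless a seed is fixed.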

Parameters:

fraction – The fraction of rows to sample, as a value between 0 and 1.
seed – Seeds the Python pseudorandom number generator (PRNG). Pass the same seed to get the same sample across calls.

Returns:

A Dataset containing the sampled rows.