ray.data.range(n: int, *, parallelism: int = -1) → Dataset

Creates a Dataset from a range of integers [0..n).

This function allows for easy creation of synthetic datasets for testing or benchmarking Ray Data.


Examples

>>> import ray
>>> ds = ray.data.range(10000)
>>> ds
Dataset(num_blocks=..., num_rows=10000, schema={id: int64})
>>> ds.map(lambda row: {"id": row["id"] * 2}).take(4)
[{'id': 0}, {'id': 2}, {'id': 4}, {'id': 6}]

Parameters

  • n – The exclusive upper bound of the range of integers.

  • parallelism – The amount of parallelism to use for the dataset. Defaults to -1, which automatically determines the optimal parallelism for your configuration. You should not need to manually set this value in most cases. For details on how the parallelism is automatically determined and guidance on how to tune it, see Tuning read parallelism. Parallelism is upper bounded by n.


Returns

A Dataset producing the integers in the range [0, n), that is, from 0 to n - 1 inclusive.

See also


range_tensor()
Call this method to create synthetic datasets of tensor data.