ray.data.range_tensor

ray.data.range_tensor(n: int, *, shape: Tuple = (1,), parallelism: int = -1) -> Dataset

Creates a Dataset of tensors of the provided shape from the range [0, n), that is, one tensor per integer from 0 up to but not including n.

This function allows for easy creation of synthetic tensor datasets for testing or benchmarking Ray Data.

Examples

>>> import ray
>>> ds = ray.data.range_tensor(1000, shape=(2, 2))
>>> ds
Dataset(
   num_blocks=...,
   num_rows=1000,
   schema={data: numpy.ndarray(shape=(2, 2), dtype=int64)}
)
>>> ds.map_batches(lambda batch: {"data": batch["data"] * 2}).take(2)
[{'data': array([[0, 0],
       [0, 0]])}, {'data': array([[2, 2],
       [2, 2]])}]

Parameters:
  • n – The upper bound (exclusive) of the range of tensor records. The dataset contains n rows.

  • shape – The shape of each tensor in the dataset.

  • parallelism – The amount of parallelism to use for the dataset. Defaults to -1, which automatically determines the optimal parallelism for your configuration. You should not need to manually set this value in most cases. For details on how the parallelism is automatically determined and guidance on how to tune it, see Tuning read parallelism. Parallelism is upper bounded by n.

Returns:

A Dataset producing the tensor data for the range 0 to n (exclusive).
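
To make the row contents concrete, the following is a minimal NumPy sketch of what the rows appear to hold, judging from the example output above: row i is a tensor of the requested shape with every element equal to i. The helper name range_tensor_rows is hypothetical and is not part of Ray; this is an illustration of the data layout, not Ray's implementation, and it ignores blocks and parallelism entirely.

```python
import numpy as np


def range_tensor_rows(n, shape=(1,)):
    """Hypothetical helper (not a Ray API): yield rows shaped like range_tensor's."""
    for i in range(n):
        # Assumption: row i is filled entirely with the value i, stored as int64,
        # matching the schema and output shown in the example above.
        yield {"data": np.full(shape, i, dtype=np.int64)}


rows = list(range_tensor_rows(3, shape=(2, 2)))
print(len(rows))          # number of rows equals n
print(rows[2]["data"])    # the last row holds the value n - 1
```

Because the upper bound is exclusive, the last row holds n - 1, not n.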

See also

range()

Call this method to create synthetic datasets of integer data.