ray.data.Dataset.to_numpy_refs

Dataset.to_numpy_refs(*, column: str | None = None) → List[ObjectRef[numpy.ndarray]]

Converts this Dataset into a distributed set of NumPy ndarrays or dictionaries of NumPy ndarrays.

This is only supported for datasets convertible to NumPy ndarrays. This function copies the data. For zero-copy access to the underlying data, consider using Dataset.to_arrow_refs() or Dataset.get_internal_block_refs().

Examples

>>> import ray
>>> ds = ray.data.range(10, parallelism=2)
>>> refs = ds.to_numpy_refs()
>>> len(refs)
2

Time complexity: O(dataset size / parallelism)

Parameters:
  • column – The name of the column to convert to NumPy. If None, all columns are used. If multiple columns are used, each returned future represents a dict of ndarrays. Defaults to None.
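The following is a minimal, single-process sketch of how the column parameter shapes each result, assuming a block can be modeled as a plain dict mapping column names to values. The `block_to_numpy` helper and the `Block` alias are hypothetical illustrations, not Ray's implementation, and the sketch needs only NumPy.

```python
import numpy as np
from typing import Dict, List, Optional, Union

# Hypothetical stand-in for one Dataset block: column name -> list of values.
Block = Dict[str, List]


def block_to_numpy(
    block: Block, column: Optional[str] = None
) -> Union[np.ndarray, Dict[str, np.ndarray]]:
    """Hypothetical helper: convert one block as the `column` parameter describes."""
    if column is None:
        # All columns are used: one ndarray per column, keyed by column name.
        # np.array copies the values, mirroring the copy the API documents.
        return {name: np.array(values) for name, values in block.items()}
    # A single named column: a plain ndarray.
    return np.array(block[column])


block = {"id": [0, 1, 2], "score": [0.5, 1.5, 2.5]}
print(block_to_numpy(block))               # dict with keys 'id' and 'score'
print(block_to_numpy(block, column="id"))  # [0 1 2]
```

Modeling the block as a dict keeps the contrast visible: the same block yields a dict of arrays when column is None and a single array when a column name is given.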

Returns:

A list of remote NumPy ndarrays (or dicts of ndarrays, as described under column) created from this dataset, one per block.
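With Ray itself, the returned refs would typically be resolved with `ray.get(refs)` and, when a column name is passed so that each ref holds a plain ndarray, joined with `np.concatenate`. The sketch below models that consumption pattern without a Ray cluster: `concurrent.futures` futures stand in for `ObjectRef`s, and the `blocks` list and `convert` helper are hypothetical, chosen to mirror `range(10)` split into two blocks.

```python
import numpy as np
from concurrent.futures import Future, ThreadPoolExecutor
from typing import List

# Hypothetical blocks: 10 rows split into 2 blocks of 5 rows each.
blocks = [list(range(0, 5)), list(range(5, 10))]


def convert(block: List[int]) -> np.ndarray:
    # Hypothetical per-block conversion; copies the block's values into a new ndarray.
    return np.array(block)


with ThreadPoolExecutor(max_workers=2) as pool:
    # One future per block, standing in for the list returned by to_numpy_refs().
    refs: List[Future] = [pool.submit(convert, b) for b in blocks]
    # Stand-in for ray.get(refs): wait for every result, preserving block order.
    arrays = [f.result() for f in refs]

combined = np.concatenate(arrays)
print(len(refs), combined)  # 2 [0 1 2 3 4 5 6 7 8 9]
```

Each block is converted independently, which is why the work per task scales with the block size rather than the full dataset size; the final concatenation happens on the caller's side.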

DeveloperAPI: This API may change across minor Ray releases.