ray.data.Dataset.to_arrow_refs#

Dataset.to_arrow_refs() List[ObjectRef[pyarrow.Table]][source]#

Convert this Dataset into a distributed set of PyArrow tables.

One PyArrow table is created for each block in this Dataset.

This method is only supported for datasets convertible to PyArrow tables. This function is zero-copy if the existing data is already in PyArrow format. Otherwise, the data is converted to PyArrow format.

Examples

>>> import ray
>>> ds = ray.data.range(10, parallelism=2)
>>> refs = ds.to_arrow_refs()
>>> len(refs)
2

Note

This operation will trigger execution of the lazy transformations performed on this dataset.

Time complexity: O(1) unless conversion is required.

Returns:

A list of remote PyArrow tables created from this dataset.

DeveloperAPI: This API may change across minor Ray releases.