class ray.data.extensions.tensor_extension.TensorArray(values: Union[numpy.ndarray, pandas.core.dtypes.generic.ABCSeries, Sequence[Union[numpy.ndarray, ray.air.util.tensor_extensions.pandas.TensorArrayElement]], ray.air.util.tensor_extensions.pandas.TensorArrayElement, Any])[source]#

Bases: pandas.core.arrays.base.ExtensionArray, ray.air.util.tensor_extensions.pandas._TensorOpsMixin, ray.air.util.tensor_extensions.pandas._TensorScalarCastMixin

Pandas ExtensionArray representing a tensor column, i.e. a column consisting of ndarrays as elements.

This extension supports tensor columns whose elements have different shapes. However, each individual element must be non-ragged, i.e. it must have a single, well-defined shape.
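The difference between a variable-shaped column and a ragged element can be sketched with plain NumPy and Python lists. This is only an illustration of the terminology; the names `elements` and `ragged` are hypothetical and nothing here calls into TensorArray itself.

```python
import numpy as np

# Two elements with different shapes; each one is a regular ndarray
# with its own well-defined shape.
elements = [np.zeros((2, 2)), np.ones((3, 4, 5))]
shapes = [e.shape for e in elements]

# The collection is variable-shaped when the element shapes differ.
variable_shaped = len(set(shapes)) > 1

# A ragged element: its rows have different lengths, so no single
# shape describes it.
ragged = [[1, 2, 3], [4]]
is_ragged = len({len(row) for row in ragged}) > 1
```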


>>> # Create a DataFrame with a list of ndarrays as a column.
>>> import pandas as pd
>>> import numpy as np
>>> import ray
>>> from ray.data.extensions import TensorArray
>>> df = pd.DataFrame({
...     "one": [1, 2, 3],
...     "two": TensorArray(np.arange(24).reshape((3, 2, 2, 2)))})
>>> # Note that the column dtype is TensorDtype.
>>> df.dtypes 
one          int64
two    TensorDtype(shape=(3, 2, 2, 2), dtype=int64)
dtype: object
>>> # Pandas is aware of this tensor column, and we can do the
>>> # typical DataFrame operations on this column.
>>> col = 2 * (df["two"] + 10)
>>> # The ndarrays underlying the tensor column will be manipulated,
>>> # but the column itself will continue to be a Pandas type.
>>> type(col)
<class 'pandas.core.series.Series'>
>>> col 
0   [[[ 2  4]
      [ 6  8]]
     [[10 12]
       [14 16]]]
1   [[[18 20]
      [22 24]]
     [[26 28]
      [30 32]]]
2   [[[34 36]
      [38 40]]
     [[42 44]
      [46 48]]]
Name: two, dtype: TensorDtype(shape=(3, 2, 2, 2), dtype=int64)
>>> # Once you do an aggregation on that column that returns a single
>>> # row's value, you get back our TensorArrayElement type.
>>> tensor = col.mean() 
>>> type(tensor) 
>>> tensor 
array([[[18., 20.],
        [22., 24.]],
       [[26., 28.],
        [30., 32.]]])
>>> # This is a light wrapper around a NumPy ndarray, and can easily
>>> # be converted to an ndarray.
>>> type(tensor.to_numpy())
<class 'numpy.ndarray'>
>>> # In addition to doing Pandas operations on the tensor column,
>>> # you can now put the DataFrame into a Dataset.
>>> ds = ray.data.from_pandas(df) 
>>> # Internally, this column is represented by the corresponding
>>> # Arrow tensor extension type.
>>> ds.schema() 
one: int64
two: extension<arrow.py_extension_type<ArrowTensorType>>
>>> # You can write the dataset to Parquet.
>>> ds.write_parquet("/some/path") 
>>> # And you can read it back.
>>> read_ds = ray.data.read_parquet("/some/path") 
>>> read_ds.schema() 
one: int64
two: extension<arrow.py_extension_type<ArrowTensorType>>
>>> read_df = ray.get(read_ds.to_pandas_refs())[0] 
>>> read_df.dtypes 
one          int64
two    TensorDtype(shape=(3, 2, 2, 2), dtype=int64)
dtype: object
>>> # The tensor extension type is preserved along the
>>> # Pandas --> Arrow --> Parquet --> Arrow --> Pandas
>>> # conversion chain.
>>> read_df.equals(df)
True

PublicAPI (beta): This API is in beta and may change before becoming stable.

property dtype: pandas.core.dtypes.base.ExtensionDtype#

An instance of ‘ExtensionDtype’.

property is_variable_shaped#

Whether this TensorArray holds variable-shaped tensor elements.

property nbytes: int#

The number of bytes needed to store this object in memory.

isna() ray.air.util.tensor_extensions.pandas.TensorArray[source]#

A 1-D array indicating if each value is missing.


Returns

na_values – In most cases, this should return a NumPy ndarray. For exceptional cases like SparseArray, where returning an ndarray would be expensive, an ExtensionArray may be returned.

Return type

Union[np.ndarray, ExtensionArray]

Notes

If returning an ExtensionArray, then:

  • na_values._is_boolean should be True

  • na_values should implement ExtensionArray._reduce()

  • na_values.any and na_values.all should be implemented

take(indices: Sequence[int], allow_fill: bool = False, fill_value: Optional[Any] = None) ray.air.util.tensor_extensions.pandas.TensorArray[source]#

Take elements from an array.

Parameters

  • indices (sequence of int) – Indices to be taken.

  • allow_fill (bool, default False) –

    How to handle negative values in indices.

    • False: negative values in indices indicate positional indices from the right (the default). This is similar to numpy.take().

    • True: negative values in indices indicate missing values. These values are set to fill_value. Any other negative values raise a ValueError.

  • fill_value (any, optional) –

    Fill value to use for NA-indices when allow_fill is True. This may be None, in which case the default NA value for the type, self.dtype.na_value, is used.

    For many ExtensionArrays, there will be two representations of fill_value: a user-facing “boxed” scalar, and a low-level physical NA value. fill_value should be the user-facing version, and the implementation should handle translating that to the physical version for processing the take if necessary.


Return type

ray.air.util.tensor_extensions.pandas.TensorArray

Raises

  • IndexError – When the indices are out of bounds for the array.

  • ValueError – When indices contains negative values other than -1 and allow_fill is True.
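The `allow_fill` semantics described above can be sketched with the pandas helper `pandas.api.extensions.take()` applied to a plain NumPy array. The sample data and the fill value `-999` are hypothetical; the sketch shows the helper's behavior and is not TensorArray's own implementation.

```python
import numpy as np
from pandas.api.extensions import take

data = np.array([10, 20, 30])

# allow_fill=False (the default): -1 is a position counted from the right.
from_right = take(data, [0, -1])

# allow_fill=True: -1 marks a missing slot, which is set to fill_value.
filled = take(data, [0, -1], allow_fill=True, fill_value=-999)

# With allow_fill=True, negative indices other than -1 are rejected.
try:
    take(data, [0, -2], allow_fill=True)
    raised_value_error = False
except ValueError:
    raised_value_error = True
```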

See also


numpy.take – Take elements from an array along an axis.

pandas.api.extensions.take – Take elements from an array.


Notes

ExtensionArray.take is called by Series.__getitem__, .loc, and .iloc when indices is a sequence of values. Additionally, it is called by Series.reindex(), or any other method that causes realignment, with a fill_value.

Examples

Here’s an example implementation, which relies on casting the extension array to object dtype. This uses the helper method pandas.api.extensions.take().

def take(self, indices, allow_fill=False, fill_value=None):
    from pandas.core.algorithms import take

    # If the ExtensionArray is backed by an ndarray, then
    # just pass that here instead of coercing to object.
    data = self.astype(object)

    if allow_fill and fill_value is None:
        fill_value = self.dtype.na_value

    # fill value should always be translated from the scalar
    # type for the array, to the physical storage type for
    # the data, before passing to take.

    result = take(data, indices, fill_value=fill_value,
                  allow_fill=allow_fill)
    return self._from_sequence(result, dtype=self.dtype)

copy() ray.air.util.tensor_extensions.pandas.TensorArray[source]#

Return a copy of the array.


Return type

ray.air.util.tensor_extensions.pandas.TensorArray

to_numpy(dtype: Optional[numpy.dtype] = None, copy: bool = False, na_value: Any = NoDefault.no_default)[source]#

Convert to a NumPy ndarray.

New in version 1.0.0.

This is similar to numpy.asarray(), but may provide additional control over how the conversion is done.

Parameters

  • dtype (str or numpy.dtype, optional) – The dtype to pass to numpy.asarray().

  • copy (bool, default False) – Whether to ensure that the returned value is not a view on another array. Note that copy=False does not ensure that to_numpy() is no-copy. Rather, copy=True ensures that a copy is made, even if not strictly necessary.

  • na_value (Any, optional) – The value to use for missing values. The default value depends on dtype and the type of the array.
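The point about `copy` is easier to see with the analogous NumPy calls. The following is a minimal sketch on a plain ndarray (the name `backing` is hypothetical); it illustrates why "no copy requested" is not the same as "no copy made", and it does not claim to mirror TensorArray's internals.

```python
import numpy as np

backing = np.arange(6, dtype=np.int64).reshape(3, 2)

# No dtype change requested: NumPy can hand back the same buffer.
same_buffer = np.asarray(backing)

# A dtype change cannot be expressed as a view, so a copy is made
# even though none was explicitly requested.
converted = np.asarray(backing, dtype=np.float64)

# Explicitly requesting a copy always yields independent memory.
forced_copy = np.array(backing, copy=True)

shares_same = np.shares_memory(backing, same_buffer)
shares_converted = np.shares_memory(backing, converted)
shares_forced = np.shares_memory(backing, forced_copy)
```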


Return type

numpy.ndarray

property numpy_dtype#

Get the dtype of the tensor.

Returns

The NumPy dtype of the backing ndarray.

property numpy_ndim#

Get the number of tensor dimensions.

Returns

An integer giving the number of dimensions.

property numpy_shape#

Get the shape of the tensor.

Returns

A tuple of integers giving the NumPy shape of the backing ndarray.

property numpy_size#

Get the size of the tensor.

Returns

An integer giving the number of elements in the tensor.
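As a rough guide to what these four properties report, the sketch below reads the corresponding attributes from a plain NumPy array shaped like the one in the class example. It assumes, based only on the descriptions above, that the properties mirror the backing ndarray's attributes; the variable `backing` is hypothetical.

```python
import numpy as np

# Same layout as the class-level example: 3 tensors of shape (2, 2, 2).
backing = np.arange(24, dtype=np.int64).reshape((3, 2, 2, 2))

dtype = backing.dtype   # element dtype of the backing array
ndim = backing.ndim     # number of dimensions, including the row axis
shape = backing.shape   # tuple of dimension lengths
size = backing.size     # total number of scalar elements
```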

astype(dtype, copy=True)[source]#

Cast to a NumPy array with ‘dtype’.

Parameters

  • dtype (str or dtype) – Typecode or data-type to which the array is cast.

  • copy (bool, default True) – Whether to copy the data, even if not necessary. If False, a copy is made only if the old dtype does not match the new dtype.
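The `copy` rule described here matches the behavior of `numpy.ndarray.astype`, so a short NumPy-only sketch may help. It illustrates the rule on a plain array named `arr` (hypothetical) and is not a statement about how TensorArray implements the cast.

```python
import numpy as np

arr = np.arange(4, dtype=np.int64)

# Default copy=True: a new array is returned even if the dtype matches.
copied = arr.astype(np.int64)

# copy=False with a matching dtype: no copy is needed.
not_copied = arr.astype(np.int64, copy=False)

# copy=False with a different dtype: a copy is still required.
cast = arr.astype(np.float32, copy=False)

copied_shares = np.shares_memory(arr, copied)
not_copied_shares = np.shares_memory(arr, not_copied)
cast_shares = np.shares_memory(arr, cast)
```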


Returns

array – NumPy ndarray with ‘dtype’ for its dtype.

Return type

numpy.ndarray

any(axis=None, out=None, keepdims=False)[source]#

Test whether any array element along a given axis evaluates to True.

See the numpy.any() documentation for more information: https://numpy.org/doc/stable/reference/generated/numpy.any.html#numpy.any

Parameters

  • axis – Axis or axes along which a logical OR reduction is performed.

  • out – Alternate output array in which to place the result.

  • keepdims – If this is set to True, the axes which are reduced are left in the result as dimensions with size one.


Returns

A single boolean if axis is None; otherwise, a TensorArray.
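Since this method follows numpy.any(), the effect of `axis` can be sketched on a plain NumPy array. The array `batch` below is hypothetical sample data; the sketch shows the NumPy reduction only, and the result type of the TensorArray method is as described above.

```python
import numpy as np

# Two 2x2 tensors stacked along axis 0.
batch = np.array([[[0, 0], [0, 1]],
                  [[0, 0], [0, 0]]])

# axis=None reduces over everything and yields a single boolean.
anywhere = bool(np.any(batch))

# Reducing over the tensor axes keeps one result per stacked tensor.
per_tensor = np.any(batch, axis=(1, 2))
```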

all(axis=None, out=None, keepdims=False)[source]#

Test whether all array elements along a given axis evaluate to True.

Parameters

  • axis – Axis or axes along which a logical AND reduction is performed.

  • out – Alternate output array in which to place the result.

  • keepdims – If this is set to True, the axes which are reduced are left in the result as dimensions with size one.


Returns

A single boolean if axis is None; otherwise, a TensorArray.
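The `keepdims` option is the least obvious of the three parameters, so here is a minimal sketch using numpy.all() on a plain array. The array `batch` is hypothetical sample data, and the sketch assumes only the NumPy semantics that the parameter descriptions above refer to.

```python
import numpy as np

# Two 2x2 tensors stacked along axis 0; the second contains a zero.
batch = np.array([[[1, 1], [1, 1]],
                  [[1, 0], [1, 1]]])

# axis=None: a single boolean for the whole array.
everything = bool(np.all(batch))

# Reduce over the tensor axes: one result per stacked tensor.
per_tensor = np.all(batch, axis=(1, 2))

# keepdims=True leaves the reduced axes in place with length one.
kept = np.all(batch, axis=(1, 2), keepdims=True)
```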