ray.data.extensions.tensor_extension.TensorArray

class ray.data.extensions.tensor_extension.TensorArray(values: Union[numpy.ndarray, pandas.core.dtypes.generic.ABCSeries, Sequence[Union[numpy.ndarray, ray.air.util.tensor_extensions.pandas.TensorArrayElement]], ray.air.util.tensor_extensions.pandas.TensorArrayElement, Any])

Pandas ExtensionArray representing a tensor column, i.e. a column consisting of ndarrays as elements.

This extension supports variable-shaped tensor columns, in which the elements can have different shapes. However, each tensor element must be non-ragged, i.e. it must have a single, well-defined shape.
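The difference between "variable-shaped" (allowed) and "ragged" (not allowed) can be made concrete with a small pure-Python check. The helper `element_shape` below is hypothetical and not part of Ray: it returns the shape of a nested-list element when every level is rectangular, and raises `ValueError` when sibling sub-lists differ in shape, meaning the element has no well-defined shape.

```python
from typing import Any, Tuple


def element_shape(element: Any) -> Tuple[int, ...]:
    """Return the shape of a nested-list tensor element.

    Hypothetical helper for illustration only (not a Ray API). Raises
    ValueError if the element is ragged, i.e. its sub-lists differ in shape.
    """
    if not isinstance(element, (list, tuple)):
        return ()  # a scalar leaf has the empty shape
    if len(element) == 0:
        return (0,)
    child_shapes = {element_shape(child) for child in element}
    if len(child_shapes) != 1:
        raise ValueError(f"ragged element: sub-shapes {sorted(child_shapes)}")
    (child_shape,) = child_shapes
    return (len(element),) + child_shape


# Variable-shaped column: every element is non-ragged, but shapes differ.
column = [
    [[1, 2], [3, 4]],        # shape (2, 2)
    [[1, 2, 3], [4, 5, 6]],  # shape (2, 3)
]
print([element_shape(e) for e in column])  # [(2, 2), (2, 3)]

# Ragged element: rows of different lengths, so no single shape exists.
try:
    element_shape([[1, 2], [3]])
except ValueError as err:
    print(err)  # ragged element: sub-shapes [(1,), (2,)]
```

A column like `column` above is the kind the extension accepts; an element like `[[1, 2], [3]]` is the kind it rejects.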

Examples

>>> # Create a DataFrame with a list of ndarrays as a column.
>>> import pandas as pd
>>> import numpy as np
>>> import ray
>>> from ray.data.extensions import TensorArray
>>> df = pd.DataFrame({
...     "one": [1, 2, 3],
...     "two": TensorArray(np.arange(24).reshape((3, 2, 2, 2)))})
>>> # Note that the column dtype is TensorDtype.
>>> df.dtypes 
one          int64
two    TensorDtype(shape=(3, 2, 2, 2), dtype=int64)
dtype: object
>>> # Pandas is aware of this tensor column, and we can do the
>>> # typical DataFrame operations on this column.
>>> col = 2 * (df["two"] + 1)
>>> # The ndarrays underlying the tensor column will be manipulated,
>>> # but the column itself will continue to be a Pandas type.
>>> type(col) 
pandas.core.series.Series
>>> col 
0   [[[ 2  4]
      [ 6  8]]
     [[10 12]
       [14 16]]]
1   [[[18 20]
      [22 24]]
     [[26 28]
      [30 32]]]
2   [[[34 36]
      [38 40]]
     [[42 44]
      [46 48]]]
Name: two, dtype: TensorDtype(shape=(3, 2, 2, 2), dtype=int64)
>>> # An aggregation that reduces the column to a single value,
>>> # such as mean(), returns a TensorArrayElement.
>>> tensor = col.mean() 
>>> type(tensor) 
ray.data.extensions.tensor_extension.TensorArrayElement
>>> tensor 
array([[[18., 20.],
        [22., 24.]],
       [[26., 28.],
        [30., 32.]]])
>>> # This is a light wrapper around a NumPy ndarray, and can easily
>>> # be converted to an ndarray.
>>> type(tensor.to_numpy()) 
numpy.ndarray
>>> # In addition to doing Pandas operations on the tensor column,
>>> # you can now put the DataFrame into a Dataset.
>>> ds = ray.data.from_pandas(df) 
>>> # Internally, this column is represented by the corresponding
>>> # Arrow tensor extension type.
>>> ds.schema() 
one: int64
two: extension<arrow.py_extension_type<ArrowTensorType>>
>>> # You can write the dataset to Parquet.
>>> ds.write_parquet("/some/path") 
>>> # And you can read it back.
>>> read_ds = ray.data.read_parquet("/some/path") 
>>> read_ds.schema() 
one: int64
two: extension<arrow.py_extension_type<ArrowTensorType>>
>>> read_df = ray.get(read_ds.to_pandas_refs())[0] 
>>> read_df.dtypes 
one          int64
two    TensorDtype(shape=(3, 2, 2, 2), dtype=int64)
dtype: object
>>> # The tensor extension type is preserved along the
>>> # Pandas --> Arrow --> Parquet --> Arrow --> Pandas
>>> # conversion chain.
>>> read_df.equals(df) 
True
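Because the arithmetic is applied to the ndarray underlying the column, the printed values in the example can be reproduced with NumPy alone. This sketch assumes, as the example shows, that the column wraps `np.arange(24).reshape((3, 2, 2, 2))` with one row per leading index, and that `mean()` on the column reduces across rows (axis 0).

```python
import numpy as np

# The same data the "two" column wraps: 3 rows, each a (2, 2, 2) tensor.
tensors = np.arange(24).reshape((3, 2, 2, 2))

# Elementwise arithmetic that produces the printed column values.
col = 2 * (tensors + 1)
print(col[0].ravel())  # [ 2  4  6  8 10 12 14 16]

# A column-wide mean reduces across rows (axis 0), leaving one (2, 2, 2) tensor.
tensor = col.mean(axis=0)
print(tensor.shape)    # (2, 2, 2)
print(tensor.ravel())  # [18. 20. 22. 24. 26. 28. 30. 32.]
```

The mean equals the middle row here because the rows are evenly spaced, which is why the `TensorArrayElement` in the example holds the values 18 through 32.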

PublicAPI (beta): This API is in beta and may change before becoming stable.

__init__(values: Union[numpy.ndarray, pandas.core.dtypes.generic.ABCSeries, Sequence[Union[numpy.ndarray, ray.air.util.tensor_extensions.pandas.TensorArrayElement]], ray.air.util.tensor_extensions.pandas.TensorArrayElement, Any])
Parameters

values: A NumPy ndarray, or a sequence of NumPy ndarrays. Each ndarray must be non-ragged; see the shape requirements above.
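To illustrate the forms `values` can take, the hypothetical helper `to_tensor_block` below (not Ray's implementation) normalizes a sequence of ndarrays the way a tensor column could plausibly store it: equal-shaped elements are stacked into one ndarray with a leading row axis, while differently shaped elements are kept intact in a 1-D object ndarray. The returned flag loosely mirrors the `is_variable_shaped` attribute; the storage layout itself is an assumption.

```python
from typing import Sequence, Tuple

import numpy as np


def to_tensor_block(values: Sequence[np.ndarray]) -> Tuple[np.ndarray, bool]:
    """Hypothetical sketch: normalize `values` into a single ndarray.

    Returns (block, is_variable_shaped). Not Ray's actual implementation.
    """
    arrays = [np.asarray(v) for v in values]
    if not arrays:
        raise ValueError("values must contain at least one ndarray")
    shapes = {a.shape for a in arrays}
    if len(shapes) == 1:
        # All elements share one shape: stack along a new leading (row) axis.
        return np.stack(arrays), False
    # Shapes differ: store each element as-is in a 1-D object array.
    # Assigning one element at a time stops NumPy from trying to broadcast them.
    block = np.empty(len(arrays), dtype=object)
    for i, a in enumerate(arrays):
        block[i] = a
    return block, True


fixed, flag = to_tensor_block([np.zeros((2, 2)), np.ones((2, 2))])
print(fixed.shape, flag)  # (2, 2, 2) False

variable, flag = to_tensor_block([np.zeros((2, 2)), np.zeros((2, 3))])
print(variable.shape, flag)  # (2,) True
```

Passing an already-stacked ndarray also works with this sketch, since iterating over it yields its rows.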

Methods

__init__(values)

Construct a TensorArray from a NumPy ndarray or a sequence of NumPy ndarrays.

all([axis, out, keepdims])

Test whether all array elements along a given axis evaluate to True.

any([axis, out, keepdims])

Test whether any array element along a given axis evaluates to True.

argmax([skipna])

Return the index of maximum value.

argmin([skipna])

Return the index of minimum value.

argsort([ascending, kind, na_position])

Return the indices that would sort this array.

astype(dtype[, copy])

Cast to a NumPy array with 'dtype'.

copy()

Return a copy of the array.

delete(loc)

dropna()

Return ExtensionArray without NA values.

equals(other)

Return whether another array is equivalent to this array.

factorize([na_sentinel])

Encode the extension array as an enumerated type.

fillna([value, method, limit])

Fill NA/NaN values using the specified method.

isin(values)

Pointwise comparison for set containment in the given values.

isna()

A 1-D array indicating if each value is missing.

ravel([order])

Return a flattened view on this array.

repeat(repeats[, axis])

Repeat elements of an ExtensionArray.

searchsorted(value[, side, sorter])

Find indices where elements should be inserted to maintain order.

shift([periods, fill_value])

Shift values by desired number.

take(indices[, allow_fill, fill_value])

Take elements from an array.

to_numpy([dtype, copy, na_value])

Convert to a NumPy ndarray.

transpose(*axes)

Return a transposed view on this array.

unique()

Compute the ExtensionArray of unique values.

view([dtype])

Return a view on the array.

Attributes

SUPPORTED_REDUCERS

T

dtype

An instance of 'ExtensionDtype'.

is_variable_shaped

Whether this TensorArray holds variable-shaped tensor elements.

nbytes

The number of bytes needed to store this object in memory.

ndim

Extension Arrays are only allowed to be 1-dimensional.

numpy_dtype

Get the dtype of the tensor.

numpy_ndim

Get the number of tensor dimensions.

numpy_shape

Get the shape of the tensor.

numpy_size

Get the size of the tensor.

shape

Return a tuple of the array dimensions.

size

The number of elements in the array.
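The attributes fall into two groups: `shape`, `ndim`, and `size` describe the 1-D extension array that pandas sees (one entry per row), while `numpy_shape`, `numpy_ndim`, and `numpy_size` describe the full tensor. The NumPy sketch below computes both groups for the data of the `two` column from the example; treating the `numpy_*` values as those of the stacked ndarray is an assumption that only applies to fixed-shape columns.

```python
import math

import numpy as np

tensors = np.arange(24).reshape((3, 2, 2, 2))  # data of the "two" column

# Pandas-facing view: a 1-D array holding one tensor per row.
ext_shape = (tensors.shape[0],)  # corresponds to `shape`
ext_ndim = len(ext_shape)        # `ndim`; always 1 for ExtensionArrays
ext_size = math.prod(ext_shape)  # `size`: number of rows

# Tensor view (assumed for the fixed-shape case): the stacked ndarray.
numpy_shape = tensors.shape      # `numpy_shape`
numpy_ndim = tensors.ndim        # `numpy_ndim`
numpy_size = tensors.size        # `numpy_size`: total number of scalars
row_shape = numpy_shape[1:]      # shape of each individual tensor element

print(ext_shape, ext_ndim, ext_size)        # (3,) 1 3
print(numpy_shape, numpy_ndim, numpy_size)  # (3, 2, 2, 2) 4 24
print(row_shape)                            # (2, 2, 2)
```

For variable-shaped columns there is no single stacked ndarray, so only the pandas-facing group has an unambiguous value in this sketch.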