ray.data.extensions.tensor_extension.TensorArray
- class ray.data.extensions.tensor_extension.TensorArray(values: Union[numpy.ndarray, pandas.core.dtypes.generic.ABCSeries, Sequence[Union[numpy.ndarray, ray.air.util.tensor_extensions.pandas.TensorArrayElement]], ray.air.util.tensor_extensions.pandas.TensorArrayElement, Any])
Pandas ExtensionArray representing a tensor column, i.e. a column consisting of ndarrays as elements.
This extension supports tensor columns in which the elements have different shapes. However, each individual tensor element must be non-ragged, i.e. it must have a well-defined shape.
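To make the shape requirement concrete, here is a minimal NumPy-only sketch; it does not call Ray. The helper name `is_non_ragged` is hypothetical and exists only for illustration: it checks whether a nested sequence converts to an ndarray with a well-defined shape. Two elements may differ in shape from each other, but each one has to pass this check on its own.

```python
import warnings

import numpy as np


def is_non_ragged(element) -> bool:
    """Return True if `element` converts to an ndarray with a well-defined shape.

    Hypothetical helper for illustration only; it is not part of Ray.
    """
    try:
        with warnings.catch_warnings():
            # Older NumPy versions warn (rather than raise) on ragged input.
            warnings.simplefilter("ignore")
            arr = np.array(element)
    except ValueError:
        # Newer NumPy versions refuse to build an array from ragged input.
        return False
    # Older NumPy versions fall back to an object array for ragged input.
    return arr.dtype != object


# Elements with different shapes are fine, as long as each one is non-ragged.
small = [[1, 2], [3, 4]]          # shape (2, 2)
large = [[1, 2, 3], [4, 5, 6]]    # shape (2, 3)
ragged = [[1, 2], [3]]            # rows of different length: no valid shape

print(is_non_ragged(small), is_non_ragged(large), is_non_ragged(ragged))
```

Here `small` and `large` could sit in the same variable-shaped column, whereas `ragged` has no valid shape and so could not be a tensor element.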
Examples
>>> # Create a DataFrame with a list of ndarrays as a column.
>>> import pandas as pd
>>> import numpy as np
>>> import ray
>>> from ray.data.extensions import TensorArray
>>> df = pd.DataFrame({
...     "one": [1, 2, 3],
...     "two": TensorArray(np.arange(24).reshape((3, 2, 2, 2)))})
>>> # Note that the column dtype is TensorDtype.
>>> df.dtypes
one          int64
two    TensorDtype(shape=(3, 2, 2, 2), dtype=int64)
dtype: object
>>> # Pandas is aware of this tensor column, and we can do the
>>> # typical DataFrame operations on this column.
>>> col = 2 * (df["two"] + 1)
>>> # The ndarrays underlying the tensor column will be manipulated,
>>> # but the column itself will continue to be a Pandas type.
>>> type(col)
pandas.core.series.Series
>>> col
0   [[[ 2  4]
      [ 6  8]]
     [[10 12]
      [14 16]]]
1   [[[18 20]
      [22 24]]
     [[26 28]
      [30 32]]]
2   [[[34 36]
      [38 40]]
     [[42 44]
      [46 48]]]
Name: two, dtype: TensorDtype(shape=(3, 2, 2, 2), dtype=int64)
>>> # Once you do an aggregation on that column that returns a single
>>> # row's value, you get back our TensorArrayElement type.
>>> tensor = col.mean()
>>> type(tensor)
ray.data.extensions.tensor_extension.TensorArrayElement
>>> tensor
array([[[18., 20.],
        [22., 24.]],
       [[26., 28.],
        [30., 32.]]])
>>> # This is a light wrapper around a NumPy ndarray, and can easily
>>> # be converted to an ndarray.
>>> type(tensor.to_numpy())
numpy.ndarray
>>> # In addition to doing Pandas operations on the tensor column,
>>> # you can now put the DataFrame into a Dataset.
>>> ds = ray.data.from_pandas(df)
>>> # Internally, this column is represented as the corresponding
>>> # Arrow tensor extension type.
>>> ds.schema()
one: int64
two: extension<arrow.py_extension_type<ArrowTensorType>>
>>> # You can write the dataset to Parquet.
>>> ds.write_parquet("/some/path")
>>> # And you can read it back.
>>> read_ds = ray.data.read_parquet("/some/path")
>>> read_ds.schema()
one: int64
two: extension<arrow.py_extension_type<ArrowTensorType>>
>>> read_df = ray.get(read_ds.to_pandas_refs())[0]
>>> read_df.dtypes
one          int64
two    TensorDtype(shape=(3, 2, 2, 2), dtype=int64)
dtype: object
>>> # The tensor extension type is preserved along the
>>> # Pandas --> Arrow --> Parquet --> Arrow --> Pandas
>>> # conversion chain.
>>> read_df.equals(df)
True
PublicAPI (beta): This API is in beta and may change before becoming stable.
- __init__(values: Union[numpy.ndarray, pandas.core.dtypes.generic.ABCSeries, Sequence[Union[numpy.ndarray, ray.air.util.tensor_extensions.pandas.TensorArrayElement]], ray.air.util.tensor_extensions.pandas.TensorArrayElement, Any])
- Parameters
values – A NumPy ndarray or sequence of NumPy ndarrays of equal shape.
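As a rough illustration of the two input forms named above, the sketch below uses only NumPy and does not call Ray. It assumes, conceptually, that a sequence of equally shaped ndarrays is equivalent to one stacked ndarray whose leading dimension indexes the rows; how TensorArray stores the data internally is not covered here.

```python
import numpy as np

# Form 1: a single ndarray; the leading dimension is the number of column rows.
single = np.arange(24).reshape((3, 2, 2, 2))

# Form 2: a sequence of ndarrays that all share the shape (2, 2, 2).
sequence = [single[i] for i in range(len(single))]

# Stacking the sequence along a new leading axis recovers the single-array form.
stacked = np.stack(sequence)

print(stacked.shape)
print(np.array_equal(stacked, single))
```

Either `single` or `sequence` matches the description of `values` given above.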
Methods
__init__(values) – values: A NumPy ndarray or sequence of NumPy ndarrays of equal shape.
all([axis, out, keepdims]) – Test whether all array elements along a given axis evaluate to True.
any([axis, out, keepdims]) – Test whether any array element along a given axis evaluates to True.
argmax([skipna]) – Return the index of maximum value.
argmin([skipna]) – Return the index of minimum value.
argsort([ascending, kind, na_position]) – Return the indices that would sort this array.
astype(dtype[, copy]) – Cast to a NumPy array with 'dtype'.
copy() – Return a copy of the array.
delete(loc)
dropna() – Return ExtensionArray without NA values.
equals(other) – Return if another array is equivalent to this array.
factorize([na_sentinel]) – Encode the extension array as an enumerated type.
fillna([value, method, limit]) – Fill NA/NaN values using the specified method.
isin(values) – Pointwise comparison for set containment in the given values.
isna() – A 1-D array indicating if each value is missing.
ravel([order]) – Return a flattened view on this array.
repeat(repeats[, axis]) – Repeat elements of an ExtensionArray.
searchsorted(value[, side, sorter]) – Find indices where elements should be inserted to maintain order.
shift([periods, fill_value]) – Shift values by desired number.
take(indices[, allow_fill, fill_value]) – Take elements from an array.
to_numpy([dtype, copy, na_value]) – Convert to a NumPy ndarray.
transpose(*axes) – Return a transposed view on this array.
unique() – Compute the ExtensionArray of unique values.
view([dtype]) – Return a view on the array.
Attributes
SUPPORTED_REDUCERS
T
dtype – An instance of 'ExtensionDtype'.
is_variable_shaped – Whether this TensorArray holds variable-shaped tensor elements.
nbytes – The number of bytes needed to store this object in memory.
ndim – Extension Arrays are only allowed to be 1-dimensional.
numpy_dtype – Get the dtype of the tensor.
numpy_ndim – Get the number of tensor dimensions.
numpy_shape – Get the shape of the tensor.
numpy_size – Get the size of the tensor.
shape – Return a tuple of the array dimensions.
size – The number of elements in the array.