Data Representations

Block API

ray.data.block.Block

alias of Union[List[ray.data.block.T], pyarrow.Table, pandas.DataFrame, bytes]
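
For illustration, a minimal sketch of the three common in-memory representations covered by this alias (using only the standard pyarrow and pandas constructors):

import pandas as pd
import pyarrow as pa

# Each of these values is a valid Block under the alias above.
simple_block = [1, 2, 3]                            # plain Python list
arrow_block = pa.table({"value": [1, 2, 3]})        # pyarrow.Table
pandas_block = pd.DataFrame({"value": [1, 2, 3]})   # pandas.DataFrame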

class ray.data.block.BlockExecStats[source]

Execution stats for this block.

wall_time_s

The wall-clock time it took to compute this block.

cpu_time_s

The CPU time it took to compute this block.

node_id

A unique id for the node that computed this block.

DeveloperAPI: This API may change across minor Ray releases.

class ray.data.block.BlockMetadata(num_rows: Optional[int], size_bytes: Optional[int], schema: Optional[Union[type, pyarrow.lib.Schema]], input_files: Optional[List[str]], exec_stats: Optional[ray.data.block.BlockExecStats])[source]

Metadata about the block.

num_rows

The number of rows contained in this block, or None.

Type

Optional[int]

size_bytes

The approximate size in bytes of this block, or None.

Type

Optional[int]

schema

The pyarrow schema or types of the block elements, or None.

Type

Optional[Union[type, pyarrow.lib.Schema]]

input_files

The list of file paths used to generate this block, or the empty list if indeterminate.

Type

Optional[List[str]]

exec_stats

Execution stats for this block.

Type

Optional[ray.data.block.BlockExecStats]

DeveloperAPI: This API may change across minor Ray releases.
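
For illustration, a hypothetical metadata object built from the fields documented above; exec_stats may be None when execution stats weren't collected:

from ray.data.block import BlockMetadata

meta = BlockMetadata(
    num_rows=3,
    size_bytes=256,
    schema=None,                        # unknown schema
    input_files=["example.parquet"],    # hypothetical source file
    exec_stats=None,                    # no execution stats recorded
)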

class ray.data.block.BlockAccessor(*args, **kwds)[source]

Provides accessor methods for a specific block.

Ideally, we wouldn’t need separate accessor classes for blocks. However, this is needed if we want to support storing pyarrow.Table directly as a top-level Ray object, without a wrapping class (issue #17186).

There are three types of block accessors: SimpleBlockAccessor, which operates over plain Python list blocks; ArrowBlockAccessor, for pyarrow.Table blocks; and PandasBlockAccessor, for pandas.DataFrame blocks.

DeveloperAPI: This API may change across minor Ray releases.
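
For illustration, a minimal usage sketch assuming a pandas block; the same accessor interface applies to Arrow and simple (list) blocks:

import pandas as pd
from ray.data.block import BlockAccessor

block = pd.DataFrame({"a": [1, 2, 3]})
acc = BlockAccessor.for_block(block)   # returns a PandasBlockAccessor

acc.num_rows()                 # 3
acc.size_bytes()               # approximate in-memory size
acc.slice(0, 2, copy=True)     # first two rows, as a new block
acc.to_arrow()                 # convert to a pyarrow.Table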

num_rows() int[source]

Return the number of rows contained in this block.

iter_rows() Iterator[ray.data.block.T][source]

Iterate over the rows of this block.

slice(start: int, end: int, copy: bool) Union[List[ray.data.block.T], pyarrow.Table, pandas.DataFrame, bytes][source]

Return a slice of this block.

Parameters
  • start – The starting index of the slice.

  • end – The ending index of the slice.

  • copy – Whether to perform a data copy for the slice.

Returns

The sliced block result.

take(indices: List[int]) Union[List[ray.data.block.T], pyarrow.Table, pandas.DataFrame, bytes][source]

Return a new block containing the provided row indices.

Parameters

indices – The row indices to return.

Returns

A new block containing the provided row indices.

select(columns: List[Union[None, str, Callable[[ray.data.block.T], Any]]]) Union[List[ray.data.block.T], pyarrow.Table, pandas.DataFrame, bytes][source]

Return a new block containing the provided columns.

random_shuffle(random_seed: Optional[int]) Union[List[ray.data.block.T], pyarrow.Table, pandas.DataFrame, bytes][source]

Randomly shuffle this block.

to_pandas() pandas.DataFrame[source]

Convert this block into a Pandas dataframe.

to_numpy(columns: Optional[Union[str, List[str]]] = None) Union[numpy.ndarray, Dict[str, numpy.ndarray]][source]

Convert this block (or columns of block) into a NumPy ndarray.

Parameters

columns – Name of columns to convert, or None if converting all columns.

to_arrow() pyarrow.Table[source]

Convert this block into an Arrow table.

to_block() Union[List[ray.data.block.T], pyarrow.Table, pandas.DataFrame, bytes][source]

Return the base block that this accessor wraps.

to_default() Union[List[ray.data.block.T], pyarrow.Table, pandas.DataFrame, bytes][source]

Return the default data format for this accessor.

to_batch_format(batch_format: str) Union[List[ray.data.block.T], pyarrow.Table, pandas.DataFrame, bytes, numpy.ndarray, Dict[str, numpy.ndarray]][source]

Convert this block into the provided batch format.

Parameters

batch_format – The batch format to convert this block to.

Returns

This block formatted as the provided batch format.
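
For illustration, a sketch assuming the standard batch format names "pandas", "pyarrow", and "numpy":

import pandas as pd
from ray.data.block import BlockAccessor

acc = BlockAccessor.for_block(pd.DataFrame({"a": [1, 2, 3]}))

as_pandas = acc.to_batch_format("pandas")   # pandas.DataFrame
as_arrow = acc.to_batch_format("pyarrow")   # pyarrow.Table
as_numpy = acc.to_batch_format("numpy")     # ndarray or dict of ndarrays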

size_bytes() int[source]

Return the approximate size in bytes of this block.

schema() Union[type, pyarrow.lib.Schema][source]

Return the Python type or pyarrow schema of this block.

get_metadata(input_files: List[str], exec_stats: Optional[ray.data.block.BlockExecStats]) ray.data.block.BlockMetadata[source]

Create a metadata object from this block.
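
For illustration, a minimal sketch; exec_stats is optional and may be None when execution stats weren't collected:

import pandas as pd
from ray.data.block import BlockAccessor

acc = BlockAccessor.for_block(pd.DataFrame({"a": [1, 2, 3]}))
meta = acc.get_metadata(input_files=[], exec_stats=None)
meta.num_rows      # 3
meta.size_bytes    # approximate size of the block in bytes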

zip(other: Union[List[ray.data.block.T], pyarrow.Table, pandas.DataFrame, bytes]) Union[List[ray.data.block.T], pyarrow.Table, pandas.DataFrame, bytes][source]

Zip this block with another block of the same type and size.

static builder() BlockBuilder[T][source]

Create a builder for this block type.

static batch_to_block(batch: Union[List[ray.data.block.T], pyarrow.Table, pandas.DataFrame, bytes, numpy.ndarray, Dict[str, numpy.ndarray]]) Union[List[ray.data.block.T], pyarrow.Table, pandas.DataFrame, bytes][source]

Create a block from user-facing data formats.

static for_block(block: Union[List[ray.data.block.T], pyarrow.Table, pandas.DataFrame, bytes]) BlockAccessor[T][source]

Create a block accessor for the given block.

sample(n_samples: int, key: Any) Union[List[ray.data.block.T], pyarrow.Table, pandas.DataFrame, bytes][source]

Return a random sample of items from this block.

sort_and_partition(boundaries: List[ray.data.block.T], key: Any, descending: bool) List[Union[List[ray.data.block.T], pyarrow.Table, pandas.DataFrame, bytes]][source]

Return a list of sorted partitions of this block.

combine(key: Union[None, str, Callable[[ray.data.block.T], Any]], agg: AggregateFn) Union[List[ray.data.block.U], pyarrow.Table, pandas.DataFrame, bytes][source]

Combine rows with the same key into an accumulator.

static merge_sorted_blocks(blocks: List[Block[T]], key: Any, descending: bool) Tuple[Union[List[ray.data.block.T], pyarrow.Table, pandas.DataFrame, bytes], ray.data.block.BlockMetadata][source]

Return a sorted block by merging a list of sorted blocks.

static aggregate_combined_blocks(blocks: List[Union[List[ray.data.block.T], pyarrow.Table, pandas.DataFrame, bytes]], key: Union[None, str, Callable[[ray.data.block.T], Any]], agg: AggregateFn) Tuple[Union[List[ray.data.block.U], pyarrow.Table, pandas.DataFrame, bytes], ray.data.block.BlockMetadata][source]

Aggregate partially combined and sorted blocks.

Batch API

ray.data.block.DataBatch

alias of Union[List[ray.data.block.T], pyarrow.Table, pandas.DataFrame, bytes, numpy.ndarray, Dict[str, numpy.ndarray]]
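
For illustration, a sketch of how batch_format selects which member of this union a user-defined function receives (assuming ray.data.range_table, which creates a single "value" column):

import ray

ds = ray.data.range_table(8)

# With batch_format="pandas", each DataBatch is a pandas.DataFrame;
# "pyarrow" and "numpy" yield pyarrow.Table and ndarray batches instead.
def add_one(batch):
    batch["value"] = batch["value"] + 1
    return batch

ds = ds.map_batches(add_one, batch_format="pandas")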

Row API

class ray.data.row.TableRow(row: Any)[source]

A dict-like row of a tabular Dataset.

This implements the dictionary mapping interface, but provides more efficient access with less data copying than converting Arrow Tables or Pandas DataFrames into per-row dicts. This class must be subclassed, with subclasses implementing __getitem__, __iter__, and __len__.

Concrete subclasses include ray.data._internal.arrow_block.ArrowRow and ray.data._internal.pandas_block.PandasRow.

PublicAPI: This API is stable across Ray releases.

as_pydict() dict[source]

Convert to a normal Python dict. This will create a new copy of the row.
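
For illustration, a minimal sketch assuming a tabular dataset created from a pandas DataFrame; iter_rows() then yields TableRow subclasses (here PandasRow):

import pandas as pd
import ray

ds = ray.data.from_pandas(pd.DataFrame({"a": [1, 2], "b": ["x", "y"]}))

for row in ds.iter_rows():
    row["a"]            # efficient, copy-light column access
    row.as_pydict()     # e.g. {"a": 1, "b": "x"}, a new dict copy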

Tensor Column Extension API

class ray.data.extensions.tensor_extension.TensorDtype(shape: Optional[Tuple[int, ...]], dtype: numpy.dtype)[source]

Pandas extension type for a column of homogeneous-typed tensors.

This extension supports tensor elements with differing shapes. However, each individual tensor element must be non-ragged, i.e. it must have a well-defined shape.

See: https://github.com/pandas-dev/pandas/blob/master/pandas/core/dtypes/base.py for up-to-date interface documentation and the subclassing contract. The docstrings of the below properties and methods were copied from the base ExtensionDtype.

Examples

>>> # Create a DataFrame with a list of ndarrays as a column.
>>> import pandas as pd
>>> import numpy as np
>>> import ray
>>> df = pd.DataFrame({
...     "one": [1, 2, 3],
...     "two": list(np.arange(24).reshape((3, 2, 2, 2)))})
>>> # Note the opaque np.object dtype for this column.
>>> df.dtypes 
one     int64
two    object
dtype: object
>>> # Cast column to our TensorDtype extension type.
>>> from ray.data.extensions import TensorDtype
>>> df["two"] = df["two"].astype(TensorDtype(np.int64, (3, 2, 2, 2)))
>>> # Note that the column dtype is now TensorDtype instead of
>>> # np.object.
>>> df.dtypes 
one          int64
two    TensorDtype(shape=(3, 2, 2, 2), dtype=int64)
dtype: object
>>> # Pandas is now aware of this tensor column, and we can do the
>>> # typical DataFrame operations on this column.
>>> col = 2 * (df["two"] + 10)
>>> # The ndarrays underlying the tensor column will be manipulated,
>>> # but the column itself will continue to be a Pandas type.
>>> type(col) 
pandas.core.series.Series
>>> col 
0   [[[ 2  4]
      [ 6  8]]
     [[10 12]
       [14 16]]]
1   [[[18 20]
      [22 24]]
     [[26 28]
      [30 32]]]
2   [[[34 36]
      [38 40]]
     [[42 44]
      [46 48]]]
Name: two, dtype: TensorDtype(shape=(3, 2, 2, 2), dtype=int64)
>>> # Once you do an aggregation on that column that returns a single
>>> # row's value, you get back our TensorArrayElement type.
>>> tensor = col.mean()
>>> type(tensor) 
ray.data.extensions.tensor_extension.TensorArrayElement
>>> tensor 
array([[[18., 20.],
        [22., 24.]],
       [[26., 28.],
        [30., 32.]]])
>>> # This is a light wrapper around a NumPy ndarray, and can easily
>>> # be converted to an ndarray.
>>> type(tensor.to_numpy()) 
numpy.ndarray
>>> # In addition to doing Pandas operations on the tensor column,
>>> # you can now put the DataFrame into a Dataset.
>>> ds = ray.data.from_pandas(df) 
>>> # Internally, this column is represented by the corresponding
>>> # Arrow tensor extension type.
>>> ds.schema() 
one: int64
two: extension<arrow.py_extension_type<ArrowTensorType>>
>>> # You can write the dataset to Parquet.
>>> ds.write_parquet("/some/path") 
>>> # And you can read it back.
>>> read_ds = ray.data.read_parquet("/some/path") 
>>> read_ds.schema() 
one: int64
two: extension<arrow.py_extension_type<ArrowTensorType>>
>>> read_df = ray.get(read_ds.to_pandas_refs())[0] 
>>> read_df.dtypes 
one          int64
two    TensorDtype(shape=(3, 2, 2, 2), dtype=int64)
dtype: object
>>> # The tensor extension type is preserved along the
>>> # Pandas --> Arrow --> Parquet --> Arrow --> Pandas
>>> # conversion chain.
>>> read_df.equals(df) 
True

PublicAPI (beta): This API is in beta and may change before becoming stable.

property type

The scalar type for the array, e.g. int. It’s expected that ExtensionArray[item] returns an instance of ExtensionDtype.type for a scalar item, assuming that value is valid (not NA). NA values do not need to be instances of type.

property element_dtype

The dtype of the underlying tensor elements.

property element_shape

The shape of the underlying tensor elements. This will be None if the corresponding TensorArray for this TensorDtype holds variable-shaped tensor elements.

property is_variable_shaped

Whether the corresponding TensorArray for this TensorDtype holds variable-shaped tensor elements.

property name: str

A string identifying the data type. Will be used for display in, e.g. Series.dtype

classmethod construct_from_string(string: str)[source]

Construct this type from a string.

This is useful mainly for data types that accept parameters. For example, a period dtype accepts a frequency parameter that can be set as period[H] (where H means hourly frequency).

By default, in the abstract class, just the name of the type is expected. But subclasses can override this method to accept parameters.

Parameters

string (str) – The name of the type, for example category.

Returns

Instance of the dtype.

Return type

ExtensionDtype

Raises

TypeError – If a class cannot be constructed from this ‘string’.

Examples

For extension dtypes with arguments the following may be an adequate implementation.

>>> import re
>>> @classmethod
... def construct_from_string(cls, string):
...     pattern = re.compile(r"^my_type\[(?P<arg_name>.+)\]$")
...     match = pattern.match(string)
...     if match:
...         return cls(**match.groupdict())
...     else:
...         raise TypeError(
...             f"Cannot construct a '{cls.__name__}' from '{string}'"
...         )

classmethod construct_array_type()[source]

Return the array type associated with this dtype.

Returns

Return type

type

class ray.data.extensions.tensor_extension.TensorArray(values: Union[numpy.ndarray, pandas.core.dtypes.generic.ABCSeries, Sequence[Union[numpy.ndarray, ray.air.util.tensor_extensions.pandas.TensorArrayElement]], ray.air.util.tensor_extensions.pandas.TensorArrayElement, Any])[source]

Pandas ExtensionArray representing a tensor column, i.e. a column consisting of ndarrays as elements.

This extension supports tensor elements with differing shapes. However, each individual tensor element must be non-ragged, i.e. it must have a well-defined shape.

Examples

>>> # Create a DataFrame with a list of ndarrays as a column.
>>> import pandas as pd
>>> import numpy as np
>>> import ray
>>> from ray.data.extensions import TensorArray
>>> df = pd.DataFrame({
...     "one": [1, 2, 3],
...     "two": TensorArray(np.arange(24).reshape((3, 2, 2, 2)))})
>>> # Note that the column dtype is TensorDtype.
>>> df.dtypes 
one          int64
two    TensorDtype(shape=(3, 2, 2, 2), dtype=int64)
dtype: object
>>> # Pandas is aware of this tensor column, and we can do the
>>> # typical DataFrame operations on this column.
>>> col = 2 * (df["two"] + 10)
>>> # The ndarrays underlying the tensor column will be manipulated,
>>> # but the column itself will continue to be a Pandas type.
>>> type(col) 
pandas.core.series.Series
>>> col 
0   [[[ 2  4]
      [ 6  8]]
     [[10 12]
       [14 16]]]
1   [[[18 20]
      [22 24]]
     [[26 28]
      [30 32]]]
2   [[[34 36]
      [38 40]]
     [[42 44]
      [46 48]]]
Name: two, dtype: TensorDtype(shape=(3, 2, 2, 2), dtype=int64)
>>> # Once you do an aggregation on that column that returns a single
>>> # row's value, you get back our TensorArrayElement type.
>>> tensor = col.mean() 
>>> type(tensor) 
ray.data.extensions.tensor_extension.TensorArrayElement
>>> tensor 
array([[[18., 20.],
        [22., 24.]],
       [[26., 28.],
        [30., 32.]]])
>>> # This is a light wrapper around a NumPy ndarray, and can easily
>>> # be converted to an ndarray.
>>> type(tensor.to_numpy()) 
numpy.ndarray
>>> # In addition to doing Pandas operations on the tensor column,
>>> # you can now put the DataFrame into a Dataset.
>>> ds = ray.data.from_pandas(df) 
>>> # Internally, this column is represented by the corresponding
>>> # Arrow tensor extension type.
>>> ds.schema() 
one: int64
two: extension<arrow.py_extension_type<ArrowTensorType>>
>>> # You can write the dataset to Parquet.
>>> ds.write_parquet("/some/path") 
>>> # And you can read it back.
>>> read_ds = ray.data.read_parquet("/some/path") 
>>> read_ds.schema() 
one: int64
two: extension<arrow.py_extension_type<ArrowTensorType>>
>>> read_df = ray.get(read_ds.to_pandas_refs())[0] 
>>> read_df.dtypes 
one          int64
two    TensorDtype(shape=(3, 2, 2, 2), dtype=int64)
dtype: object
>>> # The tensor extension type is preserved along the
>>> # Pandas --> Arrow --> Parquet --> Arrow --> Pandas
>>> # conversion chain.
>>> read_df.equals(df) 
True

PublicAPI (beta): This API is in beta and may change before becoming stable.

property dtype: pandas.core.dtypes.base.ExtensionDtype

An instance of ‘ExtensionDtype’.

property is_variable_shaped

Whether this TensorArray holds variable-shaped tensor elements.

property nbytes: int

The number of bytes needed to store this object in memory.

isna() ray.air.util.tensor_extensions.pandas.TensorArray[source]

A 1-D array indicating if each value is missing.

Returns

na_values – In most cases, this should return a NumPy ndarray. For exceptional cases like SparseArray, where returning an ndarray would be expensive, an ExtensionArray may be returned.

Return type

Union[np.ndarray, ExtensionArray]

Notes

If returning an ExtensionArray, then

  • na_values._is_boolean should be True

  • na_values should implement ExtensionArray._reduce()

  • na_values.any and na_values.all should be implemented

take(indices: Sequence[int], allow_fill: bool = False, fill_value: Optional[Any] = None) ray.air.util.tensor_extensions.pandas.TensorArray[source]

Take elements from an array.

Parameters
  • indices (sequence of int) – Indices to be taken.

  • allow_fill (bool, default False) –

    How to handle negative values in indices.

    • False: negative values in indices indicate positional indices from the right (the default). This is similar to numpy.take().

    • True: negative values in indices indicate missing values. These values are set to fill_value. Any other negative values raise a ValueError.

  • fill_value (any, optional) –

    Fill value to use for NA-indices when allow_fill is True. This may be None, in which case the default NA value for the type, self.dtype.na_value, is used.

    For many ExtensionArrays, there will be two representations of fill_value: a user-facing “boxed” scalar, and a low-level physical NA value. fill_value should be the user-facing version, and the implementation should handle translating that to the physical version for processing the take if necessary.

Returns

Return type

ExtensionArray

Raises
  • IndexError – When the indices are out of bounds for the array.

  • ValueError – When indices contains negative values other than -1 and allow_fill is True.

See also

numpy.take

Take elements from an array along an axis.

api.extensions.take

Take elements from an array.

Notes

ExtensionArray.take is called by Series.__getitem__, .loc, and .iloc when indices is a sequence of values. Additionally, it’s called by Series.reindex(), or any other method that causes realignment, with a fill_value.

Examples

Here’s an example implementation, which relies on casting the extension array to object dtype. This uses the helper method pandas.api.extensions.take().

def take(self, indices, allow_fill=False, fill_value=None):
    from pandas.core.algorithms import take

    # If the ExtensionArray is backed by an ndarray, then
    # just pass that here instead of coercing to object.
    data = self.astype(object)

    if allow_fill and fill_value is None:
        fill_value = self.dtype.na_value

    # fill value should always be translated from the scalar
    # type for the array, to the physical storage type for
    # the data, before passing to take.

    result = take(data, indices, fill_value=fill_value,
                  allow_fill=allow_fill)
    return self._from_sequence(result, dtype=self.dtype)

copy() ray.air.util.tensor_extensions.pandas.TensorArray[source]

Return a copy of the array.

Returns

Return type

ExtensionArray

to_numpy(dtype: Optional[numpy.dtype] = None, copy: bool = False, na_value: Any = NoDefault.no_default)[source]

Convert to a NumPy ndarray.

New in version 1.0.0.

This is similar to numpy.asarray(), but may provide additional control over how the conversion is done.

Parameters
  • dtype (str or numpy.dtype, optional) – The dtype to pass to numpy.asarray().

  • copy (bool, default False) – Whether to ensure that the returned value is not a view on another array. Note that copy=False does not ensure that to_numpy() is no-copy. Rather, copy=True ensures that a copy is made, even if not strictly necessary.

  • na_value (Any, optional) – The value to use for missing values. The default value depends on dtype and the type of the array.

Returns

Return type

numpy.ndarray

property numpy_dtype

The numpy dtype of the backing ndarray.

property numpy_ndim

The number of dimensions of the backing ndarray.

property numpy_shape

The shape of the backing ndarray, as a tuple of integers.

property numpy_size

The number of elements in the backing ndarray.
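
For illustration, a small sketch of these properties on a fixed-shape TensorArray:

import numpy as np
from ray.data.extensions import TensorArray

arr = TensorArray(np.arange(24).reshape((3, 2, 2, 2)))
arr.numpy_dtype    # dtype('int64') on most platforms
arr.numpy_ndim     # 4
arr.numpy_shape    # (3, 2, 2, 2)
arr.numpy_size     # 24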

astype(dtype, copy=True)[source]

Cast to a NumPy array with ‘dtype’.

Parameters
  • dtype (str or dtype) – Typecode or data-type to which the array is cast.

  • copy (bool, default True) – Whether to copy the data, even if not necessary. If False, a copy is made only if the old dtype does not match the new dtype.

Returns

array – NumPy ndarray with ‘dtype’ for its dtype.

Return type

ndarray

any(axis=None, out=None, keepdims=False)[source]

Test whether any array element along a given axis evaluates to True.

See the numpy.any() documentation for more information: https://numpy.org/doc/stable/reference/generated/numpy.any.html#numpy.any

Parameters
  • axis – Axis or axes along which a logical OR reduction is performed.

  • out – Alternate output array in which to place the result.

  • keepdims – If this is set to True, the axes which are reduced are left in the result as dimensions with size one.

Returns

A single boolean if axis is None; otherwise a TensorArray.

all(axis=None, out=None, keepdims=False)[source]

Test whether all array elements along a given axis evaluate to True.

Parameters
  • axis – Axis or axes along which a logical AND reduction is performed.

  • out – Alternate output array in which to place the result.

  • keepdims – If this is set to True, the axes which are reduced are left in the result as dimensions with size one.

Returns

A single boolean if axis is None; otherwise a TensorArray.
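
For illustration, a minimal sketch of any() and all() on a boolean tensor column:

import numpy as np
from ray.data.extensions import TensorArray

arr = TensorArray(np.array([[True, False], [True, True]]))
arr.any()    # True: at least one element is truthy
arr.all()    # False: not every element is truthy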

class ray.data.extensions.tensor_extension.ArrowTensorType(shape: Tuple[int, ...], dtype: pyarrow.lib.DataType)[source]

Arrow ExtensionType for an array of fixed-shaped, homogeneous-typed tensors.

This is the Arrow side of TensorDtype.

See Arrow extension type docs: https://arrow.apache.org/docs/python/extending_types.html#defining-extension-types-user-defined-types

PublicAPI (beta): This API is in beta and may change before becoming stable.

property shape

Shape of contained tensors.

to_pandas_dtype()[source]

Convert Arrow extension type to corresponding Pandas dtype.

Returns

An instance of pd.api.extensions.ExtensionDtype.
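
For illustration, a small sketch constructing the type directly and recovering the Pandas-side dtype:

import pyarrow as pa
from ray.data.extensions import ArrowTensorType

t = ArrowTensorType((2, 2), pa.int64())
t.shape              # (2, 2)
t.to_pandas_dtype()  # a TensorDtype instance, e.g. TensorDtype(shape=(2, 2), dtype=int64)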

class ray.data.extensions.tensor_extension.ArrowTensorArray[source]

An array of fixed-shape, homogeneous-typed tensors.

This is the Arrow side of TensorArray.

See Arrow docs for customizing extension arrays: https://arrow.apache.org/docs/python/extending_types.html#custom-extension-array-class

PublicAPI (beta): This API is in beta and may change before becoming stable.

OFFSET_DTYPE

alias of numpy.int32

to_pylist(self)[source]

Convert to a list of native Python objects.

Returns

lst

Return type

list

classmethod from_numpy(arr: Union[numpy.ndarray, Iterable[numpy.ndarray]]) Union[ray.air.util.tensor_extensions.arrow.ArrowTensorArray, ray.air.util.tensor_extensions.arrow.ArrowVariableShapedTensorArray][source]

Convert an ndarray or an iterable of ndarrays to an array of homogeneous-typed tensors. If given fixed-shape tensor elements, this will return an ArrowTensorArray; if given variable-shape tensor elements, this will return an ArrowVariableShapedTensorArray.

Parameters

arr – An ndarray or an iterable of ndarrays.

Returns

  • If fixed-shape tensor elements, an ArrowTensorArray containing len(arr) tensors of fixed shape.

  • If variable-shaped tensor elements, an ArrowVariableShapedTensorArray containing len(arr) tensors of variable shape.

  • If scalar elements, a pyarrow.Array.
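
For illustration, a minimal sketch showing the two tensor-element return cases (assuming these classes are importable from ray.data.extensions, as in the examples above):

import numpy as np
from ray.data.extensions import ArrowTensorArray

# Fixed-shape elements: every tensor is (2, 2), so an ArrowTensorArray results.
fixed = ArrowTensorArray.from_numpy(np.arange(12).reshape((3, 2, 2)))

# Variable-shaped elements: differing shapes yield an
# ArrowVariableShapedTensorArray instead.
ragged = ArrowTensorArray.from_numpy([np.zeros((2, 2)), np.zeros((3, 3))])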

to_numpy(zero_copy_only: bool = True)[source]

Convert the entire array of tensors into a single ndarray.

Parameters

zero_copy_only – If True, an exception will be raised if the conversion to a NumPy array would require copying the underlying data (e.g. in the presence of nulls, or for non-primitive types). This argument is currently ignored, so zero-copy isn’t enforced even if it is True.

Returns

A single ndarray representing the entire array of tensors.

class ray.data.extensions.tensor_extension.ArrowVariableShapedTensorType(dtype: pyarrow.lib.DataType)[source]

Arrow ExtensionType for an array of heterogeneous-shaped, homogeneous-typed tensors.

This is the Arrow side of TensorDtype for tensor elements with different shapes. Note that this extension only supports non-ragged tensor elements; i.e., when considering each tensor element in isolation, they must have a well-defined, non-ragged shape.

See Arrow extension type docs: https://arrow.apache.org/docs/python/extending_types.html#defining-extension-types-user-defined-types

PublicAPI (alpha): This API is in alpha and may change before becoming stable.

to_pandas_dtype()[source]

Convert Arrow extension type to corresponding Pandas dtype.

Returns

An instance of pd.api.extensions.ExtensionDtype.

class ray.data.extensions.tensor_extension.ArrowVariableShapedTensorArray[source]

An array of heterogeneous-shaped, homogeneous-typed tensors.

This is the Arrow side of TensorArray for tensor elements that have differing shapes. Note that this extension only supports non-ragged tensor elements; i.e., when considering each tensor element in isolation, they must have a well-defined shape.

See Arrow docs for customizing extension arrays: https://arrow.apache.org/docs/python/extending_types.html#custom-extension-array-class

PublicAPI (alpha): This API is in alpha and may change before becoming stable.

OFFSET_DTYPE

alias of numpy.int32

to_pylist(self)[source]

Convert to a list of native Python objects.

Returns

lst

Return type

list

classmethod from_numpy(arr: Union[numpy.ndarray, List[numpy.ndarray], Tuple[numpy.ndarray]]) ray.air.util.tensor_extensions.arrow.ArrowVariableShapedTensorArray[source]

Convert an ndarray or an iterable of heterogeneous-shaped ndarrays to an array of heterogeneous-shaped, homogeneous-typed tensors.

Parameters

arr – An ndarray or an iterable of heterogeneous-shaped ndarrays.

Returns

An ArrowVariableShapedTensorArray containing len(arr) tensors of heterogeneous shape.
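
For illustration, a minimal sketch with three non-ragged tensors of differing shapes (assuming the class is exported from ray.data.extensions alongside the other tensor extension classes):

import numpy as np
from ray.data.extensions import ArrowVariableShapedTensorArray

arr = ArrowVariableShapedTensorArray.from_numpy(
    [np.ones((2, 2)), np.ones((3, 3)), np.ones((1, 4))]
)
len(arr)            # 3 tensors
arr.to_numpy()[0]   # the first (2, 2) tensor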

to_numpy(zero_copy_only: bool = True)[source]

Convert the entire array of tensors into a single ndarray.

Parameters

zero_copy_only – If True, an exception will be raised if the conversion to a NumPy array would require copying the underlying data (e.g. in the presence of nulls, or for non-primitive types). This argument is currently ignored, so zero-copy isn’t enforced even if it is True.

Returns

A single ndarray representing the entire array of tensors.