Data Representations#
Block API#
- ray.data.block.Block#
alias of Union[List[ray.data.block.T], pyarrow.Table, pandas.DataFrame, bytes]
- class ray.data.block.BlockExecStats[source]#
Execution stats for this block.
- wall_time_s#
The wall-clock time it took to compute this block.
- cpu_time_s#
The CPU time it took to compute this block.
- node_id#
A unique id for the node that computed this block.
DeveloperAPI: This API may change across minor Ray releases.
- class ray.data.block.BlockMetadata(num_rows: Optional[int], size_bytes: Optional[int], schema: Optional[Union[type, pyarrow.lib.Schema]], input_files: Optional[List[str]], exec_stats: Optional[ray.data.block.BlockExecStats])[source]#
Metadata about the block.
- num_rows#
The number of rows contained in this block, or None.
- Type
Optional[int]
- size_bytes#
The approximate size in bytes of this block, or None.
- Type
Optional[int]
- schema#
The pyarrow schema or types of the block elements, or None.
- Type
Optional[Union[type, pyarrow.lib.Schema]]
- input_files#
The list of file paths used to generate this block, or the empty list if indeterminate.
- Type
Optional[List[str]]
- exec_stats#
Execution stats for this block.
- Type
Optional[ray.data.block.BlockExecStats]
DeveloperAPI: This API may change across minor Ray releases.
- class ray.data.block.BlockAccessor(*args, **kwds)[source]#
Provides accessor methods for a specific block.
Ideally, we wouldn't need a separate accessor class for blocks. However, this is needed if we want to support storing pyarrow.Table directly as a top-level Ray object, without a wrapping class (issue #17186).
There are three types of block accessors: SimpleBlockAccessor, which operates over a plain Python list; ArrowBlockAccessor, for pyarrow.Table blocks; and PandasBlockAccessor, for pandas.DataFrame blocks.
DeveloperAPI: This API may change across minor Ray releases.
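The accessor pattern described above can be sketched in plain Python. The class below is a hypothetical stand-in for SimpleBlockAccessor (the name and details are invented for illustration, not Ray's actual implementation); it shows how the block itself stays a bare list while a separate accessor class supplies the operations:

```python
from typing import Any, List


class ListBlockAccessor:
    """Hypothetical accessor over a plain-list block, in the spirit of
    SimpleBlockAccessor. Illustrative only."""

    def __init__(self, block: List[Any]):
        # The block is just a plain Python list; no wrapper object.
        self._block = block

    def slice(self, start: int, end: int, copy: bool) -> List[Any]:
        # A Python list slice always copies, so `copy` is a no-op here;
        # for Arrow/pandas blocks it would control zero-copy slicing.
        return self._block[start:end]

    def take(self, indices: List[int]) -> List[Any]:
        # Build a new block from the requested row indices.
        return [self._block[i] for i in indices]

    def num_rows(self) -> int:
        return len(self._block)


acc = ListBlockAccessor([10, 20, 30, 40])
acc.slice(1, 3, copy=True)  # [20, 30]
acc.take([0, 3])            # [10, 40]
```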
- slice(start: int, end: int, copy: bool) Union[List[ray.data.block.T], pyarrow.Table, pandas.DataFrame, bytes] [source]#
Return a slice of this block.
- Parameters
start – The starting index of the slice.
end – The ending index of the slice.
copy – Whether to perform a data copy for the slice.
- Returns
The sliced block result.
- take(indices: List[int]) Union[List[ray.data.block.T], pyarrow.Table, pandas.DataFrame, bytes] [source]#
Return a new block containing the provided row indices.
- Parameters
indices – The row indices to return.
- Returns
A new block containing the provided row indices.
- select(columns: List[Union[None, str, Callable[[ray.data.block.T], Any]]]) Union[List[ray.data.block.T], pyarrow.Table, pandas.DataFrame, bytes] [source]#
Return a new block containing the provided columns.
- random_shuffle(random_seed: Optional[int]) Union[List[ray.data.block.T], pyarrow.Table, pandas.DataFrame, bytes] [source]#
Randomly shuffle this block.
- to_numpy(columns: Optional[Union[str, List[str]]] = None) Union[numpy.ndarray, Dict[str, numpy.ndarray]] [source]#
Convert this block (or columns of block) into a NumPy ndarray.
- Parameters
columns – Name of columns to convert, or None if converting all columns.
- to_block() Union[List[ray.data.block.T], pyarrow.Table, pandas.DataFrame, bytes] [source]#
Return the base block that this accessor wraps.
- to_default() Union[List[ray.data.block.T], pyarrow.Table, pandas.DataFrame, bytes] [source]#
Return the default data format for this accessor.
- to_batch_format(batch_format: str) Union[List[ray.data.block.T], pyarrow.Table, pandas.DataFrame, bytes, numpy.ndarray, Dict[str, numpy.ndarray]] [source]#
Convert this block into the provided batch format.
- Parameters
batch_format – The batch format to convert this block to.
- Returns
This block formatted as the provided batch format.
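The idea of converting a block to a named batch format can be sketched over plain row dicts. The format names and helper below are assumptions for illustration; the real method converts between Ray's block types (Arrow, pandas, plain lists):

```python
from typing import Any, Dict, List


def to_batch_format(rows: List[Dict[str, Any]], batch_format: str) -> Any:
    """Hypothetical dispatch on a batch-format string."""
    if batch_format == "native":
        # Leave the rows in their current representation.
        return rows
    if batch_format == "columns":
        # Pivot row dicts into a dict of column lists.
        return {k: [r[k] for r in rows] for k in rows[0]}
    raise ValueError(f"Unsupported batch format: {batch_format!r}")


rows = [{"a": 1, "b": 2}, {"a": 3, "b": 4}]
to_batch_format(rows, "columns")  # {'a': [1, 3], 'b': [2, 4]}
```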
- schema() Union[type, pyarrow.lib.Schema] [source]#
Return the Python type or pyarrow schema of this block.
- get_metadata(input_files: List[str], exec_stats: Optional[ray.data.block.BlockExecStats]) ray.data.block.BlockMetadata [source]#
Create a metadata object from this block.
- zip(other: Union[List[ray.data.block.T], pyarrow.Table, pandas.DataFrame, bytes]) Union[List[ray.data.block.T], pyarrow.Table, pandas.DataFrame, bytes] [source]#
Zip this block with another block of the same type and size.
- static batch_to_block(batch: Union[List[ray.data.block.T], pyarrow.Table, pandas.DataFrame, bytes, numpy.ndarray, Dict[str, numpy.ndarray]]) Union[List[ray.data.block.T], pyarrow.Table, pandas.DataFrame, bytes] [source]#
Create a block from user-facing data formats.
- static for_block(block: Union[List[ray.data.block.T], pyarrow.Table, pandas.DataFrame, bytes]) BlockAccessor[T] [source]#
Create a block accessor for the given block.
- sample(n_samples: int, key: Any) Union[List[ray.data.block.T], pyarrow.Table, pandas.DataFrame, bytes] [source]#
Return a random sample of items from this block.
- sort_and_partition(boundaries: List[ray.data.block.T], key: Any, descending: bool) List[Union[List[ray.data.block.T], pyarrow.Table, pandas.DataFrame, bytes]] [source]#
Return a list of sorted partitions of this block.
- combine(key: Union[None, str, Callable[[ray.data.block.T], Any]], agg: AggregateFn) Union[List[ray.data.block.U], pyarrow.Table, pandas.DataFrame, bytes] [source]#
Combine rows with the same key into an accumulator.
- static merge_sorted_blocks(blocks: List[Block[T]], key: Any, descending: bool) Tuple[Union[List[ray.data.block.T], pyarrow.Table, pandas.DataFrame, bytes], ray.data.block.BlockMetadata] [source]#
Return a sorted block by merging a list of sorted blocks.
- static aggregate_combined_blocks(blocks: List[Union[List[ray.data.block.T], pyarrow.Table, pandas.DataFrame, bytes]], key: Union[None, str, Callable[[ray.data.block.T], Any]], agg: AggregateFn) Tuple[Union[List[ray.data.block.U], pyarrow.Table, pandas.DataFrame, bytes], ray.data.block.BlockMetadata] [source]#
Aggregate partially combined and sorted blocks.
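The merge step above can be sketched for plain-list blocks using the standard library; the real merge_sorted_blocks also handles sort keys, descending order, and BlockMetadata, all omitted in this sketch:

```python
import heapq
from typing import List


def merge_sorted_blocks(blocks: List[List[int]]) -> List[int]:
    """Merge already-sorted list blocks into one sorted block.

    heapq.merge merges k sorted inputs lazily in O(n log k), which is
    the same asymptotic shape as a k-way block merge.
    """
    return list(heapq.merge(*blocks))


merge_sorted_blocks([[1, 4, 9], [2, 3], [5, 8]])  # [1, 2, 3, 4, 5, 8, 9]
```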
Batch API#
- ray.data.block.DataBatch#
alias of Union[List[ray.data.block.T], pyarrow.Table, pandas.DataFrame, bytes, numpy.ndarray, Dict[str, numpy.ndarray]]
Row API#
- class ray.data.row.TableRow(row: Any)[source]#
A dict-like row of a tabular Dataset.
This implements the dictionary mapping interface, but provides more efficient access with less data copying than converting Arrow Tables or Pandas DataFrames into per-row dicts. This class must be subclassed, with subclasses implementing __getitem__, __iter__, and __len__.
Concrete subclasses include ray.data._internal.arrow_block.ArrowRow and ray.data._internal.pandas_block.PandasRow.
PublicAPI: This API is stable across Ray releases.
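The subclassing contract can be sketched with a minimal dict-like row. ColumnarRow below is a hypothetical example (not ArrowRow or PandasRow) showing how implementing only __getitem__, __iter__, and __len__ yields the full mapping interface via collections.abc.Mapping:

```python
from collections.abc import Mapping


class ColumnarRow(Mapping):
    """Hypothetical row view over column-major data. Like TableRow,
    it avoids materializing a per-row dict copy."""

    def __init__(self, columns: dict, index: int):
        self._columns = columns  # column name -> list of values
        self._index = index      # which row this view points at

    def __getitem__(self, key):
        # Look up one cell lazily; no row dict is ever built.
        return self._columns[key][self._index]

    def __iter__(self):
        return iter(self._columns)

    def __len__(self):
        return len(self._columns)


cols = {"x": [1, 2, 3], "y": [10, 20, 30]}
row = ColumnarRow(cols, 1)
row["y"]   # 20
dict(row)  # {'x': 2, 'y': 20}, materialized only on demand
```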
Tensor Column Extension API#
- class ray.data.extensions.tensor_extension.TensorDtype(shape: Tuple[Optional[int], ...], dtype: numpy.dtype)[source]#
Pandas extension type for a column of homogeneous-typed tensors.
This extension supports tensors in which the elements have different shapes. However, each tensor element must be non-ragged, i.e. each tensor element must have a well-defined, non-ragged shape.
See: https://github.com/pandas-dev/pandas/blob/master/pandas/core/dtypes/base.py for up-to-date interface documentation and the subclassing contract. The docstrings of the below properties and methods were copied from the base ExtensionDtype.
Examples
>>> # Create a DataFrame with a list of ndarrays as a column.
>>> import pandas as pd
>>> import numpy as np
>>> import ray
>>> df = pd.DataFrame({
...     "one": [1, 2, 3],
...     "two": list(np.arange(24).reshape((3, 2, 2, 2)))})
>>> # Note the opaque np.object dtype for this column.
>>> df.dtypes
one     int64
two    object
dtype: object
>>> # Cast the column to our TensorDtype extension type.
>>> from ray.data.extensions import TensorDtype
>>> df["two"] = df["two"].astype(TensorDtype((3, 2, 2, 2), np.int64))
>>> # Note that the column dtype is now TensorDtype instead of
>>> # np.object.
>>> df.dtypes
one                                            int64
two    TensorDtype(shape=(3, 2, 2, 2), dtype=int64)
dtype: object
>>> # Pandas is now aware of this tensor column, and we can do the
>>> # typical DataFrame operations on this column.
>>> col = 2 * (df["two"] + 10)
>>> # The ndarrays underlying the tensor column will be manipulated,
>>> # but the column itself will continue to be a Pandas type.
>>> type(col)
pandas.core.series.Series
>>> col
0    [[[ 2  4] [ 6  8]] [[10 12] [14 16]]]
1    [[[18 20] [22 24]] [[26 28] [30 32]]]
2    [[[34 36] [38 40]] [[42 44] [46 48]]]
Name: two, dtype: TensorDtype(shape=(3, 2, 2, 2), dtype=int64)
>>> # Once you do an aggregation on that column that returns a single
>>> # row's value, you get back our TensorArrayElement type.
>>> tensor = col.mean()
>>> type(tensor)
ray.data.extensions.tensor_extension.TensorArrayElement
>>> tensor
array([[[18., 20.],
        [22., 24.]],
       [[26., 28.],
        [30., 32.]]])
>>> # This is a light wrapper around a NumPy ndarray, and can easily
>>> # be converted to an ndarray.
>>> type(tensor.to_numpy())
numpy.ndarray
>>> # In addition to doing Pandas operations on the tensor column,
>>> # you can now put the DataFrame into a Dataset.
>>> ds = ray.data.from_pandas(df)
>>> # Internally, this column is represented as the corresponding
>>> # Arrow tensor extension type.
>>> ds.schema()
one: int64
two: extension<arrow.py_extension_type<ArrowTensorType>>
>>> # You can write the dataset to Parquet.
>>> ds.write_parquet("/some/path")
>>> # And you can read it back.
>>> read_ds = ray.data.read_parquet("/some/path")
>>> read_ds.schema()
one: int64
two: extension<arrow.py_extension_type<ArrowTensorType>>
>>> read_df = ray.get(read_ds.to_pandas_refs())[0]
>>> read_df.dtypes
one                                            int64
two    TensorDtype(shape=(3, 2, 2, 2), dtype=int64)
dtype: object
>>> # The tensor extension type is preserved along the
>>> # Pandas --> Arrow --> Parquet --> Arrow --> Pandas
>>> # conversion chain.
>>> read_df.equals(df)
True
PublicAPI (beta): This API is in beta and may change before becoming stable.
- property type#
The scalar type for the array, e.g. int.
It's expected that ExtensionArray[item] returns an instance of ExtensionDtype.type for scalar item, assuming that value is valid (not NA). NA values do not need to be instances of type.
- property element_dtype#
The dtype of the underlying tensor elements.
- property element_shape#
The shape of the underlying tensor elements. This will be a tuple of Nones if the corresponding TensorArray for this TensorDtype holds variable-shaped tensor elements.
- property is_variable_shaped#
Whether the corresponding TensorArray for this TensorDtype holds variable-shaped tensor elements.
- property name: str#
A string identifying the data type. Will be used for display in, e.g., Series.dtype.
- classmethod construct_from_string(string: str)[source]#
Construct this type from a string.
This is useful mainly for data types that accept parameters. For example, a period dtype accepts a frequency parameter that can be set as period[H] (where H means hourly frequency).
By default, in the abstract class, just the name of the type is expected. But subclasses can overwrite this method to accept parameters.
- Parameters
string (str) – The name of the type, for example category.
- Returns
Instance of the dtype.
- Return type
ExtensionDtype
- Raises
TypeError – If a class cannot be constructed from this ‘string’.
Examples
For extension dtypes with arguments the following may be an adequate implementation.
>>> import re
>>> @classmethod
... def construct_from_string(cls, string):
...     pattern = re.compile(r"^my_type\[(?P<arg_name>.+)\]$")
...     match = pattern.match(string)
...     if match:
...         return cls(**match.groupdict())
...     else:
...         raise TypeError(
...             f"Cannot construct a '{cls.__name__}' from '{string}'"
...         )
- class ray.data.extensions.tensor_extension.TensorArray(values: Union[numpy.ndarray, pandas.core.dtypes.generic.ABCSeries, Sequence[Union[numpy.ndarray, ray.air.util.tensor_extensions.pandas.TensorArrayElement]], ray.air.util.tensor_extensions.pandas.TensorArrayElement, Any])[source]#
Pandas ExtensionArray representing a tensor column, i.e. a column consisting of ndarrays as elements.
This extension supports tensors in which the elements have different shapes. However, each tensor element must be non-ragged, i.e. each tensor element must have a well-defined, non-ragged shape.
Examples
>>> # Create a DataFrame with a list of ndarrays as a column.
>>> import pandas as pd
>>> import numpy as np
>>> import ray
>>> from ray.data.extensions import TensorArray
>>> df = pd.DataFrame({
...     "one": [1, 2, 3],
...     "two": TensorArray(np.arange(24).reshape((3, 2, 2, 2)))})
>>> # Note that the column dtype is TensorDtype.
>>> df.dtypes
one                                            int64
two    TensorDtype(shape=(3, 2, 2, 2), dtype=int64)
dtype: object
>>> # Pandas is aware of this tensor column, and we can do the
>>> # typical DataFrame operations on this column.
>>> col = 2 * (df["two"] + 10)
>>> # The ndarrays underlying the tensor column will be manipulated,
>>> # but the column itself will continue to be a Pandas type.
>>> type(col)
pandas.core.series.Series
>>> col
0    [[[ 2  4] [ 6  8]] [[10 12] [14 16]]]
1    [[[18 20] [22 24]] [[26 28] [30 32]]]
2    [[[34 36] [38 40]] [[42 44] [46 48]]]
Name: two, dtype: TensorDtype(shape=(3, 2, 2, 2), dtype=int64)
>>> # Once you do an aggregation on that column that returns a single
>>> # row's value, you get back our TensorArrayElement type.
>>> tensor = col.mean()
>>> type(tensor)
ray.data.extensions.tensor_extension.TensorArrayElement
>>> tensor
array([[[18., 20.],
        [22., 24.]],
       [[26., 28.],
        [30., 32.]]])
>>> # This is a light wrapper around a NumPy ndarray, and can easily
>>> # be converted to an ndarray.
>>> type(tensor.to_numpy())
numpy.ndarray
>>> # In addition to doing Pandas operations on the tensor column,
>>> # you can now put the DataFrame into a Dataset.
>>> ds = ray.data.from_pandas(df)
>>> # Internally, this column is represented as the corresponding
>>> # Arrow tensor extension type.
>>> ds.schema()
one: int64
two: extension<arrow.py_extension_type<ArrowTensorType>>
>>> # You can write the dataset to Parquet.
>>> ds.write_parquet("/some/path")
>>> # And you can read it back.
>>> read_ds = ray.data.read_parquet("/some/path")
>>> read_ds.schema()
one: int64
two: extension<arrow.py_extension_type<ArrowTensorType>>
>>> read_df = ray.get(read_ds.to_pandas_refs())[0]
>>> read_df.dtypes
one                                            int64
two    TensorDtype(shape=(3, 2, 2, 2), dtype=int64)
dtype: object
>>> # The tensor extension type is preserved along the
>>> # Pandas --> Arrow --> Parquet --> Arrow --> Pandas
>>> # conversion chain.
>>> read_df.equals(df)
True
PublicAPI (beta): This API is in beta and may change before becoming stable.
- property dtype: pandas.core.dtypes.base.ExtensionDtype#
An instance of ‘ExtensionDtype’.
- property is_variable_shaped#
Whether this TensorArray holds variable-shaped tensor elements.
- property nbytes: int#
The number of bytes needed to store this object in memory.
- isna() ray.air.util.tensor_extensions.pandas.TensorArray [source]#
A 1-D array indicating if each value is missing.
- Returns
na_values – In most cases, this should return a NumPy ndarray. For exceptional cases like SparseArray, where returning an ndarray would be expensive, an ExtensionArray may be returned.
- Return type
Union[np.ndarray, ExtensionArray]
Notes
If returning an ExtensionArray, then:
na_values._is_boolean should be True
na_values should implement ExtensionArray._reduce()
na_values.any and na_values.all should be implemented
- take(indices: Sequence[int], allow_fill: bool = False, fill_value: Optional[Any] = None) ray.air.util.tensor_extensions.pandas.TensorArray [source]#
Take elements from an array.
- Parameters
indices (sequence of int) – Indices to be taken.
allow_fill (bool, default False) – How to handle negative values in indices.
False: negative values in indices indicate positional indices from the right (the default). This is similar to numpy.take().
True: negative values in indices indicate missing values. These values are set to fill_value. Any other negative values raise a ValueError.
fill_value (any, optional) – Fill value to use for NA-indices when allow_fill is True. This may be None, in which case the default NA value for the type, self.dtype.na_value, is used.
For many ExtensionArrays, there will be two representations of fill_value: a user-facing "boxed" scalar, and a low-level physical NA value. fill_value should be the user-facing version, and the implementation should handle translating that to the physical version for processing the take if necessary.
- Returns
- Return type
ExtensionArray
- Raises
IndexError – When the indices are out of bounds for the array.
ValueError – When indices contains negative values other than -1 and allow_fill is True.
See also
numpy.take
Take elements from an array along an axis.
api.extensions.take
Take elements from an array.
Notes
ExtensionArray.take is called by Series.__getitem__, .loc, iloc, when indices is a sequence of values. Additionally, it's called by Series.reindex(), or any other method that causes realignment, with a fill_value.
Examples
Here's an example implementation, which relies on casting the extension array to object dtype. This uses the helper method pandas.api.extensions.take().
def take(self, indices, allow_fill=False, fill_value=None):
    from pandas.core.algorithms import take
    # If the ExtensionArray is backed by an ndarray, then
    # just pass that here instead of coercing to object.
    data = self.astype(object)
    if allow_fill and fill_value is None:
        fill_value = self.dtype.na_value
    # fill value should always be translated from the scalar
    # type for the array, to the physical storage type for
    # the data, before passing to take.
    result = take(data, indices, fill_value=fill_value,
                  allow_fill=allow_fill)
    return self._from_sequence(result, dtype=self.dtype)
- copy() ray.air.util.tensor_extensions.pandas.TensorArray [source]#
Return a copy of the array.
- Returns
- Return type
ExtensionArray
- to_numpy(dtype: Optional[numpy.dtype] = None, copy: bool = False, na_value: Any = NoDefault.no_default)[source]#
Convert to a NumPy ndarray.
New in version 1.0.0.
This is similar to numpy.asarray(), but may provide additional control over how the conversion is done.
- Parameters
dtype (str or numpy.dtype, optional) – The dtype to pass to numpy.asarray().
copy (bool, default False) – Whether to ensure that the returned value is not a view on another array. Note that copy=False does not ensure that to_numpy() is no-copy. Rather, copy=True ensures that a copy is made, even if not strictly necessary.
na_value (Any, optional) – The value to use for missing values. The default value depends on dtype and the type of the array.
- Returns
- Return type
numpy.ndarray
- property numpy_dtype#
Get the dtype of the tensor.
- Returns
The numpy dtype of the backing ndarray.
- property numpy_ndim#
Get the number of tensor dimensions.
- Returns
The number of dimensions as an integer.
- property numpy_shape#
Get the shape of the tensor.
- Returns
A tuple of integers for the numpy shape of the backing ndarray.
- property numpy_size#
Get the size of the tensor.
- Returns
The number of elements in the tensor as an integer.
- astype(dtype, copy=True)[source]#
Cast to a NumPy array with ‘dtype’.
- Parameters
dtype (str or dtype) – Typecode or data-type to which the array is cast.
copy (bool, default True) – Whether to copy the data, even if not necessary. If False, a copy is made only if the old dtype does not match the new dtype.
- Returns
array – NumPy ndarray with ‘dtype’ for its dtype.
- Return type
ndarray
- any(axis=None, out=None, keepdims=False)[source]#
Test whether any array element along a given axis evaluates to True.
See the numpy.any() documentation for more information: https://numpy.org/doc/stable/reference/generated/numpy.any.html#numpy.any
- Parameters
axis – Axis or axes along which a logical OR reduction is performed.
out – Alternate output array in which to place the result.
keepdims – If this is set to True, the axes which are reduced are left in the result as dimensions with size one.
- Returns
A single boolean if axis is None; otherwise, a TensorArray.
- all(axis=None, out=None, keepdims=False)[source]#
Test whether all array elements along a given axis evaluate to True.
- Parameters
axis – Axis or axes along which a logical AND reduction is performed.
out – Alternate output array in which to place the result.
keepdims – If this is set to True, the axes which are reduced are left in the result as dimensions with size one.
- Returns
A single boolean if axis is None; otherwise, a TensorArray.
- class ray.data.extensions.tensor_extension.ArrowTensorType(shape: Tuple[int, ...], dtype: pyarrow.lib.DataType)[source]#
Arrow ExtensionType for an array of fixed-shaped, homogeneous-typed tensors.
This is the Arrow side of TensorDtype.
See Arrow extension type docs: https://arrow.apache.org/docs/python/extending_types.html#defining-extension-types-user-defined-types
PublicAPI (beta): This API is in beta and may change before becoming stable.
- property shape#
Shape of contained tensors.
- class ray.data.extensions.tensor_extension.ArrowTensorArray[source]#
An array of fixed-shape, homogeneous-typed tensors.
This is the Arrow side of TensorArray.
See Arrow docs for customizing extension arrays: https://arrow.apache.org/docs/python/extending_types.html#custom-extension-array-class
PublicAPI (beta): This API is in beta and may change before becoming stable.
- OFFSET_DTYPE#
alias of numpy.int32
- classmethod from_numpy(arr: Union[numpy.ndarray, Iterable[numpy.ndarray]]) Union[ray.air.util.tensor_extensions.arrow.ArrowTensorArray, ray.air.util.tensor_extensions.arrow.ArrowVariableShapedTensorArray] [source]#
Convert an ndarray or an iterable of ndarrays to an array of homogeneous-typed tensors. If given fixed-shape tensor elements, this will return an ArrowTensorArray; if given variable-shaped tensor elements, this will return an ArrowVariableShapedTensorArray.
- Parameters
arr – An ndarray or an iterable of ndarrays.
- Returns
If fixed-shape tensor elements, an ArrowTensorArray containing len(arr) tensors of fixed shape.
If variable-shaped tensor elements, an ArrowVariableShapedTensorArray containing len(arr) tensors of variable shape.
If scalar elements, a pyarrow.Array.
- to_numpy(zero_copy_only: bool = True)[source]#
Convert the entire array of tensors into a single ndarray.
- Parameters
zero_copy_only – If True, an exception will be raised if the conversion to a NumPy array would require copying the underlying data (e.g. in presence of nulls, or for non-primitive types). This argument is currently ignored, so zero-copy isn’t enforced even if this argument is true.
- Returns
A single ndarray representing the entire array of tensors.
- to_variable_shaped_tensor_array() ray.air.util.tensor_extensions.arrow.ArrowVariableShapedTensorArray [source]#
Convert this tensor array to a variable-shaped tensor array.
This is primarily used when concatenating multiple chunked tensor arrays where at least one chunked array is already variable-shaped and/or the shapes of the chunked arrays differ, in which case the resulting concatenated tensor array will need to be in the variable-shaped representation.
- class ray.data.extensions.tensor_extension.ArrowVariableShapedTensorType(dtype: pyarrow.lib.DataType, ndim: int)[source]#
Arrow ExtensionType for an array of heterogeneous-shaped, homogeneous-typed tensors.
This is the Arrow side of TensorDtype for tensor elements with different shapes. Note that this extension only supports non-ragged tensor elements; i.e., when considering each tensor element in isolation, they must have a well-defined, non-ragged shape.
See Arrow extension type docs: https://arrow.apache.org/docs/python/extending_types.html#defining-extension-types-user-defined-types
PublicAPI (alpha): This API is in alpha and may change before becoming stable.
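One way to picture the variable-shaped representation: flatten each tensor element into a shared buffer, keeping per-element offsets and shapes on the side. The sketch below uses plain Python lists and is only an analogy; Arrow's actual layout differs in its details:

```python
from typing import List, Sequence, Tuple

Shape = Tuple[int, ...]


def pack(elements: List[Tuple[Shape, Sequence[float]]]):
    """Pack (shape, flat_values) elements into one buffer plus
    per-element offsets and shapes."""
    buffer: List[float] = []
    offsets = [0]
    shapes: List[Shape] = []
    for shape, values in elements:
        buffer.extend(values)          # append flattened data
        offsets.append(len(buffer))    # record where this element ends
        shapes.append(shape)           # keep the shape as metadata
    return buffer, offsets, shapes


def unpack(buffer, offsets, shapes, i):
    """Recover element i as (shape, flat_values) via its offsets."""
    return shapes[i], buffer[offsets[i]:offsets[i + 1]]


data, offs, shps = pack([((2, 2), [1, 2, 3, 4]), ((1, 3), [5, 6, 7])])
unpack(data, offs, shps, 1)  # ((1, 3), [5, 6, 7])
```

Because every element shares one buffer, the representation stays contiguous even though the element shapes differ, which is the property the variable-shaped tensor extension relies on.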
- to_pandas_dtype()[source]#
Convert Arrow extension type to corresponding Pandas dtype.
- Returns
An instance of pd.api.extensions.ExtensionDtype.
- property ndim: int#
Return the number of dimensions in the tensor elements.
- class ray.data.extensions.tensor_extension.ArrowVariableShapedTensorArray[source]#
An array of heterogeneous-shaped, homogeneous-typed tensors.
This is the Arrow side of TensorArray for tensor elements that have differing shapes. Note that this extension only supports non-ragged tensor elements; i.e., when considering each tensor element in isolation, they must have a well-defined shape. This extension also only supports tensor elements that all have the same number of dimensions.
See Arrow docs for customizing extension arrays: https://arrow.apache.org/docs/python/extending_types.html#custom-extension-array-class
PublicAPI (alpha): This API is in alpha and may change before becoming stable.
- OFFSET_DTYPE#
alias of numpy.int32
- classmethod from_numpy(arr: Union[numpy.ndarray, List[numpy.ndarray], Tuple[numpy.ndarray]]) ray.air.util.tensor_extensions.arrow.ArrowVariableShapedTensorArray [source]#
Convert an ndarray or an iterable of heterogeneous-shaped ndarrays to an array of heterogeneous-shaped, homogeneous-typed tensors.
- Parameters
arr – An ndarray or an iterable of heterogeneous-shaped ndarrays.
- Returns
An ArrowVariableShapedTensorArray containing len(arr) tensors of heterogeneous shape.
- to_numpy(zero_copy_only: bool = True)[source]#
Convert the entire array of tensors into a single ndarray.
- Parameters
zero_copy_only – If True, an exception will be raised if the conversion to a NumPy array would require copying the underlying data (e.g. in presence of nulls, or for non-primitive types). This argument is currently ignored, so zero-copy isn’t enforced even if this argument is true.
- Returns
A single ndarray representing the entire array of tensors.