Data types

Class

class ray.data.datatype.DataType(_physical_dtype: pyarrow.DataType | numpy.dtype | type | None, _logical_dtype: _LogicalDataType = _LogicalDataType.ANY)

A simplified Ray Data DataType supporting Arrow, NumPy, and Python types.

PublicAPI (alpha): This API is in alpha and may change before becoming stable.

is_arrow_type() → bool

Check if this DataType is backed by a PyArrow DataType.

Returns:

True if the internal type is a PyArrow DataType

Return type:

bool

is_numpy_type() → bool

Check if this DataType is backed by a NumPy dtype.

Returns:

True if the internal type is a NumPy dtype

Return type:

bool

is_python_type() → bool

Check if this DataType is backed by a Python type.

Returns:

True if the internal type is a Python type

Return type:

bool

is_pattern_matching() → bool

Check if this DataType is a pattern-matching type.

Pattern-matching types have _physical_dtype=None and are used to match categories of types (e.g., any list, any struct) rather than concrete types.

Returns:

True if this is a pattern-matching type

Return type:

bool
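
The rule that a pattern has no physical dtype can be illustrated with a toy model in plain Python. This is only a sketch and not Ray's implementation: `ToyDataType`, `Logical`, `to_physical`, and `matches` are hypothetical names invented here, and this reference does not show how Ray compares a pattern against a concrete type.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional


class Logical(Enum):
    """Hypothetical stand-in for Ray's private _LogicalDataType."""
    ANY = auto()
    LIST = auto()
    STRUCT = auto()


@dataclass(frozen=True)
class ToyDataType:
    physical: Optional[str]        # e.g. "list<item: int64>"; None marks a pattern
    logical: Logical = Logical.ANY

    def is_pattern_matching(self) -> bool:
        # Mirrors the documented rule: patterns have no physical dtype.
        return self.physical is None

    def to_physical(self) -> str:
        # The conversion methods are documented to raise ValueError on patterns.
        if self.is_pattern_matching():
            raise ValueError("pattern-matching types have no concrete dtype")
        return self.physical

    def matches(self, other: "ToyDataType") -> bool:
        # Hypothetical helper: a pattern matches by category,
        # a concrete type matches by equality.
        if self.is_pattern_matching():
            return self.logical in (Logical.ANY, other.logical)
        return self == other


any_list = ToyDataType(None, Logical.LIST)
int_list = ToyDataType("list<item: int64>", Logical.LIST)
point = ToyDataType("struct<x: int64>", Logical.STRUCT)
```

In this model `any_list.matches(int_list)` holds while `any_list.matches(point)` does not, and `any_list.to_physical()` raises `ValueError`, which parallels the `Raises` entries on the conversion methods.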

to_arrow_dtype(values: List[Any] | None = None) → pyarrow.DataType

Convert the DataType to a PyArrow DataType.

Parameters:

values – Optional list of values to infer the Arrow type from. Required if the DataType is a Python type.

Returns:

A PyArrow DataType

Raises:

ValueError – If called on a pattern-matching type (where _physical_dtype is None)

to_numpy_dtype() → numpy.dtype

Convert the DataType to a NumPy dtype.

For PyArrow types, attempts to convert via pandas dtype. For Python types, returns object dtype.

Returns:

A NumPy dtype representation

Return type:

np.dtype

Raises:

ValueError – If called on a pattern-matching type (where _physical_dtype is None)

Examples

>>> import numpy as np
>>> DataType.from_numpy(np.dtype('int64')).to_numpy_dtype()
dtype('int64')
>>> DataType.from_numpy(np.dtype('float32')).to_numpy_dtype()
dtype('float32')

to_python_type() → type

Get the internal type if it’s a Python type.

This method doesn’t perform conversion; it only returns the internal type if it’s already a Python type.

Returns:

The internal Python type

Return type:

type

Raises:

ValueError – If the DataType is not backed by a Python type

Examples

>>> dt = DataType(int)
>>> dt.to_python_type()
<class 'int'>
>>> DataType.int64().to_python_type()  
Traceback (most recent call last):
    ...
ValueError: DataType is not backed by a Python type

classmethod from_arrow(arrow_type: pyarrow.DataType) → DataType

Create a DataType from a PyArrow DataType.

Parameters:

arrow_type – A PyArrow DataType to wrap

Returns:

A DataType wrapping the given PyArrow type

Return type:

DataType

Examples

>>> import pyarrow as pa
>>> from ray.data.datatype import DataType
>>> DataType.from_arrow(pa.timestamp('s'))
DataType(arrow:timestamp[s])
>>> DataType.from_arrow(pa.int64())
DataType(arrow:int64)

classmethod from_numpy(numpy_dtype: numpy.dtype | str) → DataType

Create a DataType from a NumPy dtype.

Parameters:

numpy_dtype – A NumPy dtype object or string representation

Returns:

A DataType wrapping the given NumPy dtype

Return type:

DataType

Examples

>>> import numpy as np
>>> from ray.data.datatype import DataType
>>> DataType.from_numpy(np.dtype('int32'))
DataType(numpy:int32)
>>> DataType.from_numpy('float64')
DataType(numpy:float64)

classmethod infer_dtype(value: Any) → DataType

Infer a DataType from a Python value, handling NumPy, Arrow, and Python types.

Parameters:

value – Any Python value to infer the type from

Returns:

The inferred data type

Return type:

DataType

Examples

>>> import numpy as np
>>> from ray.data.datatype import DataType
>>> DataType.infer_dtype(5)
DataType(arrow:int64)
>>> DataType.infer_dtype("hello")
DataType(arrow:string)
>>> DataType.infer_dtype(np.int32(42))
DataType(numpy:int32)

classmethod list(value_type: DataType | _LogicalDataType = _LogicalDataType.ANY) → DataType

Create a DataType representing a list with the given element type.

Pass DataType.ANY (or omit the argument) to create a pattern that matches any list type.

Parameters:

value_type – The DataType of the list elements, or DataType.ANY to match any list. Defaults to DataType.ANY.

Returns:

A DataType with PyArrow list type or a pattern-matching DataType

Return type:

DataType

Examples

>>> from ray.data.datatype import DataType
>>> DataType.list(DataType.int64())  # Exact match: list<int64>
DataType(arrow:list<item: int64>)
>>> DataType.list(DataType.ANY)  # Pattern: matches any list (explicit)
DataType(logical_dtype:LIST)
>>> DataType.list()  # Same as above (terse)
DataType(logical_dtype:LIST)

classmethod large_list(value_type: DataType | _LogicalDataType = _LogicalDataType.ANY) → DataType

Create a DataType representing a large_list with the given element type.

Pass DataType.ANY (or omit the argument) to create a pattern that matches any large_list type.

Parameters:

value_type – The DataType of the list elements, or DataType.ANY to match any large_list. Defaults to DataType.ANY.

Returns:

A DataType with PyArrow large_list type or a pattern-matching DataType

Return type:

DataType

Examples

>>> DataType.large_list(DataType.int64())  # Exact match
DataType(arrow:large_list<item: int64>)
>>> DataType.large_list(DataType.ANY)  # Pattern: matches any large_list (explicit)
DataType(logical_dtype:LARGE_LIST)
>>> DataType.large_list()  # Same as above (terse)
DataType(logical_dtype:LARGE_LIST)

classmethod fixed_size_list(value_type: DataType, list_size: int) → DataType

Create a DataType representing a fixed-size list.

Parameters:
  • value_type – The DataType of the list elements

  • list_size – The fixed size of the list

Returns:

A DataType with PyArrow fixed_size_list type

Return type:

DataType

Examples

>>> from ray.data.datatype import DataType
>>> DataType.fixed_size_list(DataType.float32(), 3)
DataType(arrow:fixed_size_list<item: float>[3])

classmethod struct(fields: List[Tuple[str, DataType]] | _LogicalDataType = _LogicalDataType.ANY) → DataType

Create a DataType representing a struct with the given fields.

Pass DataType.ANY (or omit the argument) to create a pattern that matches any struct type.

Parameters:

fields – List of (field_name, field_type) tuples, or DataType.ANY to match any struct. Defaults to DataType.ANY.

Returns:

A DataType with PyArrow struct type or a pattern-matching DataType

Return type:

DataType

Examples

>>> from ray.data.datatype import DataType
>>> DataType.struct([("x", DataType.int64()), ("y", DataType.float64())])
DataType(arrow:struct<x: int64, y: double>)
>>> DataType.struct(DataType.ANY)  # Pattern: matches any struct (explicit)
DataType(logical_dtype:STRUCT)
>>> DataType.struct()  # Same as above (terse)
DataType(logical_dtype:STRUCT)

classmethod map(key_type: DataType | _LogicalDataType = _LogicalDataType.ANY, value_type: DataType | _LogicalDataType = _LogicalDataType.ANY) → DataType

Create a DataType representing a map with the given key and value types.

Pass DataType.ANY for either argument (or omit them) to create a pattern that matches any map type.

Parameters:
  • key_type – The DataType of the map keys, or DataType.ANY to match any map. Defaults to DataType.ANY.

  • value_type – The DataType of the map values, or DataType.ANY to match any map. Defaults to DataType.ANY.

Returns:

A DataType with PyArrow map type or a pattern-matching DataType

Return type:

DataType
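
The partial-specification rule (a pattern results when either argument is DataType.ANY) can be sketched without Ray installed. The function `toy_map` and the sentinel `ANY` below are hypothetical stand-ins that only mimic the repr strings shown in the examples for this method; they are not part of the library.

```python
ANY = object()  # hypothetical sentinel standing in for DataType.ANY


def toy_map(key_type=ANY, value_type=ANY) -> str:
    """Return a repr-like string for a concrete map or for the MAP pattern."""
    # Documented behaviour: ANY in either position yields the pattern.
    if key_type is ANY or value_type is ANY:
        return "logical_dtype:MAP"
    # Both sides are concrete, so the result is a concrete Arrow map type.
    return f"arrow:map<{key_type}, {value_type}>"


concrete = toy_map("string", "int64")
partial = toy_map("string")
wildcard = toy_map()
```

Judging by the repr in the examples, a partially specified map prints the same as the fully wildcard pattern, so the sketch returns the same string for `partial` and `wildcard`.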

Examples

>>> from ray.data.datatype import DataType
>>> DataType.map(DataType.string(), DataType.int64())
DataType(arrow:map<string, int64>)
>>> DataType.map(DataType.ANY, DataType.ANY)  # Pattern: matches any map (explicit)
DataType(logical_dtype:MAP)
>>> DataType.map()  # Same as above (terse)
DataType(logical_dtype:MAP)
>>> DataType.map(DataType.string(), DataType.ANY)  # Also pattern (partial spec)
DataType(logical_dtype:MAP)

classmethod tensor(shape: Tuple[int, ...] | _LogicalDataType = _LogicalDataType.ANY, dtype: DataType | _LogicalDataType = _LogicalDataType.ANY) → DataType

Create a DataType representing a fixed-shape tensor.

Pass DataType.ANY for arguments (or omit them) to create a pattern that matches any tensor type.

Parameters:
  • shape – The fixed shape of the tensor, or DataType.ANY to match any tensor. Defaults to DataType.ANY.

  • dtype – The DataType of the tensor elements, or DataType.ANY to match any tensor. Defaults to DataType.ANY.

Returns:

A DataType with Ray’s ArrowTensorType or a pattern-matching DataType

Return type:

DataType

Examples

>>> from ray.data.datatype import DataType
>>> DataType.tensor(shape=(3, 4), dtype=DataType.float32())  
DataType(arrow:ArrowTensorType(...))
>>> DataType.tensor(DataType.ANY, DataType.ANY)  # Pattern: matches any tensor (explicit)
DataType(logical_dtype:TENSOR)
>>> DataType.tensor()  # Same as above (terse)
DataType(logical_dtype:TENSOR)
>>> DataType.tensor(shape=(3, 4), dtype=DataType.ANY)  # Also pattern (partial spec)
DataType(logical_dtype:TENSOR)

classmethod variable_shaped_tensor(dtype: DataType | _LogicalDataType = _LogicalDataType.ANY, ndim: int | None = None) → DataType

Create a DataType representing a variable-shaped tensor.

Pass DataType.ANY (or omit the argument) to create a pattern that matches any variable-shaped tensor.

Parameters:
  • dtype – The DataType of the tensor elements, or DataType.ANY to match any tensor. Defaults to DataType.ANY.

  • ndim – The number of dimensions of the tensor

Returns:

A DataType with Ray’s ArrowVariableShapedTensorType or pattern-matching DataType

Return type:

DataType

Examples

>>> from ray.data.datatype import DataType
>>> DataType.variable_shaped_tensor(dtype=DataType.float32(), ndim=2)  
DataType(arrow:ArrowVariableShapedTensorType(...))
>>> DataType.variable_shaped_tensor(DataType.ANY)  # Pattern: matches any var tensor (explicit)
DataType(logical_dtype:TENSOR)
>>> DataType.variable_shaped_tensor()  # Same as above (terse)
DataType(logical_dtype:TENSOR)

classmethod temporal(temporal_type: str | _LogicalDataType = _LogicalDataType.ANY, unit: str | None = None, tz: str | None = None) → DataType

Create a DataType representing a temporal type.

Pass DataType.ANY (or omit the argument) to create a pattern that matches any temporal type.

Parameters:
  • temporal_type – Type of temporal value. One of:

      - "timestamp": Timestamp with optional unit and timezone
      - "date32": 32-bit date (days since UNIX epoch)
      - "date64": 64-bit date (milliseconds since UNIX epoch)
      - "time32": 32-bit time of day (s or ms precision)
      - "time64": 64-bit time of day (us or ns precision)
      - "duration": Time duration with unit
      - DataType.ANY: Pattern to match any temporal type (default)

  • unit – Time unit for timestamp, time, and duration types:

      - timestamp: "s", "ms", "us", "ns" (default: "us")
      - time32: "s", "ms" (default: "s")
      - time64: "us", "ns" (default: "us")
      - duration: "s", "ms", "us", "ns" (default: "us")

  • tz – Optional timezone string for timestamp types (e.g., "UTC", "America/New_York")

Returns:

A DataType with PyArrow temporal type or a pattern-matching DataType

Return type:

DataType
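
The unit rules from the parameter description can be restated as a small lookup table. The sketch below is plain Python and does not call Ray; `resolve_unit`, `_UNITS`, and `_UNITLESS` are hypothetical names. It also assumes that an unsupported unit should raise ValueError and that the date types ignore `unit`, neither of which this reference confirms.

```python
from typing import Optional

# Allowed units and defaults, transcribed from the parameter description.
_UNITS = {
    "timestamp": (("s", "ms", "us", "ns"), "us"),
    "time32": (("s", "ms"), "s"),
    "time64": (("us", "ns"), "us"),
    "duration": (("s", "ms", "us", "ns"), "us"),
}
_UNITLESS = ("date32", "date64")


def resolve_unit(temporal_type: str, unit: Optional[str] = None) -> Optional[str]:
    """Hypothetical helper: pick the effective unit for a temporal type."""
    if temporal_type in _UNITLESS:
        return None  # assumption: date types take no unit
    if temporal_type not in _UNITS:
        raise ValueError(f"unknown temporal type: {temporal_type!r}")
    allowed, default = _UNITS[temporal_type]
    if unit is None:
        return default  # fall back to the documented default
    if unit not in allowed:
        # Assumption: an invalid unit is rejected with ValueError.
        raise ValueError(f"{temporal_type} does not accept unit {unit!r}")
    return unit
```

For instance, `resolve_unit("time32")` gives `"s"`, while `resolve_unit("time32", "ns")` is rejected because time32 only lists second and millisecond precision.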

Examples

>>> from ray.data.datatype import DataType
>>> DataType.temporal("timestamp", unit="s")
DataType(arrow:timestamp[s])
>>> DataType.temporal("timestamp", unit="us", tz="UTC")
DataType(arrow:timestamp[us, tz=UTC])
>>> DataType.temporal("date32")
DataType(arrow:date32[day])
>>> DataType.temporal("time64", unit="ns")
DataType(arrow:time64[ns])
>>> DataType.temporal("duration", unit="ms")
DataType(arrow:duration[ms])
>>> DataType.temporal(DataType.ANY)  # Pattern: matches any temporal (explicit)
DataType(logical_dtype:TEMPORAL)
>>> DataType.temporal()  # Same as above (terse)
DataType(logical_dtype:TEMPORAL)

is_list_type() → bool

Check if this DataType represents a list type.

Returns:

True if this is any list variant (list, large_list, fixed_size_list)

Examples

>>> DataType.list(DataType.int64()).is_list_type()
True
>>> DataType.int64().is_list_type()
False

is_tensor_type() → bool

Check if this DataType represents a tensor type.

Returns:

True if this is a tensor type

is_struct_type() → bool

Check if this DataType represents a struct type.

Returns:

True if this is a struct type

Examples

>>> DataType.struct([("x", DataType.int64())]).is_struct_type()
True
>>> DataType.int64().is_struct_type()
False

is_map_type() → bool

Check if this DataType represents a map type.

Returns:

True if this is a map type

Examples

>>> DataType.map(DataType.string(), DataType.int64()).is_map_type()
True
>>> DataType.int64().is_map_type()
False

is_nested_type() → bool

Check if this DataType represents a nested type.

Nested types include lists, structs, maps, and unions.

Returns:

True if this is any nested type

Examples

>>> DataType.list(DataType.int64()).is_nested_type()
True
>>> DataType.struct([("x", DataType.int64())]).is_nested_type()
True
>>> DataType.int64().is_nested_type()
False

is_numerical_type() → bool

Check if this DataType represents a numerical type.

Numerical types support arithmetic operations and include integers, floats, and decimals.

Returns:

True if this is a numerical type

Examples

>>> DataType.int64().is_numerical_type()
True
>>> DataType.float32().is_numerical_type()
True
>>> DataType.string().is_numerical_type()
False

is_string_type() → bool

Check if this DataType represents a string type.

Includes string, large_string, and string_view.

Returns:

True if this is a string type

Examples

>>> DataType.string().is_string_type()
True
>>> DataType.int64().is_string_type()
False

is_binary_type() → bool

Check if this DataType represents a binary type.

Includes binary, large_binary, binary_view, and fixed_size_binary.

Returns:

True if this is a binary type

Examples

>>> DataType.binary().is_binary_type()
True
>>> DataType.string().is_binary_type()
False

is_temporal_type() → bool

Check if this DataType represents a temporal type.

Includes date, time, timestamp, duration, and interval.

Returns:

True if this is a temporal type

Examples

>>> import pyarrow as pa
>>> DataType.from_arrow(pa.timestamp('s')).is_temporal_type()
True
>>> DataType.int64().is_temporal_type()
False

classmethod binary()

Create a DataType representing variable-length binary data.

Returns:

A DataType with PyArrow binary type

Return type:

DataType

classmethod bool()

Create a DataType representing a boolean value.

Returns:

A DataType with PyArrow bool type

Return type:

DataType

classmethod float32()

Create a DataType representing a 32-bit floating point number.

Returns:

A DataType with PyArrow float32 type

Return type:

DataType

classmethod float64()

Create a DataType representing a 64-bit floating point number.

Returns:

A DataType with PyArrow float64 type

Return type:

DataType

classmethod int16()

Create a DataType representing a 16-bit signed integer.

Returns:

A DataType with PyArrow int16 type

Return type:

DataType

classmethod int32()

Create a DataType representing a 32-bit signed integer.

Returns:

A DataType with PyArrow int32 type

Return type:

DataType

classmethod int64()

Create a DataType representing a 64-bit signed integer.

Returns:

A DataType with PyArrow int64 type

Return type:

DataType

classmethod int8()

Create a DataType representing an 8-bit signed integer.

Returns:

A DataType with PyArrow int8 type

Return type:

DataType

classmethod string()

Create a DataType representing a variable-length string.

Returns:

A DataType with PyArrow string type

Return type:

DataType

classmethod uint16()

Create a DataType representing a 16-bit unsigned integer.

Returns:

A DataType with PyArrow uint16 type

Return type:

DataType

classmethod uint32()

Create a DataType representing a 32-bit unsigned integer.

Returns:

A DataType with PyArrow uint32 type

Return type:

DataType

classmethod uint64()

Create a DataType representing a 64-bit unsigned integer.

Returns:

A DataType with PyArrow uint64 type

Return type:

DataType

classmethod uint8()

Create a DataType representing an 8-bit unsigned integer.

Returns:

A DataType with PyArrow uint8 type

Return type:

DataType