Data types

Class

class ray.data.datatype.DataType(_physical_dtype: pyarrow.DataType | numpy.dtype | type)

A simplified Ray Data type that wraps a PyArrow DataType, a NumPy dtype, or a Python type.

PublicAPI (alpha): This API is in alpha and may change before becoming stable.

is_of(category: TypeCategory | str) → bool

Check if this DataType belongs to a specific type category.

Parameters:

category – The category to check against.

Returns:

True if the DataType belongs to the category.

is_arrow_type() → bool

Check if this DataType is backed by a PyArrow DataType.

Returns:

True if the internal type is a PyArrow DataType

Return type:

bool

is_numpy_type() → bool

Check if this DataType is backed by a NumPy dtype.

Returns:

True if the internal type is a NumPy dtype

Return type:

bool

is_python_type() → bool

Check if this DataType is backed by a Python type.

Returns:

True if the internal type is a Python type

Return type:

bool

to_arrow_dtype(values: List[Any] | None = None) → pyarrow.DataType

Convert the DataType to a PyArrow DataType.

Parameters:

values – Optional list of values to infer the Arrow type from. Required if the DataType is a Python type.

Returns:

A PyArrow DataType

to_numpy_dtype() → numpy.dtype

Convert the DataType to a NumPy dtype.

For PyArrow types, it attempts the conversion through the corresponding pandas dtype. For Python types, it returns the object dtype.

Returns:

A NumPy dtype representation

Return type:

np.dtype

Examples

>>> import numpy as np
>>> from ray.data.datatype import DataType
>>> DataType.from_numpy(np.dtype('int64')).to_numpy_dtype()
dtype('int64')
>>> DataType.from_numpy(np.dtype('float32')).to_numpy_dtype()
dtype('float32')

to_python_type() → type

Return the internal type if it’s a Python type.

This method doesn’t perform any conversion; it only returns the internal type if it’s already a Python type.

Returns:

The internal Python type

Return type:

type

Raises:

ValueError – If the DataType is not backed by a Python type

Examples

>>> from ray.data.datatype import DataType
>>> dt = DataType(int)
>>> dt.to_python_type()
<class 'int'>
>>> DataType.int64().to_python_type()
Traceback (most recent call last):
    ...
ValueError: DataType is not backed by a Python type

classmethod from_arrow(arrow_type: pyarrow.DataType) → DataType

Create a DataType from a PyArrow DataType.

Parameters:

arrow_type – A PyArrow DataType to wrap

Returns:

A DataType wrapping the given PyArrow type

Return type:

DataType

Examples

>>> import pyarrow as pa
>>> from ray.data.datatype import DataType
>>> DataType.from_arrow(pa.timestamp('s'))
DataType(arrow:timestamp[s])
>>> DataType.from_arrow(pa.int64())
DataType(arrow:int64)

classmethod from_numpy(numpy_dtype: numpy.dtype | str) → DataType

Create a DataType from a NumPy dtype.

Parameters:

numpy_dtype – A NumPy dtype object or string representation

Returns:

A DataType wrapping the given NumPy dtype

Return type:

DataType

Examples

>>> import numpy as np
>>> from ray.data.datatype import DataType
>>> DataType.from_numpy(np.dtype('int32'))
DataType(numpy:int32)
>>> DataType.from_numpy('float64')
DataType(numpy:float64)

classmethod infer_dtype(value: Any) → DataType

Infer a DataType from a Python value, handling NumPy, Arrow, and Python types.

Parameters:

value – Any Python value to infer the type from

Returns:

The inferred data type

Return type:

DataType

Examples

>>> import numpy as np
>>> from ray.data.datatype import DataType
>>> DataType.infer_dtype(5)
DataType(arrow:int64)
>>> DataType.infer_dtype("hello")
DataType(arrow:string)
>>> DataType.infer_dtype(np.int32(42))
DataType(numpy:int32)

classmethod list(value_type: DataType) → DataType

Create a DataType representing a list with the given element type.

Parameters:

value_type – The DataType of the list elements.

Returns:

A DataType with PyArrow list type

Return type:

DataType

Examples

>>> from ray.data.datatype import DataType
>>> DataType.list(DataType.int64())
DataType(arrow:list<item: int64>)

classmethod large_list(value_type: DataType) → DataType

Create a DataType representing a large_list with the given element type.

Parameters:

value_type – The DataType of the list elements.

Returns:

A DataType with PyArrow large_list type

Return type:

DataType

Examples

>>> from ray.data.datatype import DataType
>>> DataType.large_list(DataType.int64())
DataType(arrow:large_list<item: int64>)

classmethod fixed_size_list(value_type: DataType, list_size: int) → DataType

Create a DataType representing a fixed-size list.

Parameters:
  • value_type – The DataType of the list elements

  • list_size – The fixed size of the list

Returns:

A DataType with PyArrow fixed_size_list type

Return type:

DataType

Examples

>>> from ray.data.datatype import DataType
>>> DataType.fixed_size_list(DataType.float32(), 3)
DataType(arrow:fixed_size_list<item: float>[3])

classmethod struct(fields: List[Tuple[str, DataType]]) → DataType

Create a DataType representing a struct with the given fields.

Parameters:

fields – List of (field_name, field_type) tuples.

Returns:

A DataType with PyArrow struct type

Return type:

DataType

Examples

>>> from ray.data.datatype import DataType
>>> DataType.struct([("x", DataType.int64()), ("y", DataType.float64())])
DataType(arrow:struct<x: int64, y: double>)

classmethod map(key_type: DataType, value_type: DataType) → DataType

Create a DataType representing a map with the given key and value types.

Parameters:
  • key_type – The DataType of the map keys.

  • value_type – The DataType of the map values.

Returns:

A DataType with PyArrow map type

Return type:

DataType

Examples

>>> from ray.data.datatype import DataType
>>> DataType.map(DataType.string(), DataType.int64())
DataType(arrow:map<string, int64>)

classmethod tensor(shape: Tuple[int, ...], dtype: DataType) → DataType

Create a DataType representing a fixed-shape tensor.

Parameters:
  • shape – The fixed shape of the tensor.

  • dtype – The DataType of the tensor elements.

Returns:

A DataType with Ray’s ArrowTensorType

Return type:

DataType

Examples

>>> from ray.data.datatype import DataType
>>> DataType.tensor(shape=(3, 4), dtype=DataType.float32())  
DataType(arrow:ArrowTensorType(...))

classmethod variable_shaped_tensor(dtype: DataType, ndim: int = 2) → DataType

Create a DataType representing a variable-shaped tensor.

Parameters:
  • dtype – The DataType of the tensor elements.

  • ndim – The number of dimensions of the tensor. Defaults to 2.

Returns:

A DataType with Ray’s ArrowVariableShapedTensorType

Return type:

DataType

Examples

>>> from ray.data.datatype import DataType
>>> DataType.variable_shaped_tensor(dtype=DataType.float32(), ndim=2)  
DataType(arrow:ArrowVariableShapedTensorType(...))

classmethod temporal(temporal_type: str, unit: str | None = None, tz: str | None = None) → DataType

Create a DataType representing a temporal type.

Parameters:
  • temporal_type – Type of temporal value, one of:
      - "timestamp": timestamp with optional unit and timezone
      - "date32": 32-bit date (days since the UNIX epoch)
      - "date64": 64-bit date (milliseconds since the UNIX epoch)
      - "time32": 32-bit time of day ("s" or "ms" precision)
      - "time64": 64-bit time of day ("us" or "ns" precision)
      - "duration": time duration with unit

  • unit – Time unit for timestamp, time, and duration types:
      - timestamp: "s", "ms", "us", "ns" (default: "us")
      - time32: "s", "ms" (default: "s")
      - time64: "us", "ns" (default: "us")
      - duration: "s", "ms", "us", "ns" (default: "us")

  • tz – Optional timezone string for timestamp types (e.g., "UTC", "America/New_York")

Returns:

A DataType with PyArrow temporal type

Return type:

DataType

Examples

>>> from ray.data.datatype import DataType
>>> DataType.temporal("timestamp", unit="s")
DataType(arrow:timestamp[s])
>>> DataType.temporal("timestamp", unit="us", tz="UTC")
DataType(arrow:timestamp[us, tz=UTC])
>>> DataType.temporal("date32")
DataType(arrow:date32[day])
>>> DataType.temporal("time64", unit="ns")
DataType(arrow:time64[ns])
>>> DataType.temporal("duration", unit="ms")
DataType(arrow:duration[ms])

is_list_type() → bool

Check if this DataType represents a list type.

Returns:

True if this is any list variant (list, large_list, fixed_size_list)

Examples

>>> from ray.data.datatype import DataType
>>> DataType.list(DataType.int64()).is_list_type()
True
>>> DataType.int64().is_list_type()
False

is_tensor_type() → bool

Check if this DataType represents a tensor type.

Returns:

True if this is a tensor type

is_struct_type() → bool

Check if this DataType represents a struct type.

Returns:

True if this is a struct type

Examples

>>> from ray.data.datatype import DataType
>>> DataType.struct([("x", DataType.int64())]).is_struct_type()
True
>>> DataType.int64().is_struct_type()
False

is_map_type() → bool

Check if this DataType represents a map type.

Returns:

True if this is a map type

Examples

>>> from ray.data.datatype import DataType
>>> DataType.map(DataType.string(), DataType.int64()).is_map_type()
True
>>> DataType.int64().is_map_type()
False

is_nested_type() → bool

Check if this DataType represents a nested type.

Nested types include lists, structs, maps, and unions.

Returns:

True if this is any nested type

Examples

>>> from ray.data.datatype import DataType
>>> DataType.list(DataType.int64()).is_nested_type()
True
>>> DataType.struct([("x", DataType.int64())]).is_nested_type()
True
>>> DataType.int64().is_nested_type()
False

is_numerical_type() → bool

Check if this DataType represents a numerical type.

Numerical types support arithmetic operations and include integers, floats, and decimals.

Returns:

True if this is a numerical type

Examples

>>> from ray.data.datatype import DataType
>>> DataType.int64().is_numerical_type()
True
>>> DataType.float32().is_numerical_type()
True
>>> DataType.string().is_numerical_type()
False

is_string_type() → bool

Check if this DataType represents a string type.

This includes string, large_string, and string_view.

Returns:

True if this is a string type

Examples

>>> from ray.data.datatype import DataType
>>> DataType.string().is_string_type()
True
>>> DataType.int64().is_string_type()
False

is_binary_type() → bool

Check if this DataType represents a binary type.

This includes binary, large_binary, binary_view, and fixed_size_binary.

Returns:

True if this is a binary type

Examples

>>> from ray.data.datatype import DataType
>>> DataType.binary().is_binary_type()
True
>>> DataType.string().is_binary_type()
False

is_temporal_type() → bool

Check if this DataType represents a temporal type.

This includes date, time, timestamp, duration, and interval types.

Returns:

True if this is a temporal type

Examples

>>> import pyarrow as pa
>>> from ray.data.datatype import DataType
>>> DataType.from_arrow(pa.timestamp('s')).is_temporal_type()
True
>>> DataType.int64().is_temporal_type()
False

classmethod binary()

Create a DataType representing variable-length binary data.

Returns:

A DataType with PyArrow binary type

Return type:

DataType

classmethod bool()

Create a DataType representing a boolean value.

Returns:

A DataType with PyArrow bool type

Return type:

DataType

classmethod float32()

Create a DataType representing a 32-bit floating point number.

Returns:

A DataType with PyArrow float32 type

Return type:

DataType

classmethod float64()

Create a DataType representing a 64-bit floating point number.

Returns:

A DataType with PyArrow float64 type

Return type:

DataType

classmethod int16()

Create a DataType representing a 16-bit signed integer.

Returns:

A DataType with PyArrow int16 type

Return type:

DataType

classmethod int32()

Create a DataType representing a 32-bit signed integer.

Returns:

A DataType with PyArrow int32 type

Return type:

DataType

classmethod int64()

Create a DataType representing a 64-bit signed integer.

Returns:

A DataType with PyArrow int64 type

Return type:

DataType

classmethod int8()

Create a DataType representing an 8-bit signed integer.

Returns:

A DataType with PyArrow int8 type

Return type:

DataType

classmethod string()

Create a DataType representing a variable-length string.

Returns:

A DataType with PyArrow string type

Return type:

DataType

classmethod uint16()

Create a DataType representing a 16-bit unsigned integer.

Returns:

A DataType with PyArrow uint16 type

Return type:

DataType

classmethod uint32()

Create a DataType representing a 32-bit unsigned integer.

Returns:

A DataType with PyArrow uint32 type

Return type:

DataType

classmethod uint64()

Create a DataType representing a 64-bit unsigned integer.

Returns:

A DataType with PyArrow uint64 type

Return type:

DataType

classmethod uint8()

Create a DataType representing an 8-bit unsigned integer.

Returns:

A DataType with PyArrow uint8 type

Return type:

DataType

Enumeration

class ray.data.datatype.TypeCategory(value)

High-level categories of data types.

These categories correspond to groups of concrete data types. Use DataType.is_of(category) to check if a DataType belongs to a category.

PublicAPI (alpha): This API is in alpha and may change before becoming stable.