Data types
Class
- class ray.data.datatype.DataType(_physical_dtype: pyarrow.DataType | numpy.dtype | type)
A simplified Ray Data DataType supporting Arrow, NumPy, and Python types.
PublicAPI (alpha): This API is in alpha and may change before becoming stable.
- is_of(category: TypeCategory | str) -> bool
Check if this DataType belongs to a specific type category.
- Parameters:
category – The category to check against.
- Returns:
True if the DataType belongs to the category.
- is_arrow_type() -> bool
Check if this DataType is backed by a PyArrow DataType.
- Returns:
True if the internal type is a PyArrow DataType.
- Return type:
bool
- is_numpy_type() -> bool
Check if this DataType is backed by a NumPy dtype.
- Returns:
True if the internal type is a NumPy dtype.
- Return type:
bool
- is_python_type() -> bool
Check if this DataType is backed by a Python type.
- Returns:
True if the internal type is a Python type.
- Return type:
bool
- to_arrow_dtype(values: List[Any] | None = None) -> pyarrow.DataType
Convert the DataType to a PyArrow DataType.
- Parameters:
values – Optional list of values to infer the Arrow type from. Required if the DataType is a Python type.
- Returns:
A PyArrow DataType
- to_numpy_dtype() -> numpy.dtype
Convert the DataType to a NumPy dtype.
For PyArrow types, attempts to convert via pandas dtype. For Python types, returns object dtype.
- Returns:
A NumPy dtype representation
- Return type:
np.dtype
Examples
>>> import numpy as np
>>> from ray.data.datatype import DataType
>>> DataType.from_numpy(np.dtype('int64')).to_numpy_dtype()
dtype('int64')
>>> DataType.from_numpy(np.dtype('float32')).to_numpy_dtype()
dtype('float32')
- to_python_type() -> type
Get the internal type if it’s a Python type.
This method doesn’t perform conversion; it only returns the internal type if it’s already a Python type.
- Returns:
The internal Python type
- Return type:
type
- Raises:
ValueError – If the DataType is not backed by a Python type
Examples
>>> from ray.data.datatype import DataType
>>> dt = DataType(int)
>>> dt.to_python_type()
<class 'int'>
>>> DataType.int64().to_python_type()
Traceback (most recent call last):
    ...
ValueError: DataType is not backed by a Python type
- classmethod from_arrow(arrow_type: pyarrow.DataType) -> DataType
Create a DataType from a PyArrow DataType.
- Parameters:
arrow_type – A PyArrow DataType to wrap
- Returns:
A DataType wrapping the given PyArrow type
- Return type:
DataType
Examples
>>> import pyarrow as pa
>>> from ray.data.datatype import DataType
>>> DataType.from_arrow(pa.timestamp('s'))
DataType(arrow:timestamp[s])
>>> DataType.from_arrow(pa.int64())
DataType(arrow:int64)
- classmethod from_numpy(numpy_dtype: numpy.dtype | str) -> DataType
Create a DataType from a NumPy dtype.
- Parameters:
numpy_dtype – A NumPy dtype object or string representation
- Returns:
A DataType wrapping the given NumPy dtype
- Return type:
DataType
Examples
>>> import numpy as np
>>> from ray.data.datatype import DataType
>>> DataType.from_numpy(np.dtype('int32'))
DataType(numpy:int32)
>>> DataType.from_numpy('float64')
DataType(numpy:float64)
- classmethod infer_dtype(value: Any) -> DataType
Infer a DataType from a Python value, handling NumPy, Arrow, and Python types.
- Parameters:
value – Any Python value to infer the type from
- Returns:
The inferred data type
- Return type:
DataType
Examples
>>> import numpy as np
>>> from ray.data.datatype import DataType
>>> DataType.infer_dtype(5)
DataType(arrow:int64)
>>> DataType.infer_dtype("hello")
DataType(arrow:string)
>>> DataType.infer_dtype(np.int32(42))
DataType(numpy:int32)
- classmethod list(value_type: DataType) -> DataType
Create a DataType representing a list with the given element type.
- Parameters:
value_type – The DataType of the list elements.
- Returns:
A DataType with PyArrow list type
- Return type:
DataType
Examples
>>> from ray.data.datatype import DataType
>>> DataType.list(DataType.int64())  # Exact match: list<int64>
DataType(arrow:list<item: int64>)
- classmethod large_list(value_type: DataType) -> DataType
Create a DataType representing a large_list with the given element type.
- Parameters:
value_type – The DataType of the list elements.
- Returns:
A DataType with PyArrow large_list type
- Return type:
DataType
Examples
>>> from ray.data.datatype import DataType
>>> DataType.large_list(DataType.int64())
DataType(arrow:large_list<item: int64>)
- classmethod fixed_size_list(value_type: DataType, list_size: int) -> DataType
Create a DataType representing a fixed-size list.
- Parameters:
value_type – The DataType of the list elements
list_size – The fixed size of the list
- Returns:
A DataType with PyArrow fixed_size_list type
- Return type:
DataType
Examples
>>> from ray.data.datatype import DataType
>>> DataType.fixed_size_list(DataType.float32(), 3)
DataType(arrow:fixed_size_list<item: float>[3])
- classmethod struct(fields: List[Tuple[str, DataType]]) -> DataType
Create a DataType representing a struct with the given fields.
- Parameters:
fields – List of (field_name, field_type) tuples.
- Returns:
A DataType with PyArrow struct type
- Return type:
DataType
Examples
>>> from ray.data.datatype import DataType
>>> DataType.struct([("x", DataType.int64()), ("y", DataType.float64())])
DataType(arrow:struct<x: int64, y: double>)
- classmethod map(key_type: DataType, value_type: DataType) -> DataType
Create a DataType representing a map with the given key and value types.
- Parameters:
key_type – The DataType of the map keys.
value_type – The DataType of the map values.
- Returns:
A DataType with PyArrow map type
- Return type:
DataType
Examples
>>> from ray.data.datatype import DataType
>>> DataType.map(DataType.string(), DataType.int64())
DataType(arrow:map<string, int64>)
- classmethod tensor(shape: Tuple[int, ...], dtype: DataType) -> DataType
Create a DataType representing a fixed-shape tensor.
- Parameters:
shape – The fixed shape of the tensor.
dtype – The DataType of the tensor elements.
- Returns:
A DataType with Ray’s ArrowTensorType
- Return type:
DataType
Examples
>>> from ray.data.datatype import DataType
>>> DataType.tensor(shape=(3, 4), dtype=DataType.float32())
DataType(arrow:ArrowTensorType(...))
- classmethod variable_shaped_tensor(dtype: DataType, ndim: int = 2) -> DataType
Create a DataType representing a variable-shaped tensor.
- Parameters:
dtype – The DataType of the tensor elements.
ndim – The number of dimensions of the tensor. Defaults to 2.
- Returns:
A DataType with Ray’s ArrowVariableShapedTensorType
- Return type:
DataType
Examples
>>> from ray.data.datatype import DataType
>>> DataType.variable_shaped_tensor(dtype=DataType.float32(), ndim=2)
DataType(arrow:ArrowVariableShapedTensorType(...))
- classmethod temporal(temporal_type: str, unit: str | None = None, tz: str | None = None) -> DataType
Create a DataType representing a temporal type.
- Parameters:
temporal_type – Type of temporal value. One of:
  - "timestamp": Timestamp with optional unit and timezone
  - "date32": 32-bit date (days since UNIX epoch)
  - "date64": 64-bit date (milliseconds since UNIX epoch)
  - "time32": 32-bit time of day (s or ms precision)
  - "time64": 64-bit time of day (us or ns precision)
  - "duration": Time duration with unit
unit – Time unit for timestamp/time/duration types:
  - timestamp: "s", "ms", "us", "ns" (default: "us")
  - time32: "s", "ms" (default: "s")
  - time64: "us", "ns" (default: "us")
  - duration: "s", "ms", "us", "ns" (default: "us")
tz – Optional timezone string for timestamp types (e.g., "UTC", "America/New_York")
- Returns:
A DataType with PyArrow temporal type
- Return type:
DataType
Examples
>>> from ray.data.datatype import DataType
>>> DataType.temporal("timestamp", unit="s")
DataType(arrow:timestamp[s])
>>> DataType.temporal("timestamp", unit="us", tz="UTC")
DataType(arrow:timestamp[us, tz=UTC])
>>> DataType.temporal("date32")
DataType(arrow:date32[day])
>>> DataType.temporal("time64", unit="ns")
DataType(arrow:time64[ns])
>>> DataType.temporal("duration", unit="ms")
DataType(arrow:duration[ms])
- is_list_type() -> bool
Check if this DataType represents a list type.
- Returns:
True if this is any list variant (list, large_list, fixed_size_list)
Examples
>>> from ray.data.datatype import DataType
>>> DataType.list(DataType.int64()).is_list_type()
True
>>> DataType.int64().is_list_type()
False
- is_tensor_type() -> bool
Check if this DataType represents a tensor type.
- Returns:
True if this is a tensor type
- is_struct_type() -> bool
Check if this DataType represents a struct type.
- Returns:
True if this is a struct type
Examples
>>> from ray.data.datatype import DataType
>>> DataType.struct([("x", DataType.int64())]).is_struct_type()
True
>>> DataType.int64().is_struct_type()
False
- is_map_type() -> bool
Check if this DataType represents a map type.
- Returns:
True if this is a map type
Examples
>>> from ray.data.datatype import DataType
>>> DataType.map(DataType.string(), DataType.int64()).is_map_type()
True
>>> DataType.int64().is_map_type()
False
- is_nested_type() -> bool
Check if this DataType represents a nested type.
Nested types include lists, structs, maps, and unions.
- Returns:
True if this is any nested type
Examples
>>> from ray.data.datatype import DataType
>>> DataType.list(DataType.int64()).is_nested_type()
True
>>> DataType.struct([("x", DataType.int64())]).is_nested_type()
True
>>> DataType.int64().is_nested_type()
False
- is_numerical_type() -> bool
Check if this DataType represents a numerical type.
Numerical types support arithmetic operations and include integers, floats, and decimals.
- Returns:
True if this is a numerical type
Examples
>>> from ray.data.datatype import DataType
>>> DataType.int64().is_numerical_type()
True
>>> DataType.float32().is_numerical_type()
True
>>> DataType.string().is_numerical_type()
False
- is_string_type() -> bool
Check if this DataType represents a string type.
Includes: string, large_string, string_view.
- Returns:
True if this is a string type
Examples
>>> from ray.data.datatype import DataType
>>> DataType.string().is_string_type()
True
>>> DataType.int64().is_string_type()
False
- is_binary_type() -> bool
Check if this DataType represents a binary type.
Includes: binary, large_binary, binary_view, fixed_size_binary.
- Returns:
True if this is a binary type
Examples
>>> from ray.data.datatype import DataType
>>> DataType.binary().is_binary_type()
True
>>> DataType.string().is_binary_type()
False
- is_temporal_type() -> bool
Check if this DataType represents a temporal type.
Includes: date, time, timestamp, duration, interval.
- Returns:
True if this is a temporal type
Examples
>>> import pyarrow as pa
>>> from ray.data.datatype import DataType
>>> DataType.from_arrow(pa.timestamp('s')).is_temporal_type()
True
>>> DataType.int64().is_temporal_type()
False
- classmethod binary()
Create a DataType representing variable-length binary data.
- Returns:
A DataType with PyArrow binary type
- Return type:
DataType
- classmethod bool()
Create a DataType representing a boolean value.
- Returns:
A DataType with PyArrow bool type
- Return type:
DataType
- classmethod float32()
Create a DataType representing a 32-bit floating point number.
- Returns:
A DataType with PyArrow float32 type
- Return type:
DataType
- classmethod float64()
Create a DataType representing a 64-bit floating point number.
- Returns:
A DataType with PyArrow float64 type
- Return type:
DataType
- classmethod int16()
Create a DataType representing a 16-bit signed integer.
- Returns:
A DataType with PyArrow int16 type
- Return type:
DataType
- classmethod int32()
Create a DataType representing a 32-bit signed integer.
- Returns:
A DataType with PyArrow int32 type
- Return type:
DataType
- classmethod int64()
Create a DataType representing a 64-bit signed integer.
- Returns:
A DataType with PyArrow int64 type
- Return type:
DataType
- classmethod int8()
Create a DataType representing an 8-bit signed integer.
- Returns:
A DataType with PyArrow int8 type
- Return type:
DataType
- classmethod string()
Create a DataType representing a variable-length string.
- Returns:
A DataType with PyArrow string type
- Return type:
DataType
- classmethod uint16()
Create a DataType representing a 16-bit unsigned integer.
- Returns:
A DataType with PyArrow uint16 type
- Return type:
DataType
- classmethod uint32()
Create a DataType representing a 32-bit unsigned integer.
- Returns:
A DataType with PyArrow uint32 type
- Return type:
DataType
- classmethod uint64()
Create a DataType representing a 64-bit unsigned integer.
- Returns:
A DataType with PyArrow uint64 type
- Return type:
DataType
Enumeration
- class ray.data.datatype.TypeCategory(value)
High-level categories of data types.
These categories correspond to groups of concrete data types. Use DataType.is_of(category) to check if a DataType belongs to a category.
PublicAPI (alpha): This API is in alpha and may change before becoming stable.