Data types
Class
- class ray.data.datatype.DataType(_physical_dtype: pyarrow.DataType | numpy.dtype | type | None, _logical_dtype: _LogicalDataType = _LogicalDataType.ANY)
A simplified Ray Data DataType supporting Arrow, NumPy, and Python types.
PublicAPI (alpha): This API is in alpha and may change before becoming stable.
- is_arrow_type() → bool
Check if this DataType is backed by a PyArrow DataType.
- Returns:
True if the internal type is a PyArrow DataType
- Return type:
bool
- is_numpy_type() → bool
Check if this DataType is backed by a NumPy dtype.
- Returns:
True if the internal type is a NumPy dtype
- Return type:
bool
- is_python_type() → bool
Check if this DataType is backed by a Python type.
- Returns:
True if the internal type is a Python type
- Return type:
bool
- is_pattern_matching() → bool
Check if this DataType is a pattern-matching type.
Pattern-matching types have _physical_dtype=None and are used to match categories of types (e.g., any list, any struct) rather than concrete types.
- Returns:
True if this is a pattern-matching type
- Return type:
bool
- to_arrow_dtype(values: List[Any] | None = None) → pyarrow.DataType
Convert the DataType to a PyArrow DataType.
- Parameters:
values – Optional list of values to infer the Arrow type from. Required if the DataType is a Python type.
- Returns:
A PyArrow DataType
- Raises:
ValueError – If called on a pattern-matching type (where _physical_dtype is None)
- to_numpy_dtype() → numpy.dtype
Convert the DataType to a NumPy dtype.
For PyArrow types, attempts to convert via pandas dtype. For Python types, returns object dtype.
- Returns:
A NumPy dtype representation
- Return type:
np.dtype
- Raises:
ValueError – If called on a pattern-matching type (where _physical_dtype is None)
Examples
>>> import numpy as np
>>> DataType.from_numpy(np.dtype('int64')).to_numpy_dtype()
dtype('int64')
>>> DataType.from_numpy(np.dtype('float32')).to_numpy_dtype()
dtype('float32')
- to_python_type() → type
Get the internal type if it’s a Python type.
This method doesn’t perform any conversion; it only returns the internal type if it’s already a Python type.
- Returns:
The internal Python type
- Return type:
type
- Raises:
ValueError – If the DataType is not backed by a Python type
Examples
>>> dt = DataType(int)
>>> dt.to_python_type()
<class 'int'>
>>> DataType.int64().to_python_type()
Traceback (most recent call last):
    ...
ValueError: DataType is not backed by a Python type
- classmethod from_arrow(arrow_type: pyarrow.DataType) → DataType
Create a DataType from a PyArrow DataType.
- Parameters:
arrow_type – A PyArrow DataType to wrap
- Returns:
A DataType wrapping the given PyArrow type
- Return type:
DataType
Examples
>>> import pyarrow as pa
>>> from ray.data.datatype import DataType
>>> DataType.from_arrow(pa.timestamp('s'))
DataType(arrow:timestamp[s])
>>> DataType.from_arrow(pa.int64())
DataType(arrow:int64)
- classmethod from_numpy(numpy_dtype: numpy.dtype | str) → DataType
Create a DataType from a NumPy dtype.
- Parameters:
numpy_dtype – A NumPy dtype object or string representation
- Returns:
A DataType wrapping the given NumPy dtype
- Return type:
DataType
Examples
>>> import numpy as np
>>> from ray.data.datatype import DataType
>>> DataType.from_numpy(np.dtype('int32'))
DataType(numpy:int32)
>>> DataType.from_numpy('float64')
DataType(numpy:float64)
- classmethod infer_dtype(value: Any) → DataType
Infer a DataType from a Python value, handling NumPy, Arrow, and Python types.
- Parameters:
value – Any Python value to infer the type from
- Returns:
The inferred data type
- Return type:
DataType
Examples
>>> import numpy as np
>>> from ray.data.datatype import DataType
>>> DataType.infer_dtype(5)
DataType(arrow:int64)
>>> DataType.infer_dtype("hello")
DataType(arrow:string)
>>> DataType.infer_dtype(np.int32(42))
DataType(numpy:int32)
- classmethod list(value_type: DataType | _LogicalDataType = _LogicalDataType.ANY) → DataType
Create a DataType representing a list with the given element type.
Pass DataType.ANY (or omit the argument) to create a pattern that matches any list type.
- Parameters:
value_type – The DataType of the list elements, or DataType.ANY to match any list. Defaults to DataType.ANY.
- Returns:
A DataType with PyArrow list type or a pattern-matching DataType
- Return type:
DataType
Examples
>>> from ray.data.datatype import DataType
>>> DataType.list(DataType.int64())  # Exact match: list<int64>
DataType(arrow:list<item: int64>)
>>> DataType.list(DataType.ANY)  # Pattern: matches any list (explicit)
DataType(logical_dtype:LIST)
>>> DataType.list()  # Same as above (terse)
DataType(logical_dtype:LIST)
- classmethod large_list(value_type: DataType | _LogicalDataType = _LogicalDataType.ANY) → DataType
Create a DataType representing a large_list with the given element type.
Pass DataType.ANY (or omit the argument) to create a pattern that matches any large_list type.
- Parameters:
value_type – The DataType of the list elements, or DataType.ANY to match any large_list. Defaults to DataType.ANY.
- Returns:
A DataType with PyArrow large_list type or a pattern-matching DataType
- Return type:
DataType
Examples
>>> DataType.large_list(DataType.int64())  # Exact match: large_list<int64>
DataType(arrow:large_list<item: int64>)
>>> DataType.large_list(DataType.ANY)  # Pattern: matches any large_list (explicit)
DataType(logical_dtype:LARGE_LIST)
>>> DataType.large_list()  # Same as above (terse)
DataType(logical_dtype:LARGE_LIST)
- classmethod fixed_size_list(value_type: DataType, list_size: int) → DataType
Create a DataType representing a fixed-size list.
- Parameters:
value_type – The DataType of the list elements
list_size – The fixed number of elements in every list
- Returns:
A DataType with PyArrow fixed_size_list type
- Return type:
DataType
Examples
>>> from ray.data.datatype import DataType
>>> DataType.fixed_size_list(DataType.float32(), 3)
DataType(arrow:fixed_size_list<item: float>[3])
- classmethod struct(fields: List[Tuple[str, DataType]] | _LogicalDataType = _LogicalDataType.ANY) → DataType
Create a DataType representing a struct with the given fields.
Pass DataType.ANY (or omit the argument) to create a pattern that matches any struct type.
- Parameters:
fields – List of (field_name, field_type) tuples, or DataType.ANY to match any struct. Defaults to DataType.ANY.
- Returns:
A DataType with PyArrow struct type or a pattern-matching DataType
- Return type:
DataType
Examples
>>> from ray.data.datatype import DataType
>>> DataType.struct([("x", DataType.int64()), ("y", DataType.float64())])
DataType(arrow:struct<x: int64, y: double>)
>>> DataType.struct(DataType.ANY)  # Pattern: matches any struct (explicit)
DataType(logical_dtype:STRUCT)
>>> DataType.struct()  # Same as above (terse)
DataType(logical_dtype:STRUCT)
- classmethod map(key_type: DataType | _LogicalDataType = _LogicalDataType.ANY, value_type: DataType | _LogicalDataType = _LogicalDataType.ANY) → DataType
Create a DataType representing a map with the given key and value types.
Pass DataType.ANY for either argument (or omit them) to create a pattern that matches any map type.
- Parameters:
key_type – The DataType of the map keys, or DataType.ANY to match any map. Defaults to DataType.ANY.
value_type – The DataType of the map values, or DataType.ANY to match any map. Defaults to DataType.ANY.
- Returns:
A DataType with PyArrow map type or a pattern-matching DataType
- Return type:
DataType
Examples
>>> from ray.data.datatype import DataType
>>> DataType.map(DataType.string(), DataType.int64())
DataType(arrow:map<string, int64>)
>>> DataType.map(DataType.ANY, DataType.ANY)  # Pattern: matches any map (explicit)
DataType(logical_dtype:MAP)
>>> DataType.map()  # Same as above (terse)
DataType(logical_dtype:MAP)
>>> DataType.map(DataType.string(), DataType.ANY)  # Also a pattern (partial spec)
DataType(logical_dtype:MAP)
- classmethod tensor(shape: Tuple[int, ...] | _LogicalDataType = _LogicalDataType.ANY, dtype: DataType | _LogicalDataType = _LogicalDataType.ANY) → DataType
Create a DataType representing a fixed-shape tensor.
Pass DataType.ANY for arguments (or omit them) to create a pattern that matches any tensor type.
- Parameters:
shape – The fixed shape of the tensor, or DataType.ANY to match any tensor. Defaults to DataType.ANY.
dtype – The DataType of the tensor elements, or DataType.ANY to match any tensor. Defaults to DataType.ANY.
- Returns:
A DataType with Ray’s ArrowTensorType or a pattern-matching DataType
- Return type:
DataType
Examples
>>> from ray.data.datatype import DataType
>>> DataType.tensor(shape=(3, 4), dtype=DataType.float32())
DataType(arrow:ArrowTensorType(...))
>>> DataType.tensor(DataType.ANY, DataType.ANY)  # Pattern: matches any tensor (explicit)
DataType(logical_dtype:TENSOR)
>>> DataType.tensor()  # Same as above (terse)
DataType(logical_dtype:TENSOR)
>>> DataType.tensor(shape=(3, 4), dtype=DataType.ANY)  # Also a pattern (partial spec)
DataType(logical_dtype:TENSOR)
- classmethod variable_shaped_tensor(dtype: DataType | _LogicalDataType = _LogicalDataType.ANY, ndim: int | None = None) → DataType
Create a DataType representing a variable-shaped tensor.
Pass DataType.ANY (or omit the argument) to create a pattern that matches any variable-shaped tensor.
- Parameters:
dtype – The DataType of the tensor elements, or DataType.ANY to match any tensor. Defaults to DataType.ANY.
ndim – The number of dimensions of the tensor (optional; defaults to None)
- Returns:
A DataType with Ray’s ArrowVariableShapedTensorType or a pattern-matching DataType
- Return type:
DataType
Examples
>>> from ray.data.datatype import DataType
>>> DataType.variable_shaped_tensor(dtype=DataType.float32(), ndim=2)
DataType(arrow:ArrowVariableShapedTensorType(...))
>>> DataType.variable_shaped_tensor(DataType.ANY)  # Pattern: matches any variable-shaped tensor (explicit)
DataType(logical_dtype:TENSOR)
>>> DataType.variable_shaped_tensor()  # Same as above (terse)
DataType(logical_dtype:TENSOR)
- classmethod temporal(temporal_type: str | _LogicalDataType = _LogicalDataType.ANY, unit: str | None = None, tz: str | None = None) → DataType
Create a DataType representing a temporal type.
Pass DataType.ANY (or omit the argument) to create a pattern that matches any temporal type.
- Parameters:
temporal_type – Type of temporal value; one of:
  - “timestamp”: Timestamp with optional unit and timezone
  - “date32”: 32-bit date (days since UNIX epoch)
  - “date64”: 64-bit date (milliseconds since UNIX epoch)
  - “time32”: 32-bit time of day (s or ms precision)
  - “time64”: 64-bit time of day (us or ns precision)
  - “duration”: Time duration with unit
  - DataType.ANY: Pattern to match any temporal type (default)
unit – Time unit for timestamp/time/duration types:
  - timestamp: “s”, “ms”, “us”, “ns” (default: “us”)
  - time32: “s”, “ms” (default: “s”)
  - time64: “us”, “ns” (default: “us”)
  - duration: “s”, “ms”, “us”, “ns” (default: “us”)
tz – Optional timezone string for timestamp types (e.g., “UTC”, “America/New_York”)
- Returns:
A DataType with PyArrow temporal type or a pattern-matching DataType
- Return type:
DataType
Examples
>>> from ray.data.datatype import DataType
>>> DataType.temporal("timestamp", unit="s")
DataType(arrow:timestamp[s])
>>> DataType.temporal("timestamp", unit="us", tz="UTC")
DataType(arrow:timestamp[us, tz=UTC])
>>> DataType.temporal("date32")
DataType(arrow:date32[day])
>>> DataType.temporal("time64", unit="ns")
DataType(arrow:time64[ns])
>>> DataType.temporal("duration", unit="ms")
DataType(arrow:duration[ms])
>>> DataType.temporal(DataType.ANY)  # Pattern: matches any temporal type (explicit)
DataType(logical_dtype:TEMPORAL)
>>> DataType.temporal()  # Same as above (terse)
DataType(logical_dtype:TEMPORAL)
- is_list_type() → bool
Check if this DataType represents a list type.
- Returns:
True if this is any list variant (list, large_list, fixed_size_list)
Examples
>>> DataType.list(DataType.int64()).is_list_type()
True
>>> DataType.int64().is_list_type()
False
- is_tensor_type() → bool
Check if this DataType represents a tensor type.
- Returns:
True if this is a tensor type
- is_struct_type() → bool
Check if this DataType represents a struct type.
- Returns:
True if this is a struct type
Examples
>>> DataType.struct([("x", DataType.int64())]).is_struct_type()
True
>>> DataType.int64().is_struct_type()
False
- is_map_type() → bool
Check if this DataType represents a map type.
- Returns:
True if this is a map type
Examples
>>> DataType.map(DataType.string(), DataType.int64()).is_map_type()
True
>>> DataType.int64().is_map_type()
False
- is_nested_type() → bool
Check if this DataType represents a nested type.
Nested types include lists, structs, maps, and unions.
- Returns:
True if this is any nested type
Examples
>>> DataType.list(DataType.int64()).is_nested_type()
True
>>> DataType.struct([("x", DataType.int64())]).is_nested_type()
True
>>> DataType.int64().is_nested_type()
False
- is_numerical_type() → bool
Check if this DataType represents a numerical type.
Numerical types support arithmetic operations and include integers, floats, and decimals.
- Returns:
True if this is a numerical type
Examples
>>> DataType.int64().is_numerical_type()
True
>>> DataType.float32().is_numerical_type()
True
>>> DataType.string().is_numerical_type()
False
- is_string_type() → bool
Check if this DataType represents a string type.
Includes string, large_string, and string_view.
- Returns:
True if this is a string type
Examples
>>> DataType.string().is_string_type()
True
>>> DataType.int64().is_string_type()
False
- is_binary_type() → bool
Check if this DataType represents a binary type.
Includes binary, large_binary, binary_view, and fixed_size_binary.
- Returns:
True if this is a binary type
Examples
>>> DataType.binary().is_binary_type()
True
>>> DataType.string().is_binary_type()
False
- is_temporal_type() → bool
Check if this DataType represents a temporal type.
Includes date, time, timestamp, duration, and interval types.
- Returns:
True if this is a temporal type
Examples
>>> import pyarrow as pa
>>> DataType.from_arrow(pa.timestamp('s')).is_temporal_type()
True
>>> DataType.int64().is_temporal_type()
False
- classmethod binary()
Create a DataType representing variable-length binary data.
- Returns:
A DataType with PyArrow binary type
- Return type:
DataType
- classmethod bool()
Create a DataType representing a boolean value.
- Returns:
A DataType with PyArrow bool type
- Return type:
DataType
- classmethod float32()
Create a DataType representing a 32-bit floating point number.
- Returns:
A DataType with PyArrow float32 type
- Return type:
DataType
- classmethod float64()
Create a DataType representing a 64-bit floating point number.
- Returns:
A DataType with PyArrow float64 type
- Return type:
DataType
- classmethod int16()
Create a DataType representing a 16-bit signed integer.
- Returns:
A DataType with PyArrow int16 type
- Return type:
DataType
- classmethod int32()
Create a DataType representing a 32-bit signed integer.
- Returns:
A DataType with PyArrow int32 type
- Return type:
DataType
- classmethod int64()
Create a DataType representing a 64-bit signed integer.
- Returns:
A DataType with PyArrow int64 type
- Return type:
DataType
- classmethod int8()
Create a DataType representing an 8-bit signed integer.
- Returns:
A DataType with PyArrow int8 type
- Return type:
DataType
- classmethod string()
Create a DataType representing a variable-length string.
- Returns:
A DataType with PyArrow string type
- Return type:
DataType
- classmethod uint16()
Create a DataType representing a 16-bit unsigned integer.
- Returns:
A DataType with PyArrow uint16 type
- Return type:
DataType
- classmethod uint32()
Create a DataType representing a 32-bit unsigned integer.
- Returns:
A DataType with PyArrow uint32 type
- Return type:
DataType
- classmethod uint64()
Create a DataType representing a 64-bit unsigned integer.
- Returns:
A DataType with PyArrow uint64 type
- Return type:
DataType