ray.data.expressions.pyarrow_udf
- ray.data.expressions.pyarrow_udf(return_dtype: DataType) → Callable[[...], UDFExpr]
Decorator for PyArrow compute functions with automatic format conversion.
This decorator wraps arbitrary PyArrow logic to automatically convert pandas Series and numpy arrays to PyArrow Arrays, ensuring the function works seamlessly regardless of the underlying block format (pandas, arrow, or items).
The resulting UDFExpr is opaque to the optimizer: it cannot be converted to a native pyarrow.compute.Expression and therefore will not participate in predicate pushdown. Use this for operations that involve custom logic or that cannot be expressed as a single pc.* call (e.g., strip with optional characters, cast, list slicing).
For operations that are a direct 1:1 wrapper around a single pc.* function, use _create_pyarrow_compute_udf instead, which produces a PyArrowComputeUDFExpr that retains the compute function identity and enables predicate pushdown.
- Parameters:
return_dtype – The data type of the return value
- Returns:
A callable that creates UDFExpr instances with automatic conversion
PublicAPI (alpha): This API is in alpha and may change before becoming stable.