ray.data.expressions.pyarrow_udf

ray.data.expressions.pyarrow_udf(return_dtype: DataType) → Callable[[...], UDFExpr]

Decorator for PyArrow compute functions with automatic format conversion.

This decorator wraps arbitrary PyArrow logic to automatically convert pandas Series and numpy arrays to PyArrow Arrays, ensuring the function works seamlessly regardless of the underlying block format (pandas, arrow, or items).

The resulting UDFExpr is opaque to the optimizer – it cannot be converted to a native pyarrow.compute.Expression and therefore will not participate in predicate pushdown. Use this for operations that involve custom logic or that cannot be expressed as a single pc.* call (e.g., strip with optional characters, cast, list slicing).

For operations that are a direct 1:1 wrapper around a single pc.* function, use _create_pyarrow_compute_udf instead, which produces a PyArrowComputeUDFExpr that retains the compute function identity and enables predicate pushdown.

Parameters:

return_dtype – The data type of the values returned by the decorated function

Returns:

A callable that creates UDFExpr instances with automatic conversion

PublicAPI (alpha): This API is in alpha and may change before becoming stable.