ray.data.expressions.udf
- ray.data.expressions.udf() → Callable[[...], UDFExpr]
Decorator to convert a UDF into an expression-compatible function.
This decorator allows UDFs to be used seamlessly within the expression system, enabling schema inference and integration with other expressions.
IMPORTANT: UDFs operate on batches of data, not individual rows. When your UDF is called, each column argument is passed as a PyArrow Array containing that column's values for the whole batch. A UDF that takes several columns receives one PyArrow Array per column.
- Returns:
A callable that creates UDFExpr instances when called with expressions.
Example
>>> from ray.data.expressions import col, udf
>>> import pyarrow as pa
>>> import pyarrow.compute as pc
>>> import ray
>>>
>>> # UDF that operates on a batch of values (PyArrow Array)
>>> @udf()
... def add_one(x: pa.Array) -> pa.Array:
...     return pc.add(x, 1)  # Vectorized operation on the entire Array
>>>
>>> # UDF that combines multiple columns (each as a PyArrow Array)
>>> @udf()
... def format_name(first: pa.Array, last: pa.Array) -> pa.Array:
...     return pc.binary_join_element_wise(first, last, " ")  # Vectorized string concatenation
>>>
>>> # Use in dataset operations
>>> ds = ray.data.from_items([
...     {"value": 5, "first": "John", "last": "Doe"},
...     {"value": 10, "first": "Jane", "last": "Smith"}
... ])
>>>
>>> # Single column transformation (operates on batches)
>>> ds_incremented = ds.with_column("value_plus_one", add_one(col("value")))
>>>
>>> # Multi-column transformation (each column becomes a PyArrow Array)
>>> ds_formatted = ds.with_column("full_name", format_name(col("first"), col("last")))
>>>
>>> # Can also be used in complex expressions
>>> ds_complex = ds.with_column("doubled_plus_one", add_one(col("value")) * 2)
PublicAPI (alpha): This API is in alpha and may change before becoming stable.