ray.data.Dataset.with_column

Dataset.with_column(column_name: str, expr: Expr, **ray_remote_args) → Dataset

Add a new column to the dataset via an expression.

The expression can combine existing columns, literals, and user-defined functions (UDFs).

Examples

>>> import ray
>>> from ray.data.expressions import col
>>> ds = ray.data.range(100)
>>> # Add a new column 'id_2' by multiplying 'id' by 2.
>>> ds.with_column("id_2", col("id") * 2).show(2)
{'id': 0, 'id_2': 0}
{'id': 1, 'id_2': 2}
>>> # Add a new column 'id_plus_one' by applying a UDF to 'id'.
>>> from ray.data.datatype import DataType
>>> from ray.data.expressions import udf
>>> import pyarrow.compute as pc
>>>
>>> @udf(return_dtype=DataType.int32())
... def add_one(column):
...     return pc.add(column, 1)
>>>
>>> ds.with_column("id_plus_one", add_one(col("id"))).show(2)
{'id': 0, 'id_plus_one': 1}
{'id': 1, 'id_plus_one': 2}

Parameters:
  • column_name – The name of the new column.

  • expr – An expression that defines the new column values.

  • **ray_remote_args – Additional resource requirements to request from Ray for the map tasks (e.g., num_gpus=1).

Returns:

A new dataset that contains the added column, with values computed by evaluating the expression.

PublicAPI (alpha): This API is in alpha and may change before becoming stable.