ray.data.Dataset.with_column
- Dataset.with_column(column_name: str, expr: Expr, **ray_remote_args) -> Dataset
Add a new column to the dataset via an expression.
The expression can be composed of existing columns, literals, and user-defined functions (UDFs).
Examples
>>> import ray
>>> from ray.data.expressions import col
>>> ds = ray.data.range(100)
>>> # Add a new column 'id_2' by multiplying 'id' by 2.
>>> ds.with_column("id_2", col("id") * 2).show(2)
{'id': 0, 'id_2': 0}
{'id': 1, 'id_2': 2}
>>> # Using a UDF with with_column
>>> from ray.data.datatype import DataType
>>> from ray.data.expressions import udf
>>> import pyarrow.compute as pc
>>>
>>> @udf(return_dtype=DataType.int32())
... def add_one(column):
...     return pc.add(column, 1)
>>>
>>> ds.with_column("id_plus_one", add_one(col("id"))).show(2)
{'id': 0, 'id_plus_one': 1}
{'id': 1, 'id_plus_one': 2}
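The description above says an expression can be composed of existing columns, literals, and UDFs. As a way to picture that composition, here is a minimal pure-Python sketch of a deferred expression that is built first and evaluated against a batch later. It is a toy model and not Ray's implementation: the names `ToyExpr`, `ToyCol`, `ToyLit`, `ToyBinary`, `ToyUdf`, and `toy_with_column` are hypothetical, and a batch is assumed to be a plain dict of equal-length lists.

```python
import operator
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

# Assumption for this sketch: a batch is a dict mapping column name -> list of values.
Batch = Dict[str, List[Any]]


class ToyExpr:
    """Hypothetical base class. Building an expression does no work by itself."""

    def __mul__(self, other: Any) -> "ToyExpr":
        return ToyBinary(operator.mul, self, _wrap(other))

    def __add__(self, other: Any) -> "ToyExpr":
        return ToyBinary(operator.add, self, _wrap(other))

    def evaluate(self, batch: Batch) -> List[Any]:
        raise NotImplementedError


@dataclass
class ToyCol(ToyExpr):
    """Reference to an existing column."""
    name: str

    def evaluate(self, batch: Batch) -> List[Any]:
        return list(batch[self.name])


@dataclass
class ToyLit(ToyExpr):
    """A literal, broadcast to the number of rows in the batch."""
    value: Any

    def evaluate(self, batch: Batch) -> List[Any]:
        num_rows = len(next(iter(batch.values()), []))
        return [self.value] * num_rows


@dataclass
class ToyBinary(ToyExpr):
    """Element-wise binary operation on two sub-expressions."""
    op: Callable[[Any, Any], Any]
    left: ToyExpr
    right: ToyExpr

    def evaluate(self, batch: Batch) -> List[Any]:
        lhs = self.left.evaluate(batch)
        rhs = self.right.evaluate(batch)
        return [self.op(a, b) for a, b in zip(lhs, rhs)]


@dataclass
class ToyUdf(ToyExpr):
    """A user-defined function applied to a whole column at once."""
    fn: Callable[[List[Any]], List[Any]]
    arg: ToyExpr

    def evaluate(self, batch: Batch) -> List[Any]:
        return self.fn(self.arg.evaluate(batch))


def _wrap(value: Any) -> ToyExpr:
    # Plain Python values become literals so that `ToyCol("id") * 2` works.
    return value if isinstance(value, ToyExpr) else ToyLit(value)


def toy_with_column(batch: Batch, column_name: str, expr: ToyExpr) -> Batch:
    # Return a new batch; the input batch is left unchanged.
    new_batch = dict(batch)
    new_batch[column_name] = expr.evaluate(batch)
    return new_batch


batch = {"id": [0, 1, 2]}
doubled = toy_with_column(batch, "id_2", ToyCol("id") * 2)
plus_one = toy_with_column(
    batch, "id_plus_one", ToyUdf(lambda xs: [x + 1 for x in xs], ToyCol("id"))
)
print(doubled)   # {'id': [0, 1, 2], 'id_2': [0, 2, 4]}
print(plus_one)  # {'id': [0, 1, 2], 'id_plus_one': [1, 2, 3]}
```

The sketch mirrors the two documented examples: an arithmetic expression on a column with a literal, and a UDF that receives a whole column rather than one value at a time. How Ray actually represents and executes expressions may differ.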
- Parameters:
column_name – The name of the new column.
expr – An expression that defines the new column values.
**ray_remote_args – Additional resource requirements to request from Ray for the map tasks (e.g., num_gpus=1).
- Returns:
A new dataset with the added column evaluated via the expression.
PublicAPI (alpha): This API is in alpha and may change before becoming stable.