ray.data.Dataset.rename_columns

Dataset.rename_columns(names: List[str] | Dict[str, str], *, concurrency: int | Tuple[int, int] | Tuple[int, int, int] | None = None, **ray_remote_args)

Rename columns in the dataset.

Examples

>>> import ray
>>> ds = ray.data.read_parquet("s3://anonymous@ray-example-data/iris.parquet")
>>> ds.schema()
Column        Type
------        ----
sepal.length  double
sepal.width   double
petal.length  double
petal.width   double
variety       string

You can pass a dictionary mapping old column names to new column names.

>>> ds.rename_columns({"variety": "category"}).schema()
Column        Type
------        ----
sepal.length  double
sepal.width   double
petal.length  double
petal.width   double
category      string

Or you can pass a list of new column names.

>>> ds.rename_columns(
...     ["sepal_length", "sepal_width", "petal_length", "petal_width", "variety"]
... ).schema()
Column        Type
------        ----
sepal_length  double
sepal_width   double
petal_length  double
petal_width   double
variety       string
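When many columns follow the same naming pattern, the dictionary form can be built programmatically instead of typed out. The following is a minimal sketch in plain Python: `dotted_to_snake` is a hypothetical helper, not part of Ray, and the column names are copied by hand from the iris schema above rather than read from the dataset.

```python
from typing import Dict, List


def dotted_to_snake(columns: List[str]) -> Dict[str, str]:
    """Map each column name containing "." to the same name with "_" instead.

    Hypothetical helper for illustration; it is not part of Ray.
    Names without a "." are omitted from the mapping.
    """
    return {name: name.replace(".", "_") for name in columns if "." in name}


# Column names copied from the iris schema shown in the examples.
iris_columns = ["sepal.length", "sepal.width", "petal.length", "petal.width", "variety"]

# Old-name -> new-name dictionary, in the shape the dictionary form expects.
mapping = dotted_to_snake(iris_columns)
print(mapping)
```

Passing the result as `ds.rename_columns(mapping)` should give the same schema as the list form, assuming that columns absent from the dictionary (here `variety`) keep their names, as the dictionary example suggests.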

Parameters:

names – A dictionary that maps old column names to new column names, or a list of new column names, one per existing column in schema order.
concurrency – The maximum number of Ray workers to use concurrently.
ray_remote_args – Additional resource requirements to request from Ray (e.g., num_gpus=1 to request GPUs for the map tasks). See ray.remote() for details.