ray.data.Dataset.drop_columns

Dataset.drop_columns(cols: List[str], *, compute: Optional[str] = None, **ray_remote_args) → ray.data.dataset.Dataset[ray.data.block.U]

Drop one or more columns from the dataset.

This is a blocking operation.

Examples

>>> import ray
>>> ds = ray.data.range_table(100)
>>> # Add a new column equal to value * 2.
>>> ds = ds.add_column(
...     "new_col", lambda df: df["value"] * 2)
>>> # Drop the existing "value" column.
>>> ds = ds.drop_columns(["value"])

Time complexity: O(dataset size / parallelism)

Parameters
  • cols – Names of the columns to drop. If any name does not exist, an exception will be raised.

  • compute – The compute strategy, either “tasks” (default) to use Ray tasks, or ActorPoolStrategy(min, max) to use an autoscaling actor pool.

  • ray_remote_args – Additional resource requirements to request from Ray (e.g., num_gpus=1 to request GPUs for the map tasks).
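
The contract described for cols (drop the named columns, raise if a name does not exist) can be illustrated with a minimal pure-Python sketch that operates on a single in-memory block of rows. This is not Ray's implementation: the helper drop_columns_from_block, the Row alias, and the choice of KeyError are assumptions made for illustration, and the page above does not specify which exception type Ray raises.

```python
from typing import Any, Dict, List

# Hypothetical row type for illustration: one dict per record.
Row = Dict[str, Any]


def drop_columns_from_block(rows: List[Row], cols: List[str]) -> List[Row]:
    """Return copies of ``rows`` without the columns named in ``cols``.

    Illustrative helper only; it is not part of Ray's API.
    """
    for row in rows:
        missing = [c for c in cols if c not in row]
        if missing:
            # Assumption: KeyError stands in for the unspecified exception
            # that the documentation says is raised for unknown names.
            raise KeyError(f"columns not found: {missing}")
    to_drop = set(cols)
    # Build new dicts so the input block is left unmodified.
    return [{k: v for k, v in row.items() if k not in to_drop} for row in rows]


# One block shaped like the dataset in the example above after add_column.
block = [{"value": i, "new_col": i * 2} for i in range(3)]
remaining = drop_columns_from_block(block, ["value"])
```

In this sketch each block is processed independently, which is consistent with the stated cost of O(dataset size / parallelism) when blocks are handled in parallel.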