ray.data.Dataset.drop_columns
- Dataset.drop_columns(cols: List[str], *, compute: Optional[str] = None, **ray_remote_args) → ray.data.dataset.Dataset
Drop one or more columns from the dataset.
Examples
>>> import ray
>>> ds = ray.data.range(100)
>>> # Add a new column equal to id * 2.
>>> ds = ds.add_column("new_col", lambda df: df["id"] * 2)
>>> # Drop the existing "id" column.
>>> ds = ds.drop_columns(["id"])
Time complexity: O(dataset size / parallelism)
- Parameters
cols – Names of the columns to drop. If any name does not exist, an exception will be raised.
compute – The compute strategy: either “tasks” (default) to use Ray tasks, ray.data.ActorPoolStrategy(size=n) to use a fixed-size actor pool, or ray.data.ActorPoolStrategy(min_size=m, max_size=n) for an autoscaling actor pool.
ray_remote_args – Additional resource requirements to request from Ray (e.g., num_gpus=1 to request GPUs for the map tasks).
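The semantics described for cols (named columns are removed, and an unknown name raises an exception) can be illustrated without a Ray cluster. The following is a minimal plain-Python sketch on a list of row dictionaries, not Ray's implementation. The helper name drop_columns_from_rows is hypothetical, and the choice of KeyError is an assumption, since the text above only states that "an exception" is raised.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def drop_columns_from_rows(rows: List[Row], cols: List[str]) -> List[Row]:
    """Hypothetical helper that mimics the documented drop_columns semantics.

    Returns new rows without the named columns. Raises KeyError if any
    requested column is missing (the exception type is an assumption).
    """
    for row in rows:
        missing = [c for c in cols if c not in row]
        if missing:
            raise KeyError(f"columns not found: {missing}")
    to_drop = set(cols)
    # Build new dicts so the input rows are left unmodified.
    return [{k: v for k, v in row.items() if k not in to_drop} for row in rows]


# Rows shaped like the example above: an "id" column plus "new_col" = id * 2.
rows = [{"id": i, "new_col": i * 2} for i in range(3)]
kept = drop_columns_from_rows(rows, ["id"])
# kept == [{"new_col": 0}, {"new_col": 2}, {"new_col": 4}]
```

Calling the helper with a name that is not present, such as drop_columns_from_rows(rows, ["value"]), raises in this sketch, which mirrors the documented behavior for a nonexistent column.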