ray.data.Dataset.drop_columns
- Dataset.drop_columns(cols: List[str], *, compute: str | None = None, concurrency: int | None = None, **ray_remote_args) -> Dataset
Drop one or more columns from the dataset.
Examples
>>> import ray
>>> ds = ray.data.read_parquet("s3://anonymous@ray-example-data/iris.parquet")
>>> ds.schema()
Column        Type
------        ----
sepal.length  double
sepal.width   double
petal.length  double
petal.width   double
variety       string
>>> ds.drop_columns(["variety"]).schema()
Column        Type
------        ----
sepal.length  double
sepal.width   double
petal.length  double
petal.width   double
Time complexity: O(dataset size / parallelism)
- Parameters:
cols – Names of the columns to drop. If any name does not exist, an exception is raised. Column names must be unique. When the input schema is known statically, missing columns are reported at the drop_columns call; otherwise the error surfaces during materialization.
compute – This argument is deprecated. Use the concurrency argument instead.
concurrency – The maximum number of Ray workers to use concurrently.
**ray_remote_args – Additional resource requirements to request from Ray (e.g., num_gpus=1 to request GPUs for the map tasks). See ray.remote() for details.
- Returns:
A new Dataset with the specified columns removed.