ray.data.Dataset.groupby
- Dataset.groupby(key: str | List[str] | None) -> GroupedData
Group rows of a Dataset according to one or more columns.
Use this method to transform data based on a categorical variable.
Note
This operation requires all inputs to be materialized in the object store before it can execute.
Examples
import pandas as pd
import ray

def normalize_variety(group: pd.DataFrame) -> pd.DataFrame:
    # Scale every feature column by its largest absolute value within the group.
    # Drop the key with columns= so pandas does not look for a row labelled "variety".
    for feature in group.drop(columns="variety").columns:
        group[feature] = group[feature] / group[feature].abs().max()
    return group

ds = (
    ray.data.read_parquet("s3://anonymous@ray-example-data/iris.parquet")
    .groupby("variety")
    .map_groups(normalize_variety, batch_format="pandas")
)
Time complexity: O(dataset size * log(dataset size / parallelism))
- Parameters:
key – A column name or list of column names. If this is None, place all rows in a single group.
- Returns:
A lazy GroupedData.
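The three accepted forms of key (a single column name, a list of column names, or None) can be illustrated with a plain-Python sketch. This is not how Ray implements grouping, which runs distributed over blocks in the object store; group_rows is a hypothetical helper, and the site column is invented for the illustration.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple, Union

Row = Dict[str, Any]


def group_rows(
    rows: List[Row], key: Union[str, List[str], None]
) -> Dict[Tuple[Any, ...], List[Row]]:
    """Hypothetical helper: bucket rows the way the `key` argument describes."""
    if key is None:
        # No key: every row lands in one group.
        return {(): list(rows)}
    # Treat a single column name as a one-element list of columns.
    columns = [key] if isinstance(key, str) else list(key)
    groups: Dict[Tuple[Any, ...], List[Row]] = defaultdict(list)
    for row in rows:
        # Rows with identical values in all key columns share a group.
        groups[tuple(row[c] for c in columns)].append(row)
    return dict(groups)


# Illustrative rows; "site" is an assumed column, not part of the iris data.
rows = [
    {"variety": "Setosa", "site": "north", "petal_length": 1.4},
    {"variety": "Setosa", "site": "south", "petal_length": 1.3},
    {"variety": "Virginica", "site": "north", "petal_length": 6.0},
]

by_variety = group_rows(rows, "variety")
by_variety_site = group_rows(rows, ["variety", "site"])
single_group = group_rows(rows, None)
```

With these rows, grouping by "variety" yields two groups, grouping by both columns yields three, and passing None yields one group holding every row.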
See also
map_groups()
Call this method to transform groups of data.