ray.data.Dataset.groupby
Dataset.groupby(key: Optional[str]) -> GroupedData
Group rows of a Dataset according to a column.

Use this method to transform data based on a categorical variable.
Examples
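import pandas as pd
import ray

def normalize_variety(group: pd.DataFrame) -> pd.DataFrame:
    # Scale every feature column by its largest absolute value within the group.
    # drop(columns=...) excludes the "variety" column; a bare drop("variety")
    # would look for a row label named "variety" and raise a KeyError.
    for feature in group.drop(columns="variety").columns:
        group[feature] = group[feature] / group[feature].abs().max()
    return group

ds = (
    ray.data.read_parquet("s3://anonymous@ray-example-data/iris.parquet")
    .groupby("variety")
    .map_groups(normalize_variety, batch_format="pandas")
)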
Time complexity: O(dataset size * log(dataset size / parallelism))
Parameters
    key – A column name. If this is None, place all rows in a single group.
Returns
    A lazy GroupedData.
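The effect of key can be pictured with a small plain-Python model. This is only a conceptual sketch of the grouping semantics described above, not how Ray implements groupby; the helper group_rows and the sample rows are hypothetical and exist only for illustration.

```python
from collections import defaultdict
from typing import Any, Dict, List, Optional

Row = Dict[str, Any]


def group_rows(rows: List[Row], key: Optional[str]) -> Dict[Any, List[Row]]:
    """Conceptual model only: bucket rows by the value in column `key`."""
    groups: Dict[Any, List[Row]] = defaultdict(list)
    for row in rows:
        # With key=None, every row lands in the same bucket.
        group_key = None if key is None else row[key]
        groups[group_key].append(row)
    return dict(groups)


# Hypothetical sample rows, loosely shaped like the iris data.
rows = [
    {"variety": "Setosa", "petal.length": 1.4},
    {"variety": "Virginica", "petal.length": 6.0},
    {"variety": "Setosa", "petal.length": 1.3},
]

by_variety = group_rows(rows, "variety")   # one group per distinct variety
single_group = group_rows(rows, None)      # all rows in a single group

print({name: len(members) for name, members in by_variety.items()})
print(len(single_group))
```

In this model, grouping by "variety" yields one bucket per distinct value, while passing None yields exactly one bucket holding every row.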
See also
map_groups()
Call this method to transform groups of data.
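Because the resulting GroupedData is lazy, a mistake in the per-group function may not surface until the dataset is actually executed. One way to check such a function beforehand is to run it on a small pandas DataFrame locally. The sketch below assumes a hypothetical toy frame with a single feature column, and uses pandas' own groupby as a stand-in for the per-group calls that map_groups would make; it does not involve Ray at all.

```python
import pandas as pd


def normalize_variety(group: pd.DataFrame) -> pd.DataFrame:
    # Work on a copy so the caller's frame is left untouched.
    group = group.copy()
    for feature in group.drop(columns="variety").columns:
        group[feature] = group[feature] / group[feature].abs().max()
    return group


# Hypothetical toy rows standing in for the iris data.
toy = pd.DataFrame(
    {
        "sepal.length": [5.0, 4.0, 7.0, 3.5],
        "variety": ["Setosa", "Setosa", "Versicolor", "Versicolor"],
    }
)

# Apply the function to each group separately, then stitch the results together.
normalized = pd.concat(
    [normalize_variety(group) for _, group in toy.groupby("variety")]
)
print(normalized)
```

Each feature value is divided by the largest absolute value within its own group, so every group's maximum becomes 1.0 regardless of the scale of the other groups.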