ray.data.Dataset.groupby
- Dataset.groupby(key: Optional[str]) -> GroupedData
Group the dataset by a column name, or treat the whole dataset as a single group when no key is given.
Examples
>>> import ray
>>> # Group by a table column and aggregate.
>>> ray.data.from_items([
...     {"A": x % 3, "B": x} for x in range(100)]).groupby(
...     "A").count()
Aggregate
+- Dataset(num_blocks=100, num_rows=100, schema={A: int64, B: int64})
Time complexity: O(dataset size * log(dataset size / parallelism))
- Parameters
key – A column name. If this is None, the grouping is global.
- Returns
A lazy GroupedData that can be aggregated later.
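To make the meaning of `key` concrete, the following is a minimal plain-Python sketch of the grouping semantics described above. It is not Ray's implementation and does not use Ray at all. The helpers `group_rows` and `count_groups` are hypothetical names introduced for illustration only. They emulate grouping by a column name and the global grouping that results from `key=None`, using the same rows as the example above.

```python
from collections import defaultdict
from typing import Any, Dict, Hashable, List, Optional

Row = Dict[str, Any]


def group_rows(rows: List[Row], key: Optional[str]) -> Dict[Hashable, List[Row]]:
    """Hypothetical helper: bucket rows by the value found in column `key`.

    When `key` is None, every row lands in one global group
    (stored under the dictionary key None in this sketch).
    """
    groups: Dict[Hashable, List[Row]] = defaultdict(list)
    for row in rows:
        group_key = None if key is None else row[key]
        groups[group_key].append(row)
    return dict(groups)


def count_groups(rows: List[Row], key: Optional[str]) -> Dict[Hashable, int]:
    """Hypothetical helper: number of rows in each group."""
    return {k: len(v) for k, v in group_rows(rows, key).items()}


# Same rows as in the example above: column A takes the values 0, 1, 2.
rows = [{"A": x % 3, "B": x} for x in range(100)]

per_key = count_groups(rows, "A")        # one group per distinct value of A
global_count = count_groups(rows, None)  # a single group holding every row

print(per_key)       # {0: 34, 1: 33, 2: 33}
print(global_count)  # {None: 100}
```

Unlike this eager sketch, the GroupedData returned by `groupby` is lazy: no grouping work happens until an aggregation such as `count()` is applied and the result is consumed.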