ray.data.Dataset.groupby
Dataset.groupby(key: Union[None, str, Callable[[ray.data.block.T], Any]]) → GroupedDataset[T]
Group the dataset by the key function or column name.
Examples
>>> import ray
>>> # Group by a key function and aggregate.
>>> ray.data.range(100).groupby(lambda x: x % 3).count()
Aggregate
+- Dataset(num_blocks=..., num_rows=100, schema=<class 'int'>)
>>> # Group by an Arrow table column and aggregate.
>>> ray.data.from_items([
...     {"A": x % 3, "B": x} for x in range(100)]).groupby(
...     "A").count()
Aggregate
+- Dataset(num_blocks=100, num_rows=100, schema={A: int64, B: int64})
Time complexity: O(dataset size * log(dataset size / parallelism))
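One way to read this bound is as the cost of a partitioned, sort-based grouping: if n rows are split into p partitions of about n / p rows each and every partition is sorted independently, the total sorting work is about n * log(n / p). The pure-Python sketch below illustrates that idea only. It is not Ray's implementation, and `grouped_count` and its `parallelism` argument are hypothetical names introduced for this example.

```python
from itertools import groupby as adjacent_groups


def grouped_count(rows, key, parallelism=4):
    """Count rows per key with a partition-then-sort strategy (illustrative only)."""
    # Hash-partition the rows so that equal keys always land in the same partition.
    partitions = [[] for _ in range(parallelism)]
    for row in rows:
        partitions[hash(key(row)) % parallelism].append(row)

    counts = {}
    for part in partitions:
        # Sorting one partition costs roughly (n / p) * log(n / p).
        part.sort(key=key)
        # After sorting, rows with the same key are adjacent and form one group.
        for k, group in adjacent_groups(part, key=key):
            counts[k] = sum(1 for _ in group)
    return counts


result = grouped_count(range(100), lambda x: x % 3)
```

The sketch assumes keys are both hashable and orderable. In a distributed setting the partitions would be processed in parallel rather than in a loop.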
Parameters
key – A key function or Arrow column name. If this is None, the grouping is global: all rows fall into a single group.
Returns
A lazy GroupedDataset that can be aggregated later.
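The three accepted forms of `key` (None, a column name, or a function) can be pictured with a small pure-Python model. This is a sketch of the semantics described above, not Ray code: `resolve_key` and `count_by` are hypothetical helpers, and rows are modeled as plain dictionaries.

```python
def resolve_key(key):
    """Turn a groupby key spec into a function from row to group key."""
    if key is None:
        # Global grouping: every row maps to the same group.
        return lambda row: None
    if isinstance(key, str):
        # Column name: group by the value stored in that column.
        return lambda row: row[key]
    if callable(key):
        # Key function: use it as given.
        return key
    raise TypeError(f"unsupported key type: {type(key).__name__}")


def count_by(rows, key):
    """Count rows per group, mimicking groupby(key).count() on a list of rows."""
    key_fn = resolve_key(key)
    counts = {}
    for row in rows:
        group = key_fn(row)
        counts[group] = counts.get(group, 0) + 1
    return counts


rows = [{"A": x % 3, "B": x} for x in range(100)]
by_column = count_by(rows, "A")                      # group by column "A"
by_function = count_by(rows, lambda r: r["B"] % 2)   # group by a key function
global_count = count_by(rows, None)                  # one global group
```

Unlike the GroupedDataset returned by `groupby`, which is lazy, this model computes its result eagerly.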