ray.data.Dataset.aggregate

Dataset.aggregate(*aggs: ray.data.aggregate.AggregateFn) → ray.data.block.U

Aggregate the entire dataset as one group.

This is a blocking operation.

Examples

>>> import ray
>>> from ray.data.aggregate import Max, Mean
>>> ray.data.range(100).aggregate(Max())
(99,)
>>> ray.data.range_table(100).aggregate(
...    Max("value"), Mean("value"))
{'max(value)': 99, 'mean(value)': 49.5}

Time complexity: O(dataset size / parallelism)
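
The complexity above can be read as the usual pattern of accumulating each block independently and then merging the partial results. The following is a minimal sketch of that pattern in plain Python, not Ray's actual implementation. The names partial_mean, merge, finalize, and aggregate_mean, as well as the list-of-lists block layout, are hypothetical and chosen only for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def partial_mean(block):
    """Accumulate one block into a (sum, count) partial state."""
    return (sum(block), len(block))


def merge(a, b):
    """Combine two partial states into one."""
    return (a[0] + b[0], a[1] + b[1])


def finalize(state):
    """Turn the merged state into the final value; None if no rows were seen."""
    total, count = state
    return total / count if count else None


def aggregate_mean(blocks, parallelism=4):
    # Each block is accumulated independently, so the accumulation work
    # can be spread across `parallelism` workers.
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        partials = list(pool.map(partial_mean, blocks))
    # Merging touches one small state per block, not every row.
    return finalize(reduce(merge, partials, (0, 0)))


# Four blocks that together hold the integers 0..99.
blocks = [list(range(start, start + 25)) for start in range(0, 100, 25)]
result = aggregate_mean(blocks)
```

In this sketch the per-row work is divided among the workers, while the merge step only handles one partial state per block. An empty list of blocks yields None, mirroring the empty-dataset case described under Returns.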

Parameters

aggs – One or more aggregations (AggregateFn instances) to apply to the dataset.

Returns

If the input dataset is a simple dataset, the output is a tuple (agg1, agg2, ...) in which each element is the result of the corresponding aggregation. If the input dataset is an Arrow dataset, the output is an ArrowRow in which each column is the result of the corresponding aggregation. If the dataset is empty, the output is None.
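
Because the result can take three different shapes, calling code may want to normalize it before use. The helper below is a hypothetical sketch, not part of Ray. It assumes the row-like result of an Arrow dataset supports Python's Mapping interface, and it uses a plain tuple and a plain dict as stand-ins for the real return values.

```python
from collections.abc import Mapping


def aggregate_values(result):
    """Return the aggregation results as a list, whatever shape they arrived in.

    `result` stands in for the value returned by Dataset.aggregate():
    None for an empty dataset, a tuple for a simple dataset, or a
    mapping-like row for an Arrow dataset (assumed to implement Mapping).
    """
    if result is None:
        # Empty dataset: there is nothing to report.
        return []
    if isinstance(result, Mapping):
        # Row-like result: one column per aggregation.
        return list(result.values())
    # Tuple result: one element per aggregation.
    return list(result)


# Stand-ins shaped like the outputs shown in the Examples section.
simple_result = (99,)
row_result = {"max(value)": 99, "mean(value)": 49.5}
```

Checking for None first matters, since an empty dataset produces neither a tuple nor a row.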