ray.data.Dataset.aggregate#

Dataset.aggregate(*aggs: ray.data.aggregate._aggregate.AggregateFn) Union[Any, Dict[str, Any]][source]#

Aggregate values using one or more functions.

Use this method to compute metrics like the product of a column.

Note

This operation will trigger execution of the lazy transformations performed on this dataset.

Examples

import ray
from ray.data.aggregate import AggregateFn

ds = ray.data.from_items([{"number": i} for i in range(1, 10)])
aggregation = AggregateFn(
    init=lambda column: 1,
    accumulate_row=lambda a, row: a * row["number"],
    merge = lambda a1, a2: a1 + a2,
    name="prod"
)
print(ds.aggregate(aggregation))
{'prod': 45}

Time complexity: O(dataset size / parallelism)

Parameters

*aggsAggregations to perform.

Returns

A dict where each each value is an aggregation for a given column.