ray.data.Dataset.aggregate#

Dataset.aggregate(*aggs: AggregateFn) Any | Dict[str, Any][source]#

Aggregate values using one or more functions.

Use this method to compute metrics like the product of a column.

Note

This operation will trigger execution of the lazy transformations performed on this dataset.

Note

This operation requires all inputs to be materialized in object store for it to execute.

Examples

import ray
from ray.data.aggregate import AggregateFn

ds = ray.data.from_items([{"number": i} for i in range(1, 10)])
aggregation = AggregateFn(
    init=lambda column: 1,
    accumulate_row=lambda a, row: a * row["number"],
    merge = lambda a1, a2: a1 + a2,
    name="prod"
)
print(ds.aggregate(aggregation))
{'prod': 45}

Time complexity: O(dataset size / parallelism)

Parameters:

*aggsAggregations to perform.

Returns:

A dict where each each value is an aggregation for a given column.