ray.data.Dataset.aggregate#

Dataset.aggregate(*aggs: AggregateFn) Any | Dict[str, Any][source]#

Aggregate values using one or more functions.

Use this method to compute metrics like the product of a column.

Note

This operation will trigger execution of the lazy transformations performed on this dataset.

Note

This operation requires all inputs to be materialized in object store for it to execute.

Examples

import ray
from ray.data.aggregate import AggregateFn

ds = ray.data.from_items([{"number": i} for i in range(1, 10)])
aggregation = AggregateFn(
    init=lambda column: 1,
    # Apply this to each row to produce a partial aggregate result
    accumulate_row=lambda a, row: a * row["number"],
    # Apply this to merge partial aggregate results into a final result
    merge=lambda a1, a2: a1 * a2,
    name="prod"
)
print(ds.aggregate(aggregation))
{'prod': 362880}

Time complexity: O(dataset size / parallelism)

Parameters:

*aggsAggregations to perform.

Returns:

A dict where each each value is an aggregation for a given column.