ray.data.Dataset.aggregate#
- Dataset.aggregate(*aggs: AggregateFn) Any | Dict[str, Any] [source]#
Aggregate values using one or more functions.
Use this method to compute metrics like the product of a column.
Note
This operation will trigger execution of the lazy transformations performed on this dataset.
Note
This operation requires all inputs to be materialized in object store for it to execute.
Examples
import ray from ray.data.aggregate import AggregateFn ds = ray.data.from_items([{"number": i} for i in range(1, 10)]) aggregation = AggregateFn( init=lambda column: 1, # Apply this to each row to produce a partial aggregate result accumulate_row=lambda a, row: a * row["number"], # Apply this to merge partial aggregate results into a final result merge=lambda a1, a2: a1 * a2, name="prod" ) print(ds.aggregate(aggregation))
{'prod': 362880}
Time complexity: O(dataset size / parallelism)
- Parameters:
*aggs –
Aggregations
to perform.- Returns:
A
dict
where each each value is an aggregation for a given column.