ray.data.aggregate.AggregateFnV2

class ray.data.aggregate.AggregateFnV2(name: str, zero_factory: Callable[[], AggType], *, on: str | None, ignore_nulls: bool)

Bases: AggregateFn, ABC

Provides an interface to implement efficient aggregations to be applied to the dataset.

AggregateFnV2 instances are passed to a Dataset’s .aggregate(...) method to perform distributed aggregations. To create a custom aggregation, subclass AggregateFnV2 and implement the aggregate_block and combine methods. Override the finalize method if the final accumulated state needs further transformation.

Aggregation follows these steps:

1. Initialization: For each group (if grouping) or for the entire dataset, an initial accumulator is created using zero_factory.
2. Block Aggregation: The aggregate_block method is applied to each block independently, producing a partial aggregation result for that block.
3. Combination: The combine method is used to merge these partial results (or an existing accumulated result with a new partial result) into a single, combined accumulator.
4. Finalization: Optionally, the finalize method transforms the final combined accumulator into the desired output format.
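
The steps above can be sketched in plain Python, without Ray, for a mean aggregation. This is only an illustration under stated assumptions: blocks are modeled as plain lists of numbers, the accumulator is a hypothetical (sum, count) pair, and the free functions below merely mirror the roles of zero_factory, aggregate_block, combine, and finalize. They are not Ray's implementation.

```python
from functools import reduce

# Assumption: each "block" is a plain list of numbers. In Ray Data a block
# is a batch of rows, not a Python list.
blocks = [[1.0, 2.0, 3.0], [4.0, 5.0], [6.0]]


def zero_factory():
    # Initialization: a fresh (sum, count) accumulator.
    return (0.0, 0)


def aggregate_block(block):
    # Block aggregation: one partial result per block, computed independently.
    return (sum(block), len(block))


def combine(current, new):
    # Combination: merge a partial result into the running accumulator.
    return (current[0] + new[0], current[1] + new[1])


def finalize(acc):
    # Finalization: turn the (sum, count) state into the mean.
    total, count = acc
    return total / count if count else None


partials = [aggregate_block(b) for b in blocks]
combined = reduce(combine, partials, zero_factory())
mean = finalize(combined)
```

Keeping the sum and the count separate until finalization, rather than averaging each block, is what allows partial results to be merged in any order.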

Parameters:

name – The name of the aggregation. This will be used as the column name in the output, e.g., “sum(my_col)”.
zero_factory – A callable that returns the initial “zero” value for the accumulator. For example, for a sum, this would be lambda: 0; for finding a minimum, lambda: float("inf"); for finding a maximum, lambda: float("-inf").
on – The name of the column to perform the aggregation on. If None, the aggregation is performed over the entire row (e.g., for Count()).
ignore_nulls – Whether to ignore null values during aggregation. If True, nulls are skipped. If False, the presence of a null value might result in a null output, depending on the aggregation logic.
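
To illustrate zero_factory and ignore_nulls, here is a plain-Python sketch of a minimum. It assumes that nulls are represented as None and that, when ignore_nulls is False, a single null makes the result null; as noted above, the actual outcome in Ray depends on the aggregation logic. The helper block_min is hypothetical and not part of Ray.

```python
import math


def zero_factory():
    # Identity for min: combining it with any value returns that value.
    return math.inf


def block_min(values, ignore_nulls):
    # Hypothetical helper, not a Ray API.
    if ignore_nulls:
        values = [v for v in values if v is not None]
    elif any(v is None for v in values):
        # Assumed semantics: one null makes the whole result null.
        return None
    return min(values, default=zero_factory())


def combine(current, new):
    # A null partial result (None) makes the combined result null.
    if current is None or new is None:
        return None
    return min(current, new)


data = [3, None, 1]
skipped = combine(zero_factory(), block_min(data, ignore_nulls=True))
propagated = combine(zero_factory(), block_min(data, ignore_nulls=False))
empty = combine(zero_factory(), block_min([], ignore_nulls=True))
```

In this sketch, skipped is the minimum of the non-null values, propagated is None, and an empty input leaves the accumulator at its zero value.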

PublicAPI (alpha): This API is in alpha and may change before becoming stable.

Methods

`aggregate_block`	Aggregates data within a single block.
`combine`	Combines a new partial aggregation result with the current accumulator.
`finalize`	Transforms the final accumulated state into the desired output.
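
The shape of a subclass implementing these three methods might look like the following count-distinct sketch. To stay runnable without Ray, it subclasses SketchAggregateBase, a hypothetical stand-in that copies only the constructor signature documented above. The method signatures, the default finalize behavior, and the list-of-dicts block format are assumptions made for illustration; check the Ray source before relying on them.

```python
from abc import ABC, abstractmethod


class SketchAggregateBase(ABC):
    """Hypothetical stand-in for AggregateFnV2; not the Ray class."""

    def __init__(self, name, zero_factory, *, on, ignore_nulls):
        self.name = name
        self.zero_factory = zero_factory
        self.on = on
        self.ignore_nulls = ignore_nulls

    @abstractmethod
    def aggregate_block(self, block):
        """Return a partial accumulator for one block."""

    @abstractmethod
    def combine(self, current_accumulator, new):
        """Merge a partial accumulator into the current one."""

    def finalize(self, accumulator):
        # Assumed default: return the accumulator unchanged.
        return accumulator


class CountDistinct(SketchAggregateBase):
    def __init__(self, on, ignore_nulls=True):
        # The accumulator is a set; `set` itself serves as the zero factory.
        super().__init__(
            f"count_distinct({on})", set, on=on, ignore_nulls=ignore_nulls
        )

    def aggregate_block(self, block):
        # Assumption: a block is a list of dict rows.
        values = [row[self.on] for row in block]
        if self.ignore_nulls:
            values = [v for v in values if v is not None]
        return set(values)

    def combine(self, current_accumulator, new):
        return current_accumulator | new

    def finalize(self, accumulator):
        # Transform the accumulated set into the final count.
        return len(accumulator)


agg = CountDistinct("color")
blocks = [
    [{"color": "red"}, {"color": "blue"}],
    [{"color": "red"}, {"color": None}, {"color": "green"}],
]
acc = agg.zero_factory()
for block in blocks:
    acc = agg.combine(acc, agg.aggregate_block(block))
result = agg.finalize(acc)
```

Here finalize is overridden because the accumulated state (a set of seen values) differs from the desired output (their count).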