ray.data.aggregate.AggregateFnV2#
- class ray.data.aggregate.AggregateFnV2(name: str, zero_factory: Callable[[], AggType], *, on: str | None, ignore_nulls: bool)[source]#
Bases:
AggregateFn,ABCProvides an interface to implement efficient aggregations to be applied to the dataset.
AggregateFnV2instances are passed to a Dataset’s.aggregate(...)method to perform distributed aggregations. To create a custom aggregation, you should subclassAggregateFnV2and implement theaggregate_blockandcombinemethods. Thefinalizemethod can also be overridden if the final accumulated state needs further transformation.Aggregation follows these steps:
Initialization: For each group (if grouping) or for the entire dataset, an initial accumulator is created using
zero_factory.Block Aggregation: The
aggregate_blockmethod is applied to each block independently, producing a partial aggregation result for that block.Combination: The
combinemethod is used to merge these partial results (or an existing accumulated result with a new partial result) into a single, combined accumulator.Finalization: Optionally, the
finalizemethod transforms the final combined accumulator into the desired output format.
- Parameters:
name – The name of the aggregation. This will be used as the column name in the output, e.g., “sum(my_col)”.
zero_factory – A callable that returns the initial “zero” value for the accumulator. For example, for a sum, this would be
lambda: 0; for finding a minimum,lambda: float("inf"), for finding a maximum,lambda: float("-inf").on – The name of the column to perform the aggregation on. If
None, the aggregation is performed over the entire row (e.g., forCount()).ignore_nulls – Whether to ignore null values during aggregation. If
True, nulls are skipped. IfFalse, the presence of a null value might result in a null output, depending on the aggregation logic.
PublicAPI (alpha): This API is in alpha and may change before becoming stable.
Methods
Aggregates data within a single block.
Combines a new partial aggregation result with the current accumulator.
Transforms the final accumulated state into the desired output.