ray.data.aggregate.AggregateFnV2#
- class ray.data.aggregate.AggregateFnV2(name: str, zero_factory: Callable[[], AggType], *, on: str | None, ignore_nulls: bool)[source]#
Bases:
AggregateFn
,ABC
Provides an interface to implement efficient aggregations to be applied to the dataset.
AggregateFnV2
instances are passed to a Dataset’s.aggregate(...)
method to perform distributed aggregations. To create a custom aggregation, you should subclassAggregateFnV2
and implement theaggregate_block
andcombine
methods. The_finalize
method can also be overridden if the final accumulated state needs further transformation.Aggregation follows these steps:
Initialization: For each group (if grouping) or for the entire dataset, an initial accumulator is created using
zero_factory
.Block Aggregation: The
aggregate_block
method is applied to each block independently, producing a partial aggregation result for that block.Combination: The
combine
method is used to merge these partial results (or an existing accumulated result with a new partial result) into a single, combined accumulator.Finalization: Optionally, the
_finalize
method transforms the final combined accumulator into the desired output format.
- Parameters:
name – The name of the aggregation. This will be used as the column name in the output, e.g., “sum(my_col)”.
zero_factory – A callable that returns the initial “zero” value for the accumulator. For example, for a sum, this would be
lambda: 0
; for finding a minimum,lambda: float("inf")
, for finding a maximum,lambda: float("-inf")
.on – The name of the column to perform the aggregation on. If
None
, the aggregation is performed over the entire row (e.g., forCount()
).ignore_nulls – Whether to ignore null values during aggregation. If
True
, nulls are skipped. IfFalse
, the presence of a null value might result in a null output, depending on the aggregation logic.
PublicAPI (alpha): This API is in alpha and may change before becoming stable.
Methods
Aggregates data within a single block.
Combines a new partial aggregation result with the current accumulator.
Transforms the final accumulated state into the desired output.