ray.data.aggregate.AggregateFnV2#

class ray.data.aggregate.AggregateFnV2(name: str, zero_factory: Callable[[], AggType], *, on: str | None, ignore_nulls: bool)[source]#

Bases: AggregateFn

Provides an interface to implement efficient aggregations to be applied to the dataset.

AggregateFnV2 instances are passed to a Dataset’s .aggregate(...) method to perform aggregations by applying distributed aggregation algorithm:

  • aggregate_block is applied to individual blocks, producing partial

    aggregations.

  • combine combines new partially aggregated value (previously returned

    from aggregate_block partial aggregations into a singular partial aggregation) with the previously stored accumulator.

  • finalize transforms partial aggregation into its final state (for

    some aggregations this is an identity transformation, ie no-op)

PublicAPI (alpha): This API is in alpha and may change before becoming stable.

Methods

aggregate_block

Applies aggregations to individual block (producing partial aggregation results)

combine

Combines new partially aggregated value (previously returned from aggregate_block partial aggregations into a singular partial aggregation) with the previously stored accumulator