Mean

class ray.data.aggregate.Mean(on: str | None = None, ignore_nulls: bool = True, alias_name: str | None = None)

Bases: AggregateFnV2[List[int | float], float]

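Defines a mean (average) aggregation, which computes the arithmetic mean of a numeric column over a dataset or over each group of a grouped dataset.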
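The accumulator type `List[int | float]` and result type `float` in the base class suggest a common way to compute a distributed mean: each partition keeps a running `[sum, count]` pair, partial pairs are merged by adding them element-wise, and the division happens only once at the end. The sketch below illustrates that sum-and-count technique in plain Python. It is an assumption about the approach, not Ray's implementation, and the functions `accumulate`, `merge`, and `finalize` are hypothetical names, not Ray methods.

```python
from typing import Iterable, List, Optional

# Hypothetical helpers sketching a [sum, count] accumulator for a mean.
# These are NOT Ray's internal methods; they only illustrate the technique.


def accumulate(values: Iterable[float]) -> List[float]:
    """Build a partial [sum, count] accumulator from one partition of values."""
    total = 0.0
    count = 0
    for v in values:
        total += v
        count += 1
    return [total, count]


def merge(a: List[float], b: List[float]) -> List[float]:
    """Combine two partial accumulators by adding their sums and counts."""
    return [a[0] + b[0], a[1] + b[1]]


def finalize(acc: List[float]) -> Optional[float]:
    """Turn a merged accumulator into the mean.

    Assumption for this sketch: an empty accumulator yields None.
    """
    total, count = acc
    return total / count if count else None


# Two "partitions" of the same column are aggregated independently, then merged.
left = accumulate([1, 2, 3])   # [6.0, 3]
right = accumulate([4, 5])     # [9.0, 2]
print(finalize(merge(left, right)))  # 3.0
```

Carrying the count is what makes the merge step correct: averaging the two partition means (2.0 and 4.5) would give 3.25 instead of the true mean 3.0.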

Example

import ray
from ray.data.aggregate import Mean

ds = ray.data.range(100)
# Schema: {'id': int64}
ds = ds.add_column(
    "group_key", lambda batch: batch["id"].astype("int64") % 3
)
# Schema: {'id': int64, 'group_key': int64}

# Calculating the mean value per group:
result = ds.groupby("group_key").aggregate(Mean(on="id")).take_all()
# result: [{'group_key': 0, 'mean(id)': ...},
#          {'group_key': 1, 'mean(id)': ...},
#          {'group_key': 2, 'mean(id)': ...}]

Parameters:

on – The name of the numerical column to calculate the mean on. Must be provided.
ignore_nulls – Whether to ignore null values. If True (default), nulls are skipped. If False, the mean will be null if any value in the group is null.
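alias_name – Optional name for the resulting column. If omitted, the column is named after the aggregation and the input column, e.g. 'mean(id)' in the example above.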
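To make the two `ignore_nulls` behaviours described above concrete, here is a plain-Python model of them, with `None` standing in for a null value. `mean_with_nulls` is a hypothetical helper, not part of Ray, and the result for a group with no non-null values is an assumption of this sketch.

```python
from typing import Optional, Sequence

# Hypothetical helper: a plain-Python model of the documented ignore_nulls
# behaviour of Mean. Nulls are represented as None. Not part of Ray.


def mean_with_nulls(
    values: Sequence[Optional[float]], ignore_nulls: bool = True
) -> Optional[float]:
    if ignore_nulls:
        # Skip nulls and average only the remaining values.
        present = [v for v in values if v is not None]
    else:
        # Any null in the group makes the whole result null.
        if any(v is None for v in values):
            return None
        present = list(values)
    # Assumption for this sketch: a group with no non-null values yields None.
    if not present:
        return None
    return sum(present) / len(present)


group = [2.0, None, 4.0]
print(mean_with_nulls(group))                      # 3.0
print(mean_with_nulls(group, ignore_nulls=False))  # None
```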

Methods

get_agg_name

Return the aggregation name (e.g., 'sum', 'mean', 'count').