ray.data.aggregate.Std
- class ray.data.aggregate.Std(on: str | None = None, ddof: int = 1, ignore_nulls: bool = True, alias_name: str | None = None)
Bases: AggregateFnV2
Defines a standard deviation aggregation.
Uses Welford's online algorithm for numerical stability, computing the standard deviation in a single pass over the data. Results may differ slightly from those of libraries such as NumPy or pandas, which use a two-pass algorithm, but are generally more accurate.
See: https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Welford’s_online_algorithm
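The single-pass update can be sketched in plain Python as follows. This is an illustrative implementation of Welford's algorithm, not Ray's internal code; `WelfordState`, its `merge` method, and `welford_std` are hypothetical names. The `merge` step uses Chan et al.'s pairwise combination formula, which is one way a distributed engine could combine partial states computed on separate data blocks; treat that as an assumption about how partial aggregates are combined. Skipping `None` mimics `ignore_nulls=True`.

```python
import math
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class WelfordState:
    """Running statistics for Welford's online variance algorithm (hypothetical helper)."""
    count: int = 0     # number of non-null values seen so far
    mean: float = 0.0  # running mean
    m2: float = 0.0    # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        # Fold one value into the running state in O(1) time and memory.
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        # Multiply by the deviation from the *updated* mean (Welford's form).
        self.m2 += delta * (x - self.mean)

    def merge(self, other: "WelfordState") -> "WelfordState":
        # Combine two partial states with Chan et al.'s pairwise formula,
        # e.g. states computed independently over separate blocks (assumption).
        if other.count == 0:
            return WelfordState(self.count, self.mean, self.m2)
        if self.count == 0:
            return WelfordState(other.count, other.mean, other.m2)
        n = self.count + other.count
        delta = other.mean - self.mean
        mean = self.mean + delta * other.count / n
        m2 = self.m2 + other.m2 + delta * delta * self.count * other.count / n
        return WelfordState(n, mean, m2)

    def std(self, ddof: int = 1) -> Optional[float]:
        # Divisor is N - ddof. Returning None when N - ddof <= 0 is an
        # assumption of this sketch, not documented Ray behavior.
        if self.count - ddof <= 0:
            return None
        return math.sqrt(self.m2 / (self.count - ddof))


def welford_std(values: Iterable[Optional[float]], ddof: int = 1) -> Optional[float]:
    """Single-pass standard deviation that skips None values."""
    state = WelfordState()
    for x in values:
        if x is not None:
            state.update(x)
    return state.std(ddof)


if __name__ == "__main__":
    data = [2, 4, 4, 4, 5, 5, 7, 9]
    print(welford_std(data, ddof=0))  # population standard deviation
    print(welford_std(data, ddof=1))  # sample standard deviation
```

Because each update touches only `count`, `mean`, and `m2`, the state has constant size, which is what makes a single pass (and block-wise merging) possible.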
Example
import ray
from ray.data.aggregate import Std

ds = ray.data.range(100)
# Schema: {'id': int64}
ds = ds.add_column("group_key", lambda batch: batch["id"] % 3)
# Schema: {'id': int64, 'group_key': int64}

# Calculate the standard deviation of "id" within each group:
result = ds.groupby("group_key").aggregate(Std(on="id")).take_all()
# result: [{'group_key': 0, 'std(id)': ...},
#          {'group_key': 1, 'std(id)': ...},
#          {'group_key': 2, 'std(id)': ...}]
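To sanity-check the grouped result of the example without a Ray cluster, the same per-group values can be reproduced in plain Python. This sketch uses only the standard library's `statistics.stdev` (sample standard deviation, divisor N - 1, matching the default `ddof=1`); it is not part of Ray's API, and the row order returned by Ray's `take_all()` may not match the sorted order used here.

```python
from collections import defaultdict
from statistics import stdev

# Rebuild the example's data locally: ids 0..99 keyed by id % 3.
groups = defaultdict(list)
for i in range(100):
    groups[i % 3].append(i)

# statistics.stdev divides by N - 1, matching Std's default ddof=1.
expected = [{"group_key": k, "std(id)": stdev(v)} for k, v in sorted(groups.items())]

for row in expected:
    print(row)
```

Group 0 holds 34 ids while groups 1 and 2 hold 33 each, so this gives roughly 29.87 for group 0 and 29.01 for groups 1 and 2.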
- Parameters:
on – The name of the column to calculate the standard deviation on.
ddof – Delta degrees of freedom. The divisor used in the calculation is N - ddof, where N is the number of elements. Default is 1.
ignore_nulls – Whether to ignore null values. Default is True.
alias_name – Optional name for the resulting column. If not provided, the column is named after the aggregation and input column, for example std(id).
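Written out, with mean x̄ over N values, the divisor N - ddof gives the formula below. The worked example uses illustrative values that are not from the original text and shows how ddof changes the result.

```latex
\sigma_{\text{ddof}} = \sqrt{\frac{1}{N - \text{ddof}} \sum_{i=1}^{N} \left(x_i - \bar{x}\right)^2},
\qquad \bar{x} = \frac{1}{N} \sum_{i=1}^{N} x_i

% Example: x = (2, 4, 4, 4, 5, 5, 7, 9), N = 8, \bar{x} = 5
\sum_{i=1}^{8} \left(x_i - \bar{x}\right)^2 = 9 + 1 + 1 + 1 + 0 + 0 + 4 + 16 = 32

\text{ddof} = 0: \quad \sigma = \sqrt{32 / 8} = 2
\text{ddof} = 1: \quad \sigma = \sqrt{32 / 7} \approx 2.138
```

Defaults differ across libraries: numpy.std uses ddof=0 while pandas.Series.std uses ddof=1, so matching ddof explicitly matters when comparing results.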
Methods