# ray.data.aggregate.Std#

class ray.data.aggregate.Std(on: Union[None, str, Callable[[ray.data.block.T], Any]] = None, ddof: int = 1, ignore_nulls: bool = True)[source]#

Defines standard deviation aggregation.

Uses Welford’s online method for an accumulator-style computation of the standard deviation. This method was chosen due to its numerical stability, and it being computable in a single pass. This may give different (but more accurate) results than NumPy, Pandas, and sklearn, which use a less numerically stable two-pass algorithm. See https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Welford’s_online_algorithm

PublicAPI: This API is stable across Ray releases.

__init__(on: Union[None, str, Callable[[ray.data.block.T], Any]] = None, ddof: int = 1, ignore_nulls: bool = True)[source]#

Defines an aggregate function in the accumulator style.

Aggregates a collection of inputs of type T into a single output value of type U. See https://www.sigops.org/s/conferences/sosp/2009/papers/yu-sosp09.pdf for more details about accumulator-based aggregation.

Parameters
• init – This is called once for each group to return the empty accumulator. For example, an empty accumulator for a sum would be 0.

• merge – This may be called multiple times, each time to merge two accumulators into one.

• accumulate_row – This is called once per row of the same group. This combines the accumulator and the row, returns the updated accumulator. Exactly one of accumulate_row and accumulate_block must be provided.

• accumulate_block – This is used to calculate the aggregation for a single block, and is vectorized alternative to accumulate_row. This will be given a base accumulator and the entire block, allowing for vectorized accumulation of the block. Exactly one of accumulate_row and accumulate_block must be provided.

• finalize – This is called once to compute the final aggregation result from the fully merged accumulator.

• name – The name of the aggregation. This will be used as the output column name in the case of Arrow dataset.

Methods

 `__init__`([on, ddof, ignore_nulls]) Defines an aggregate function in the accumulator style.