# ray.data.grouped_data.GroupedData.std

`GroupedData.std(on: str | List[str] | None = None, ddof: int = 1, ignore_nulls: bool = True)`

Compute grouped standard deviation aggregation.

Examples

```python
>>> import ray
>>> ray.data.range(100).groupby("id").std(ddof=0)
>>> ray.data.from_items([
...     {"A": i % 3, "B": i, "C": i**2}
...     for i in range(100)]) \
...     .groupby("A") \
...     .std(["B", "C"])
```

NOTE: This uses Welford's online method for an accumulator-style computation of the standard deviation. This method was chosen for its numerical stability and because it is computable in a single pass. It may give different (but more accurate) results than NumPy, Pandas, and sklearn, which use a less numerically stable two-pass algorithm. See https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Welford's_online_algorithm

Parameters:
• on – a column name or a list of column names to aggregate.

• ddof – Delta Degrees of Freedom. The divisor used in calculations is `N - ddof`, where `N` represents the number of elements.

• ignore_nulls – Whether to ignore null values. If `True`, null values are ignored when computing the std; if `False`, the output is null if any null value is encountered. We consider np.nan, None, and pd.NaT to be null values. Default is `True`.
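To illustrate the `ddof` semantics within a single group (a sketch using Python's `statistics` module, not Ray), consider three values where the divisor is `N - ddof`:

```python
import statistics

group = [0.0, 2.0, 4.0]  # one group's values; squared deviations sum to 8

# ddof=1 (sample std): divisor is N - 1 = 2, so sqrt(8 / 2) = 2.0
sample_std = statistics.stdev(group)

# ddof=0 (population std): divisor is N = 3, so sqrt(8 / 3)
population_std = statistics.pstdev(group)
```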

Returns:

The standard deviation result.

For different values of `on`, the return varies:

• `on=None`: a dataset containing a groupby key column, `"k"`, and a column-wise std column for each original column in the dataset.

• `on=["col_1", ..., "col_n"]`: a dataset of `n + 1` columns, where the first column is the groupby key and columns two through `n + 1` hold the aggregation results.

If the groupby key is `None`, the key column is omitted from the result.