ray.data.Dataset.std
- Dataset.std(on: str | List[str] | None = None, ddof: int = 1, ignore_nulls: bool = True) -> Any | Dict[str, Any]
Compute the standard deviation of one or more columns.
Note
This method uses Welford’s online algorithm to compute the standard deviation with an accumulator. The algorithm is numerically stable and runs in a single pass. Results may differ from (but be more accurate than) those of NumPy, Pandas, and sklearn, which use a less numerically stable two-pass algorithm. To learn more, see the Wikipedia article.
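To make the single-pass idea concrete, here is a minimal pure-Python sketch of Welford's update rule. It is an illustration only, not Ray's internal implementation; the function name `welford_std` and its handling of too-few elements (returning `None`) are assumptions made for this example.

```python
import math


def welford_std(values, ddof=1):
    """Single-pass standard deviation using Welford's update rule.

    Hypothetical helper for illustration; not part of the Ray API.
    """
    count = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in values:
        count += 1
        delta = x - mean           # deviation from the old mean
        mean += delta / count      # update the running mean
        m2 += delta * (x - mean)   # uses both the old and the new mean
    if count - ddof <= 0:
        # Assumption of this sketch: not enough elements for the divisor.
        return None
    return math.sqrt(m2 / (count - ddof))


population_std = welford_std(range(100), ddof=0)
sample_std = welford_std(range(100))  # default ddof=1
print(round(population_std, 5))  # 28.86607
```

Each element is visited once and only three running values (`count`, `mean`, `m2`) are kept, which is what makes the computation expressible as an accumulator.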
Note
This operation will trigger execution of the lazy transformations performed on this dataset.
Note
This operation requires all inputs to be materialized in the object store before it can execute.
Examples
>>> import ray
>>> round(ray.data.range(100).std("id", ddof=0), 5)
28.86607
>>> ray.data.from_items([
...     {"A": i, "B": i**2}
...     for i in range(100)
... ]).std(["A", "B"])
{'std(A)': 29.011491975882016, 'std(B)': 2968.1748039269296}
- Parameters:
on – A column name or a list of column names to aggregate.
ddof – Delta Degrees of Freedom. The divisor used in calculations is N - ddof, where N represents the number of elements.
ignore_nulls – Whether to ignore null values. If True, null values are ignored when computing the std; if False, when a null value is encountered, the output is None. This method considers np.nan, None, and pd.NaT to be null values. Default is True.
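The interaction between ddof and ignore_nulls can be sketched in plain Python. The helpers `is_null` and `std_with_nulls` below are hypothetical and only mimic the semantics described above. For simplicity they treat `None` and float NaN as null (not `pd.NaT`) and use a straightforward two-pass formula rather than the accumulator Ray uses.

```python
import math


def is_null(value):
    # Simplified null check for this sketch: None or a float NaN.
    return value is None or (isinstance(value, float) and math.isnan(value))


def std_with_nulls(values, ddof=1, ignore_nulls=True):
    """Hypothetical helper mirroring the documented parameter semantics."""
    values = list(values)
    if any(is_null(v) for v in values):
        if not ignore_nulls:
            return None  # a null was encountered and nulls are not ignored
        values = [v for v in values if not is_null(v)]
    n = len(values)
    if n - ddof <= 0:
        return None  # assumption of this sketch: divisor would be non-positive
    mean = sum(values) / n
    squared_deviations = sum((v - mean) ** 2 for v in values)
    return math.sqrt(squared_deviations / (n - ddof))  # divisor is N - ddof


data = [2.0, None, 4.0, float("nan"), 6.0]
sample_std = std_with_nulls(data)                   # nulls dropped, divisor 3 - 1
population_std = std_with_nulls(data, ddof=0)       # nulls dropped, divisor 3
strict = std_with_nulls(data, ignore_nulls=False)   # None, because nulls are present
```

With the nulls dropped, the remaining values are 2, 4, and 6, so the sum of squared deviations is 8; ddof only changes what that sum is divided by.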
- Returns:
The standard deviation result.
For different values of on, the return varies:
on=None: a dict containing the column-wise std of all columns,
on="col": a scalar representing the std of all items in column "col",
on=["col_1", ..., "col_n"]: an n-column dict containing the column-wise std of the provided columns.
If the dataset is empty, all values are null. If ignore_nulls is False and any value is null, then the output is None.