ray.data.preprocessors.StandardScaler
- class ray.data.preprocessors.StandardScaler(columns: List[str])
Bases: ray.data.preprocessor.Preprocessor
Translate and scale each column by its mean and standard deviation, respectively.
The general formula is given by
\[x' = \frac{x - \bar{x}}{s}\]
where \(x\) is the column, \(x'\) is the transformed column, \(\bar{x}\) is the column average, and \(s\) is the column's standard deviation. The example outputs on this page are consistent with the population standard deviation (dividing by \(n\) rather than \(n - 1\)). If \(s = 0\) (i.e., the column is constant-valued), then the transformed column will contain zeros.
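The formula above can be sketched in plain Python, without Ray. This is only an illustration under stated assumptions, not Ray's implementation: `standardize` and its `ddof` parameter are hypothetical names, and the default `ddof=0` (population standard deviation) is chosen because it reproduces the X1 values shown in the examples on this page.

```python
import math


def standardize(values, ddof=0):
    """Return (x - mean) / s for each x, or all zeros when s == 0.

    ddof=0 uses the population standard deviation (assumed here),
    ddof=1 uses the sample standard deviation.
    """
    n = len(values)
    mean = sum(values) / n
    variance = sum((x - mean) ** 2 for x in values) / (n - ddof)
    s = math.sqrt(variance)
    if s == 0:
        # Constant-valued column: every entry maps to zero.
        return [0.0] * n
    return [(x - mean) / s for x in values]


x1_scaled = standardize([-2, 0, 2])
x3_scaled = standardize([1, 1, 1])
print([round(v, 6) for v in x1_scaled])  # [-1.224745, 0.0, 1.224745]
print(x3_scaled)                         # [0.0, 0.0, 0.0]
```

With `ddof=1` the same input `[-2, 0, 2]` would instead scale to `[-1.0, 0.0, 1.0]`, which is how the two conventions can be told apart.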
Warning
StandardScaler works best when your data is approximately normally distributed. If your data is far from normal, the transformed features may not be meaningful.
Examples
>>> import pandas as pd
>>> import ray
>>> from ray.data.preprocessors import StandardScaler
>>>
>>> df = pd.DataFrame({"X1": [-2, 0, 2], "X2": [-3, -3, 3], "X3": [1, 1, 1]})
>>> ds = ray.data.from_pandas(df)
>>> ds.to_pandas()
   X1  X2  X3
0  -2  -3   1
1   0  -3   1
2   2   3   1
Columns are scaled separately.
>>> preprocessor = StandardScaler(columns=["X1", "X2"])
>>> preprocessor.fit_transform(ds).to_pandas()
         X1        X2  X3
0 -1.224745 -0.707107   1
1  0.000000 -0.707107   1
2  1.224745  1.414214   1
Constant-valued columns get filled with zeros.
>>> preprocessor = StandardScaler(columns=["X3"])
>>> preprocessor.fit_transform(ds).to_pandas()
   X1  X2   X3
0  -2  -3  0.0
1   0  -3  0.0
2   2   3  0.0
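To show why only the listed columns change, here is a minimal sketch of the fit-then-transform pattern on a plain dict of columns. The `fit` and `transform` functions are hypothetical helpers written for this illustration, not Ray APIs, and the use of the population standard deviation is an assumption inferred from the example outputs above.

```python
import math


def fit(table, columns):
    """Compute (mean, population std) for each selected column."""
    stats = {}
    for name in columns:
        values = table[name]
        mean = sum(values) / len(values)
        std = math.sqrt(sum((x - mean) ** 2 for x in values) / len(values))
        stats[name] = (mean, std)
    return stats


def transform(table, stats):
    """Scale fitted columns; copy every other column through unchanged."""
    out = {}
    for name, values in table.items():
        if name in stats:
            mean, std = stats[name]
            # A zero std means a constant column, which maps to zeros.
            out[name] = [(x - mean) / std if std else 0.0 for x in values]
        else:
            out[name] = list(values)
    return out


table = {"X1": [-2, 0, 2], "X2": [-3, -3, 3], "X3": [1, 1, 1]}
stats = fit(table, ["X1", "X2"])  # each column gets its own statistics
scaled = transform(table, stats)
print([round(v, 6) for v in scaled["X2"]])  # [-0.707107, -0.707107, 1.414214]
print(scaled["X3"])                         # [1, 1, 1] (not in `columns`)
```

Because the statistics are stored per column, the same `stats` can be reused to transform other data with the same column names.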
- Parameters
columns – The columns to separately scale.
PublicAPI (alpha): This API is in alpha and may change before becoming stable.