ray.data.preprocessors.RobustScaler
class ray.data.preprocessors.RobustScaler(columns: List[str], quantile_range: Tuple[float, float] = (0.25, 0.75))
Bases: Preprocessor
Scale and translate each column using quantiles.
The general formula is given by
\[x' = \frac{x - \mu_{1/2}}{\mu_h - \mu_l}\]
where \(x\) is the column, \(x'\) is the transformed column, and \(\mu_{1/2}\) is the column median. \(\mu_{h}\) and \(\mu_{l}\) are the high and low quantiles, respectively. By default, \(\mu_{h}\) is the third quartile and \(\mu_{l}\) is the first quartile.
Tip
This scaler works well when your data contains many outliers.
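To illustrate the tip, the following sketch compares the quantile-based formula with ordinary mean and standard-deviation scaling on a column that contains one extreme value. It uses only the Python standard library and is not Ray's implementation; the sample data and the choice of inclusive (linear-interpolation) quartiles are assumptions made for the illustration.

```python
import statistics

# Hypothetical column with a single extreme outlier.
values = [1, 2, 3, 4, 1000]

# Quantile-based scaling: subtract the median, divide by Q3 - Q1.
q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
robust = [(x - median) / (q3 - q1) for x in values]

# Mean / standard-deviation scaling, shown only for comparison.
mean = statistics.mean(values)
std = statistics.pstdev(values)
standard = [(x - mean) / std for x in values]

print(robust)    # [-1.0, -0.5, 0.0, 0.5, 498.5]
print(standard)  # the four inliers are squeezed into a very narrow band
```

With the quantile-based formula the four typical values keep a spread of 1.5 units, because the median and quartiles are barely affected by the outlier. With mean and standard-deviation scaling the outlier inflates both statistics, so the same four values end up almost indistinguishable.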
Examples
>>> import pandas as pd
>>> import ray
>>> from ray.data.preprocessors import RobustScaler
>>>
>>> df = pd.DataFrame({
...     "X1": [1, 2, 3, 4, 5],
...     "X2": [13, 5, 14, 2, 8],
...     "X3": [1, 2, 2, 2, 3],
... })
>>> ds = ray.data.from_pandas(df)
>>> ds.to_pandas()
   X1  X2  X3
0   1  13   1
1   2   5   2
2   3  14   2
3   4   2   2
4   5   8   3
RobustScaler separately scales each column.
>>> preprocessor = RobustScaler(columns=["X1", "X2"])
>>> preprocessor.fit_transform(ds).to_pandas()
    X1     X2  X3
0 -1.0  0.625   1
1 -0.5 -0.375   2
2  0.0  0.750   2
3  0.5 -0.750   2
4  1.0  0.000   3
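The scaled X1 and X2 values above can be checked by hand against the formula. The sketch below does so in plain Python. The helpers `quantile` and `robust_scale` are hypothetical names introduced for this illustration, and the linear-interpolation quantile is an assumption; it is not a description of how Ray computes quantiles internally.

```python
from typing import List, Sequence, Tuple


def quantile(sorted_values: Sequence[float], q: float) -> float:
    """Linear-interpolation quantile of pre-sorted data, with q in [0, 1]."""
    position = q * (len(sorted_values) - 1)
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] + fraction * (sorted_values[upper] - sorted_values[lower])


def robust_scale(
    values: Sequence[float],
    quantile_range: Tuple[float, float] = (0.25, 0.75),
) -> List[float]:
    """Apply x' = (x - median) / (high quantile - low quantile) to one column."""
    ordered = sorted(values)
    low = quantile(ordered, quantile_range[0])
    high = quantile(ordered, quantile_range[1])
    median = quantile(ordered, 0.5)
    return [(x - median) / (high - low) for x in values]


x1_scaled = robust_scale([1, 2, 3, 4, 5])
x2_scaled = robust_scale([13, 5, 14, 2, 8])
print(x1_scaled)  # [-1.0, -0.5, 0.0, 0.5, 1.0]
print(x2_scaled)  # [0.625, -0.375, 0.75, -0.75, 0.0]
```

For X2 the sorted values are 2, 5, 8, 13, 14, so the median is 8 and the first and third quartiles are 5 and 13. Each value is therefore shifted by 8 and divided by 13 - 5 = 8, which gives the X2 column shown in the example output.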
Parameters:
columns – The columns to separately scale.
quantile_range – A tuple that defines the lower and upper quantiles. Values must be between 0 and 1. Defaults to the 1st and 3rd quartiles: (0.25, 0.75).
PublicAPI (alpha): This API is in alpha and may change before becoming stable.
Methods
Load the original preprocessor serialized via self.serialize().
Fit this Preprocessor to the Dataset.
Fit this Preprocessor to the Dataset and then transform the Dataset.
Batch format hint for upstream producers to try yielding best block format.
Return this preprocessor serialized as a string.
Transform the given dataset.
Transform a single batch of data.