ray.data.preprocessors.RobustScaler
class ray.data.preprocessors.RobustScaler(columns: List[str], quantile_range: Tuple[float, float] = (0.25, 0.75))
Bases: ray.data.preprocessor.Preprocessor
Scale and translate each column using quantiles.
The general formula is given by
\[x' = \frac{x - \mu_{1/2}}{\mu_h - \mu_l}\]
where \(x\) is the column, \(x'\) is the transformed column, \(\mu_{1/2}\) is the column median, and \(\mu_{h}\) and \(\mu_{l}\) are the high and low quantiles, respectively. By default, \(\mu_{h}\) is the third quartile and \(\mu_{l}\) is the first quartile.
Tip
This scaler works well when your data contains many outliers.
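The formula and the tip can be checked by hand with a minimal sketch in plain Python. It does not use Ray: the helper `robust_scale` is a hypothetical name introduced here, and it assumes linearly interpolated quartiles (the `inclusive` method of `statistics.quantiles`), which may differ from how Ray estimates quantiles on other data.

```python
from statistics import median, quantiles


def robust_scale(values):
    """Apply x' = (x - median) / (q3 - q1) using the default quartiles.

    Hypothetical helper for illustration only, not part of Ray.
    """
    # method="inclusive" interpolates linearly between data points and
    # treats the sample minimum and maximum as the 0th and 100th percentiles.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    mid = median(values)
    return [(x - mid) / (q3 - q1) for x in values]


clean = [1, 2, 3, 4, 5]
with_outlier = [1, 2, 3, 4, 100]

print(robust_scale(clean))         # [-1.0, -0.5, 0.0, 0.5, 1.0]
print(robust_scale(with_outlier))  # [-1.0, -0.5, 0.0, 0.5, 48.5]
```

Replacing 5 with 100 leaves the median and both quartiles unchanged, so the first four values scale exactly as before. Only the outlier itself lands far from zero.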
Examples
>>> import pandas as pd
>>> import ray
>>> from ray.data.preprocessors import RobustScaler
>>>
>>> df = pd.DataFrame({
...     "X1": [1, 2, 3, 4, 5],
...     "X2": [13, 5, 14, 2, 8],
...     "X3": [1, 2, 2, 2, 3],
... })
>>> ds = ray.data.from_pandas(df)
>>> ds.to_pandas()
   X1  X2  X3
0   1  13   1
1   2   5   2
2   3  14   2
3   4   2   2
4   5   8   3
RobustScaler separately scales each column.
>>> preprocessor = RobustScaler(columns=["X1", "X2"])
>>> preprocessor.fit_transform(ds).to_pandas()
    X1     X2  X3
0 -1.0  0.625   1
1 -0.5 -0.375   2
2  0.0  0.750   2
3  0.5 -0.750   2
4  1.0  0.000   3
Parameters
columns – The columns to separately scale.
quantile_range – A tuple that defines the lower and upper quantiles. Values must be between 0 and 1. Defaults to the 1st and 3rd quartiles: (0.25, 0.75).
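To illustrate how quantile_range changes the denominator of the formula, here is a plain-Python sketch that does not call Ray. The helpers `quantile` and `robust_scale` are hypothetical names, the linear interpolation between data points is an assumption, and the range check is this sketch's own rather than documented Ray behavior.

```python
import math


def quantile(sorted_values, q):
    """Linearly interpolated quantile of an already sorted list, 0 <= q <= 1."""
    pos = q * (len(sorted_values) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    frac = pos - lo
    return sorted_values[lo] * (1 - frac) + sorted_values[hi] * frac


def robust_scale(values, quantile_range=(0.25, 0.75)):
    """Hypothetical stand-in for the scaling formula, for illustration only."""
    low_q, high_q = quantile_range
    # Assumption of this sketch: reject ranges outside [0, 1] or out of order.
    if not 0 <= low_q < high_q <= 1:
        raise ValueError("quantile_range must satisfy 0 <= low < high <= 1")
    ordered = sorted(values)
    mid = quantile(ordered, 0.5)
    spread = quantile(ordered, high_q) - quantile(ordered, low_q)
    return [(x - mid) / spread for x in values]


x1 = [1, 2, 3, 4, 5]
print(robust_scale(x1))              # [-1.0, -0.5, 0.0, 0.5, 1.0]
print(robust_scale(x1, (0.0, 1.0)))  # [-0.5, -0.25, 0.0, 0.25, 0.5]
```

Widening the range to (0.0, 1.0) divides by the full span of the column instead of the interquartile range, so the scaled values shrink and extreme values regain influence over the scale.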
PublicAPI (alpha): This API is in alpha and may change before becoming stable.
Methods
fit(ds): Fit this Preprocessor to the Dataset.
fit_transform(ds): Fit this Preprocessor to the Dataset and then transform the Dataset.
Batch format hint for upstream producers to try yielding best block format.
transform(ds): Transform the given dataset.
transform_batch(data): Transform a single batch of data.
Return Dataset stats for the most recent transform call, if any.