ray.data.preprocessors.MinMaxScaler
- class ray.data.preprocessors.MinMaxScaler(columns: List[str], output_columns: List[str] | None = None)
Bases: Preprocessor
Scale each column by its range.
The general formula is given by
\[x' = \frac{x - \min(x)}{\max(x) - \min(x)}\]
where \(x\) is the column and \(x'\) is the transformed column. If \(\max(x) - \min(x) = 0\) (i.e., the column is constant-valued), then the transformed column is filled with zeros.
For the data the scaler was fit on, transformed values are always in the range \([0, 1]\).
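The formula and the constant-column rule above can be sketched in plain Python. This is only an illustration of the arithmetic, not Ray's implementation: the helper name `min_max_scale` is hypothetical, and the sample values are borrowed from the Examples on this page.

```python
from typing import Dict, List


def min_max_scale(values: List[float]) -> List[float]:
    """Hypothetical helper: scale one column by its range.

    A constant-valued column (max - min == 0) is filled with zeros,
    mirroring the rule described in the text.
    """
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        return [0.0 for _ in values]
    return [(v - lo) / span for v in values]


# Each column is scaled separately, using only its own min and max.
columns: Dict[str, List[float]] = {
    "X1": [-2, 0, 2],
    "X2": [-3, -3, 3],
    "X3": [1, 1, 1],
}
scaled = {name: min_max_scale(vals) for name, vals in columns.items()}
```

Here `scaled["X1"]` is `[0.0, 0.5, 1.0]` and `scaled["X3"]` is all zeros, matching the outputs shown in the Examples.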
Tip
This can be used as an alternative to StandardScaler.
Examples
>>> import pandas as pd
>>> import ray
>>> from ray.data.preprocessors import MinMaxScaler
>>>
>>> df = pd.DataFrame({"X1": [-2, 0, 2], "X2": [-3, -3, 3], "X3": [1, 1, 1]})  # noqa: E501
>>> ds = ray.data.from_pandas(df)
>>> ds.to_pandas()
   X1  X2  X3
0  -2  -3   1
1   0  -3   1
2   2   3   1
Columns are scaled separately.
>>> preprocessor = MinMaxScaler(columns=["X1", "X2"])
>>> preprocessor.fit_transform(ds).to_pandas()
    X1   X2  X3
0  0.0  0.0   1
1  0.5  0.0   1
2  1.0  1.0   1
Constant-valued columns get filled with zeros.
>>> preprocessor = MinMaxScaler(columns=["X3"])
>>> preprocessor.fit_transform(ds).to_pandas()
   X1  X2   X3
0  -2  -3  0.0
1   0  -3  0.0
2   2   3  0.0
>>> preprocessor = MinMaxScaler(columns=["X1", "X2"], output_columns=["X1_scaled", "X2_scaled"])
>>> preprocessor.fit_transform(ds).to_pandas()
   X1  X2  X3  X1_scaled  X2_scaled
0  -2  -3   1        0.0        0.0
1   0  -3   1        0.5        0.0
2   2   3   1        1.0        1.0
- Parameters:
columns – The columns to separately scale.
output_columns – The names of the transformed columns. If None, the transformed columns keep the same names as the input columns. If not None, the length of output_columns must match the length of columns; otherwise an error is raised.
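The relationship between `columns` and `output_columns` described above can be sketched as follows. This is an illustrative stand-in, not Ray's actual validation code: the helper name `resolve_output_columns` and the choice of `ValueError` are assumptions, since the text only says that "an error" is raised.

```python
from typing import List, Optional


def resolve_output_columns(
    columns: List[str], output_columns: Optional[List[str]] = None
) -> List[str]:
    """Hypothetical helper: decide the names of the transformed columns."""
    if output_columns is None:
        # No explicit names: transformed columns reuse the input names.
        return list(columns)
    if len(output_columns) != len(columns):
        # Assumed exception type; the documentation does not specify it.
        raise ValueError(
            f"Expected {len(columns)} output columns, got {len(output_columns)}."
        )
    return list(output_columns)


same_names = resolve_output_columns(["X1", "X2"])
new_names = resolve_output_columns(["X1", "X2"], ["X1_scaled", "X2_scaled"])
```

With explicit output names, the original columns are left untouched and the scaled values appear under the new names, as in the last example under Examples.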
PublicAPI (alpha): This API is in alpha and may change before becoming stable.
Methods
Load the original preprocessor serialized via self.serialize().
Fit this Preprocessor to the Dataset.
Fit this Preprocessor to the Dataset and then transform the Dataset.
Batch format hint for upstream producers to try yielding best block format.
Return this preprocessor serialized as a string.
Transform the given dataset.
Transform a single batch of data.