ray.data.preprocessors.MinMaxScaler

class ray.data.preprocessors.MinMaxScaler(columns: List[str])

Bases: Preprocessor

Scale each column by its range.

The general formula is given by

\[x' = \frac{x - \min(x)}{\max(x) - \min(x)}\]

where \(x\) is the column and \(x'\) is the transformed column. If \(\max(x) - \min(x) = 0\) (i.e., the column is constant-valued), then the transformed column is filled with zeros.

For the data the preprocessor was fit on, transformed values are always in the range \([0, 1]\).
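As a rough illustration of the formula above, the following plain-Python sketch scales a single column held in a list. The helper name `min_max_scale` is hypothetical and is not part of Ray; it only mirrors the arithmetic described here, including the zero-fill rule for constant-valued columns. The input values match the columns used in the Examples section.

```python
def min_max_scale(values):
    """Scale a list of numbers by its range (illustrative helper, not a Ray API)."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        # Constant-valued column: fill with zeros, as described above.
        return [0.0 for _ in values]
    # x' = (x - min(x)) / (max(x) - min(x))
    return [(v - lo) / span for v in values]


x1 = min_max_scale([-2, 0, 2])    # [0.0, 0.5, 1.0]
x2 = min_max_scale([-3, -3, 3])   # [0.0, 0.0, 1.0]
x3 = min_max_scale([1, 1, 1])     # [0.0, 0.0, 0.0]
print(x1, x2, x3)
```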

Tip

This can be used as an alternative to StandardScaler.

Examples

>>> import pandas as pd
>>> import ray
>>> from ray.data.preprocessors import MinMaxScaler
>>>
>>> df = pd.DataFrame({"X1": [-2, 0, 2], "X2": [-3, -3, 3], "X3": [1, 1, 1]})
>>> ds = ray.data.from_pandas(df)
>>> ds.to_pandas()
   X1  X2  X3
0  -2  -3   1
1   0  -3   1
2   2   3   1

Columns are scaled separately.

>>> preprocessor = MinMaxScaler(columns=["X1", "X2"])
>>> preprocessor.fit_transform(ds).to_pandas()  
    X1   X2  X3
0  0.0  0.0   1
1  0.5  0.0   1
2  1.0  1.0   1

Constant-valued columns get filled with zeros.

>>> preprocessor = MinMaxScaler(columns=["X3"])
>>> preprocessor.fit_transform(ds).to_pandas()  
   X1  X2   X3
0  -2  -3  0.0
1   0  -3  0.0
2   2   3  0.0

Parameters:

columns – The columns to scale. Each column is scaled independently of the others.

PublicAPI (alpha): This API is in alpha and may change before becoming stable.

Methods

deserialize

Load the original preprocessor serialized via self.serialize().

fit

Fit this Preprocessor to the Dataset.

fit_transform

Fit this Preprocessor to the Dataset and then transform the Dataset.

preferred_batch_format

Batch format hint that lets upstream producers try to yield the best block format.

serialize

Return this preprocessor serialized as a string.

transform

Transform the given dataset.

transform_batch

Transform a single batch of data.

transform_stats

Return Dataset stats for the most recent transform call, if any.