ray.data.preprocessors.MaxAbsScaler

class ray.data.preprocessors.MaxAbsScaler(columns: List[str], output_columns: List[str] | None = None)

Bases: SerializablePreprocessorBase

Scale each column by its absolute max value.

The general formula is given by

\[x' = \frac{x}{\max{\vert x \vert}}\]

where \(x\) is the column and \(x'\) is the transformed column. If \(\max{\vert x \vert} = 0\) (i.e., the column contains all zeros), then the column is unmodified.
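The formula and the all-zeros case can be sketched in plain Python. This is an illustrative reimplementation, not Ray's code: the helper name `max_abs_scale` is hypothetical, and treating an all-zero column as having a divisor of 1 is an assumption chosen so that the values stay unchanged.

```python
from typing import List


def max_abs_scale(column: List[float]) -> List[float]:
    """Hypothetical helper: divide each value by the column's largest absolute value."""
    abs_max = max((abs(v) for v in column), default=0)
    if abs_max == 0:
        # All-zero (or empty) column: divide by 1 so the values are left unchanged.
        abs_max = 1
    return [v / abs_max for v in column]


x1_scaled = max_abs_scale([-6, 3])  # largest absolute value is 6
x2_scaled = max_abs_scale([2, -4])  # largest absolute value is 4
x3_scaled = max_abs_scale([0, 0])   # largest absolute value is 0
print(x1_scaled, x2_scaled, x3_scaled)
```

Every scaled value falls in the range [-1, 1], and the sign of each value is preserved.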

Tip

This is the recommended way to scale sparse data. If your data isn’t sparse, you can use MinMaxScaler or StandardScaler instead.
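One way to see why this suits sparse data: dividing by a constant maps zero to zero, so zero entries stay zero, whereas scaling that subtracts an offset can turn them into non-zero values. The sketch below compares the two in plain Python, using the usual min-max formula \((x - \min) / (\max - \min)\) as a stand-in for comparison. It does not call Ray, and the sample values are made up for illustration.

```python
# Mostly-zero sample column (made-up values for illustration).
values = [0, 0, -4, 0, 2]

# Max-abs scaling: divide by the largest absolute value.
abs_max = max(abs(v) for v in values)
max_abs_scaled = [v / abs_max for v in values]

# Min-max scaling: shift by the minimum, then divide by the range.
lo, hi = min(values), max(values)
min_max_scaled = [(v - lo) / (hi - lo) for v in values]

# Count how many entries are still exactly zero after each transformation.
zeros_before = sum(v == 0 for v in values)
zeros_after_max_abs = sum(v == 0 for v in max_abs_scaled)
zeros_after_min_max = sum(v == 0 for v in min_max_scaled)
print(zeros_before, zeros_after_max_abs, zeros_after_min_max)
```

In this sample, max-abs scaling keeps all three zeros, while min-max scaling moves them to a non-zero value because the column minimum is negative.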

Examples

>>> import pandas as pd
>>> import ray
>>> from ray.data.preprocessors import MaxAbsScaler
>>>
>>> df = pd.DataFrame({"X1": [-6, 3], "X2": [2, -4], "X3": [0, 0]})
>>> ds = ray.data.from_pandas(df)
>>> ds.to_pandas()
   X1  X2  X3
0  -6   2   0
1   3  -4   0

Columns are scaled separately.

>>> preprocessor = MaxAbsScaler(columns=["X1", "X2"])
>>> preprocessor.fit_transform(ds).to_pandas()  
    X1   X2  X3
0 -1.0  0.5   0
1  0.5 -1.0   0

Zero-valued columns aren’t scaled.

>>> preprocessor = MaxAbsScaler(columns=["X3"])
>>> preprocessor.fit_transform(ds).to_pandas()  
   X1  X2   X3
0  -6   2  0.0
1   3  -4  0.0

Use output_columns to write the scaled values to new columns and keep the original columns.

>>> preprocessor = MaxAbsScaler(columns=["X1", "X2"], output_columns=["X1_scaled", "X2_scaled"])
>>> preprocessor.fit_transform(ds).to_pandas()
   X1  X2  X3  X1_scaled  X2_scaled
0  -6   2   0       -1.0        0.5
1   3  -4   0        0.5       -1.0

Parameters:

columns – The columns to separately scale.
output_columns – The names of the transformed columns. If None, the transformed columns keep the names of the input columns. If not None, output_columns must have the same length as columns; otherwise, an error is raised.

PublicAPI (alpha): This API is in alpha and may change before becoming stable.

Methods

`deserialize`	Deserialize a preprocessor from serialized data.
`fit`	Fit this Preprocessor to the Dataset.
`fit_transform`	Fit this Preprocessor to the Dataset and then transform the Dataset.
`get_preprocessor_class_id`	Get the preprocessor class identifier for this preprocessor class.
`get_version`	Get the version number for this preprocessor class.
`preferred_batch_format`	Batch format hint for upstream producers to try yielding best block format.
`serialize`	Serialize this preprocessor to a string or bytes.
`set_preprocessor_class_id`	Set the preprocessor class identifier for this preprocessor class.
`set_version`	Set the version number for this preprocessor class.
`transform`	Transform the given dataset.
`transform_batch`	Transform a single batch of data.

Attributes

`MAGIC_CLOUDPICKLE`
`SERIALIZER_FORMAT_VERSION`
`columns`
`output_columns`
`stat_computation_plan`