# ray.data.preprocessors.Normalizer

class ray.data.preprocessors.Normalizer(columns: List[str], norm='l2')

Bases: Preprocessor

Scales each sample to have unit norm.

This preprocessor works by dividing each sample (i.e., row) by the sample’s norm. The general formula is given by

$s' = \frac{s}{\lVert s \rVert_p}$

where $$s$$ is the sample, $$s'$$ is the transformed sample, $$\lVert s \rVert_p$$ is the norm of the sample, and $$p$$ is the norm type.

The following norms are supported:

• "l1" ($$L^1$$): Sum of the absolute values.

• "l2" ($$L^2$$): Square root of the sum of the squared values.

• "max" ($$L^\infty$$): Maximum of the absolute values.
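The formula and the three norms above can be sketched in plain NumPy. This is an illustration only, not Ray's implementation: `normalize_rows` is a hypothetical helper, and it assumes the "max" norm is the largest absolute value in the row, which is the usual definition of $$L^\infty$$.

```python
import numpy as np


def normalize_rows(x, norm="l2"):
    """Divide each row of `x` by its norm (hypothetical helper, not Ray's code)."""
    x = np.asarray(x, dtype=float)
    if norm == "l1":
        # L1: sum of the absolute values in each row.
        norms = np.abs(x).sum(axis=1)
    elif norm == "l2":
        # L2: square root of the sum of the squared values in each row.
        norms = np.sqrt((x ** 2).sum(axis=1))
    elif norm == "max":
        # L-infinity: assumed here to be the largest absolute value in each row.
        norms = np.abs(x).max(axis=1)
    else:
        raise ValueError(f"Unsupported norm: {norm!r}")
    # Broadcast the per-row norm across the columns.
    # Note: an all-zero row has norm 0, so this sketch would yield NaN for it.
    return x / norms[:, np.newaxis]


# Two samples with two features each.
samples = np.array([[1, 1], [1, 0]])
l2_result = normalize_rows(samples, "l2")
l1_result = normalize_rows(samples, "l1")
max_result = normalize_rows(samples, "max")
```

After normalization, every non-zero row has norm 1 under the chosen norm type, which is what "unit norm" means here.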

Examples

>>> import pandas as pd
>>> import ray
>>> from ray.data.preprocessors import Normalizer
>>>
>>> df = pd.DataFrame({"X1": [1, 1], "X2": [1, 0], "X3": [0, 1]})
>>> ds = ray.data.from_pandas(df)
>>> ds.to_pandas()
   X1  X2  X3
0   1   1   0
1   1   0   1


Over the selected columns X1 and X2, the $$L^2$$-norm of the first sample is $$\sqrt{2}$$, and the $$L^2$$-norm of the second sample is $$1$$. X3 is not listed in columns, so it is left unchanged.

>>> preprocessor = Normalizer(columns=["X1", "X2"])
>>> preprocessor.fit_transform(ds).to_pandas()
         X1        X2  X3
0  0.707107  0.707107   0
1  1.000000  0.000000   1


The $$L^1$$-norm of the first sample is $$2$$, and the $$L^1$$-norm of the second sample is $$1$$.

>>> preprocessor = Normalizer(columns=["X1", "X2"], norm="l1")
>>> preprocessor.fit_transform(ds).to_pandas()
    X1   X2  X3
0  0.5  0.5   0
1  1.0  0.0   1


The $$L^\infty$$-norm of both samples is $$1$$.

>>> preprocessor = Normalizer(columns=["X1", "X2"], norm="max")
>>> preprocessor.fit_transform(ds).to_pandas()
    X1   X2  X3
0  1.0  1.0   0
1  1.0  0.0   1

Parameters:
• columns – The columns to scale. For each row, these columns are scaled to unit norm.

• norm – The norm to use. The supported values are "l1", "l2", or "max". Defaults to "l2".

Raises:

ValueError – if norm is not "l1", "l2", or "max".

PublicAPI (alpha): This API is in alpha and may change before becoming stable.

Methods

• deserialize – Load the original preprocessor serialized via self.serialize().

• fit – Fit this Preprocessor to the Dataset.

• fit_transform – Fit this Preprocessor to the Dataset and then transform the Dataset.

• preferred_batch_format – Batch format hint for upstream producers to try yielding best block format.

• serialize – Return this preprocessor serialized as a string.

• transform – Transform the given dataset.

• transform_batch – Transform a single batch of data.