ray.data.preprocessors.Normalizer
- class ray.data.preprocessors.Normalizer(columns: List[str], norm='l2')
Bases: ray.data.preprocessor.Preprocessor
Scales each sample to have unit norm.
This preprocessor works by dividing each sample (i.e., row) by the sample’s norm. The general formula is given by
\[s' = \frac{s}{\lVert s \rVert_p}\]
where \(s\) is the sample, \(s'\) is the transformed sample, \(\lVert s \rVert_p\) is the norm of the sample, and \(p\) is the norm type.
The following norms are supported:
"l1" (\(L^1\)): Sum of the absolute values.
"l2" (\(L^2\)): Square root of the sum of the squared values.
"max" (\(L^\infty\)): Maximum value.
Examples
>>> import pandas as pd
>>> import ray
>>> from ray.data.preprocessors import Normalizer
>>>
>>> df = pd.DataFrame({"X1": [1, 1], "X2": [1, 0], "X3": [0, 1]})
>>> ds = ray.data.from_pandas(df)
>>> ds.to_pandas()
   X1  X2  X3
0   1   1   0
1   1   0   1
The \(L^2\)-norm of the first sample is \(\sqrt{2}\), and the \(L^2\)-norm of the second sample is \(1\).
>>> preprocessor = Normalizer(columns=["X1", "X2"])
>>> preprocessor.fit_transform(ds).to_pandas()
         X1        X2  X3
0  0.707107  0.707107   0
1  1.000000  0.000000   1
The \(L^1\)-norm of the first sample is \(2\), and the \(L^1\)-norm of the second sample is \(1\).
>>> preprocessor = Normalizer(columns=["X1", "X2"], norm="l1")
>>> preprocessor.fit_transform(ds).to_pandas()
    X1   X2  X3
0  0.5  0.5   0
1  1.0  0.0   1
The \(L^\infty\)-norm of both samples is \(1\).
>>> preprocessor = Normalizer(columns=["X1", "X2"], norm="max")
>>> preprocessor.fit_transform(ds).to_pandas()
    X1   X2  X3
0  1.0  1.0   0
1  1.0  0.0   1
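The arithmetic behind the outputs above can be reproduced without Ray. The following is a minimal pure-Python sketch of the formula \(s' = s / \lVert s \rVert_p\), not Ray's implementation: the helper names row_norm and normalize_row are hypothetical, and "max" is computed here as the largest absolute value (the standard \(L^\infty\) definition), which coincides with "maximum value" for non-negative rows such as the ones in these examples.

```python
import math
from typing import Dict, List, Sequence


def row_norm(values: Sequence[float], norm: str = "l2") -> float:
    """Return the norm of one row (hypothetical helper, not part of Ray)."""
    if norm == "l1":
        # Sum of the absolute values.
        return sum(abs(v) for v in values)
    if norm == "l2":
        # Square root of the sum of the squared values.
        return math.sqrt(sum(v * v for v in values))
    if norm == "max":
        # Assumption: largest absolute value (standard L-infinity norm).
        return max(abs(v) for v in values)
    raise ValueError(f"norm must be 'l1', 'l2', or 'max', got {norm!r}")


def normalize_row(
    row: Dict[str, float], columns: List[str], norm: str = "l2"
) -> Dict[str, float]:
    """Divide the selected columns of one row by that row's norm.

    Columns not listed in `columns` are copied unchanged. A row whose
    selected values are all zero raises ZeroDivisionError in this sketch.
    """
    n = row_norm([row[c] for c in columns], norm)
    out = dict(row)
    for c in columns:
        out[c] = row[c] / n
    return out


# Same data as the DataFrame in the examples above.
rows = [{"X1": 1, "X2": 1, "X3": 0}, {"X1": 1, "X2": 0, "X3": 1}]

l2_rows = [normalize_row(r, ["X1", "X2"]) for r in rows]
l1_rows = [normalize_row(r, ["X1", "X2"], norm="l1") for r in rows]
max_rows = [normalize_row(r, ["X1", "X2"], norm="max") for r in rows]
```

Only X1 and X2 contribute to the norm and are rescaled; X3 passes through untouched, which matches the X3 column in the outputs above.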
- Parameters
columns – The columns to scale. For each row, these columns are scaled to unit norm.
norm – The norm to use. The supported values are "l1", "l2", or "max". Defaults to "l2".
- Raises
ValueError – if norm is not "l1", "l2", or "max".
PublicAPI (alpha): This API is in alpha and may change before becoming stable.