ray.data.preprocessors.Normalizer
- class ray.data.preprocessors.Normalizer(columns: List[str], norm='l2', *, output_columns: List[str] | None = None)
Bases: Preprocessor
Scales each sample to have unit norm.
This preprocessor works by dividing each sample (i.e., row) by the sample’s norm. The general formula is given by
\[s' = \frac{s}{\lVert s \rVert_p}\]
where \(s\) is the sample, \(s'\) is the transformed sample, \(\lVert s \rVert_p\) is the norm of the sample, and \(p\) is the norm type.
The following norms are supported:
- "l1" (\(L^1\)): Sum of the absolute values.
- "l2" (\(L^2\)): Square root of the sum of the squared values.
- "max" (\(L^\infty\)): Maximum absolute value.
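The formula and the three norms can be sketched in plain Python. This is an illustration of the arithmetic only, not Ray's implementation: `NORMS` and `normalize_row` are hypothetical names, and the sketch assumes the row has a nonzero norm.

```python
import math
from typing import Callable, Dict, List

# Hypothetical lookup table mapping each documented norm name to a function
# that computes that norm for one row. Not part of the Ray API.
NORMS: Dict[str, Callable[[List[float]], float]] = {
    "l1": lambda row: sum(abs(v) for v in row),
    "l2": lambda row: math.sqrt(sum(v * v for v in row)),
    "max": lambda row: max(abs(v) for v in row),
}


def normalize_row(row: List[float], norm: str = "l2") -> List[float]:
    """Return s / ||s||_p for a single row s (assumes a nonzero norm)."""
    size = NORMS[norm](row)
    return [v / size for v in row]


sample = [1.0, 1.0]
print(normalize_row(sample, "l2"))   # each value divided by sqrt(2)
print(normalize_row(sample, "l1"))   # each value divided by 2
print(normalize_row(sample, "max"))  # each value divided by 1
```

A row of all zeros would divide by zero in this sketch; how the real preprocessor treats such rows is not stated here.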
Examples
>>> import pandas as pd
>>> import ray
>>> from ray.data.preprocessors import Normalizer
>>>
>>> df = pd.DataFrame({"X1": [1, 1], "X2": [1, 0], "X3": [0, 1]})
>>> ds = ray.data.from_pandas(df)
>>> ds.to_pandas()
   X1  X2  X3
0   1   1   0
1   1   0   1
The \(L^2\)-norm of the first sample is \(\sqrt{2}\), and the \(L^2\)-norm of the second sample is \(1\).
>>> preprocessor = Normalizer(columns=["X1", "X2"])
>>> preprocessor.fit_transform(ds).to_pandas()
         X1        X2  X3
0  0.707107  0.707107   0
1  1.000000  0.000000   1
The \(L^1\)-norm of the first sample is \(2\), and the \(L^1\)-norm of the second sample is \(1\).
>>> preprocessor = Normalizer(columns=["X1", "X2"], norm="l1")
>>> preprocessor.fit_transform(ds).to_pandas()
    X1   X2  X3
0  0.5  0.5   0
1  1.0  0.0   1
The \(L^\infty\)-norm of both samples is \(1\).
>>> preprocessor = Normalizer(columns=["X1", "X2"], norm="max")
>>> preprocessor.fit_transform(ds).to_pandas()
    X1   X2  X3
0  1.0  1.0   0
1  1.0  0.0   1
Normalizer can also be used in append mode by providing output_columns, the names of the columns that should hold the normalized values.
>>> preprocessor = Normalizer(columns=["X1", "X2"], output_columns=["X1_normalized", "X2_normalized"])
>>> preprocessor.fit_transform(ds).to_pandas()
   X1  X2  X3  X1_normalized  X2_normalized
0   1   1   0       0.707107       0.707107
1   1   0   1       1.000000       0.000000
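The difference between in-place and append mode can be sketched without Ray. The helper below, `normalize_rows`, is a hypothetical stand-in that works on a list of dict rows and always uses the \(L^2\)-norm; it only mirrors the documented behavior and assumes nonzero norms.

```python
import math
from typing import Dict, List, Optional

Row = Dict[str, float]


def normalize_rows(
    rows: List[Row],
    columns: List[str],
    output_columns: Optional[List[str]] = None,
) -> List[Row]:
    """L2-normalize `columns` per row; write results to `output_columns` if given."""
    # With no output_columns, results overwrite the input columns (in-place mode).
    targets = output_columns if output_columns is not None else columns
    result = []
    for row in rows:
        size = math.sqrt(sum(row[c] ** 2 for c in columns))
        new_row = dict(row)  # copy so the caller's rows are left untouched
        for src, dst in zip(columns, targets):
            new_row[dst] = row[src] / size
        result.append(new_row)
    return result


rows = [{"X1": 1, "X2": 1, "X3": 0}, {"X1": 1, "X2": 0, "X3": 1}]
in_place = normalize_rows(rows, ["X1", "X2"])
appended = normalize_rows(rows, ["X1", "X2"], ["X1_normalized", "X2_normalized"])
print(list(in_place[0]))   # same column names as the input
print(list(appended[0]))   # input columns plus the two new ones
```

In append mode the original X1 and X2 values survive next to the normalized ones, which is useful when later steps need both.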
- Parameters:
columns – The columns to scale. For each row, these columns are scaled to unit-norm.
norm – The norm to use. The supported values are "l1", "l2", or "max". Defaults to "l2".
output_columns – The names of the transformed columns. If None, the transformed columns will be the same as the input columns. If not None, the length of output_columns must match the length of columns, otherwise an error will be raised.
- Raises:
ValueError – if norm is not "l1", "l2", or "max".
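The argument rules above can be sketched as a small validation function. `check_normalizer_args` and `SUPPORTED_NORMS` are hypothetical names, not part of Ray. Using ValueError for a length mismatch is an assumption, since the parameter description only says that an error will be raised.

```python
from typing import List, Optional

SUPPORTED_NORMS = ("l1", "l2", "max")


def check_normalizer_args(
    columns: List[str],
    norm: str = "l2",
    output_columns: Optional[List[str]] = None,
) -> None:
    """Raise if the arguments break the documented constraints."""
    if norm not in SUPPORTED_NORMS:
        raise ValueError(
            f"Unsupported norm {norm!r}; expected one of {SUPPORTED_NORMS}."
        )
    # Assumption: the real error type for this case is not documented above.
    if output_columns is not None and len(output_columns) != len(columns):
        raise ValueError(
            "output_columns must have the same length as columns "
            f"({len(output_columns)} != {len(columns)})."
        )


check_normalizer_args(["X1", "X2"], norm="max")  # passes silently

try:
    check_normalizer_args(["X1", "X2"], norm="l3")
except ValueError as err:
    print(err)
```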
PublicAPI (alpha): This API is in alpha and may change before becoming stable.
Methods
Load the original preprocessor serialized via self.serialize().
Fit this Preprocessor to the Dataset.
Fit this Preprocessor to the Dataset and then transform the Dataset.
Batch format hint for upstream producers to try yielding best block format.
Return this preprocessor serialized as a string.
Transform the given dataset.
Transform a single batch of data.