ray.data.preprocessors.MinMaxScaler
- class ray.data.preprocessors.MinMaxScaler(columns: List[str], output_columns: List[str] | None = None)
- Bases: Preprocessor
- Scale each column by its range.
- The general formula is given by
  \[x' = \frac{x - \min(x)}{\max(x) - \min(x)}\]
  where \(x\) is the column and \(x'\) is the transformed column. If \(\max(x) - \min(x) = 0\) (i.e., the column is constant-valued), then the transformed column will get filled with zeros.
- Transformed values are always in the range \([0, 1]\).
- Tip: This can be used as an alternative to StandardScaler.
- Examples

  >>> import pandas as pd
  >>> import ray
  >>> from ray.data.preprocessors import MinMaxScaler
  >>>
  >>> df = pd.DataFrame({"X1": [-2, 0, 2], "X2": [-3, -3, 3], "X3": [1, 1, 1]})  # noqa: E501
  >>> ds = ray.data.from_pandas(df)
  >>> ds.to_pandas()
     X1  X2  X3
  0  -2  -3   1
  1   0  -3   1
  2   2   3   1

  Columns are scaled separately.

  >>> preprocessor = MinMaxScaler(columns=["X1", "X2"])
  >>> preprocessor.fit_transform(ds).to_pandas()
      X1   X2  X3
  0  0.0  0.0   1
  1  0.5  0.0   1
  2  1.0  1.0   1

  Constant-valued columns get filled with zeros.

  >>> preprocessor = MinMaxScaler(columns=["X3"])
  >>> preprocessor.fit_transform(ds).to_pandas()
     X1  X2   X3
  0  -2  -3  0.0
  1   0  -3  0.0
  2   2   3  0.0

  >>> preprocessor = MinMaxScaler(columns=["X1", "X2"], output_columns=["X1_scaled", "X2_scaled"])
  >>> preprocessor.fit_transform(ds).to_pandas()
     X1  X2  X3  X1_scaled  X2_scaled
  0  -2  -3   1        0.0        0.0
  1   0  -3   1        0.5        0.0
  2   2   3   1        1.0        1.0

- Parameters:
- columns – The columns to separately scale. 
- output_columns – The names of the transformed columns. If None, the transformed columns will be the same as the input columns. If not None, the length of output_columns must match the length of columns, otherwise an error will be raised.
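
The scaling formula in the class description can be checked by hand with a few lines of plain Python. The sketch below is not Ray's implementation: `min_max_scale` is a hypothetical helper that works on a plain list rather than a Dataset, and it assumes the list is non-empty. It applies the formula to the `X1`, `X2`, and `X3` columns from the examples, including the zero fill for a constant-valued column.

```python
from typing import List, Sequence


def min_max_scale(values: Sequence[float]) -> List[float]:
    """Scale values with x' = (x - min(x)) / (max(x) - min(x)).

    Hypothetical helper for illustration only; not part of Ray.
    """
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        # Constant-valued column: fill with zeros, as the description states.
        return [0.0 for _ in values]
    return [(v - lo) / span for v in values]


print(min_max_scale([-2, 0, 2]))   # X1 from the examples
print(min_max_scale([-3, -3, 3]))  # X2
print(min_max_scale([1, 1, 1]))    # X3 (constant-valued)
```

Each column is scaled with its own minimum and maximum, which is why `X1` and `X2` produce different results from the same formula.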
 
- PublicAPI (alpha): This API is in alpha and may change before becoming stable.
- Methods
  - Load the original preprocessor serialized via self.serialize().
  - Fit this Preprocessor to the Dataset.
  - Fit this Preprocessor to the Dataset and then transform the Dataset.
  - Batch format hint for upstream producers to try yielding best block format.
  - Return this preprocessor serialized as a string.
  - Transform the given dataset.
  - Transform a single batch of data.
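
To make the split between fitting and transforming concrete, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: `TinyMinMaxScaler` is a hypothetical class, a batch is modeled as a dict of lists, and `ValueError` is an arbitrary choice for the length-mismatch error, since the documentation above only says that an error is raised. It mirrors the documented behavior that `output_columns` adds new columns and leaves the inputs unchanged.

```python
from typing import Dict, List, Optional, Tuple

Batch = Dict[str, List[float]]


class TinyMinMaxScaler:
    """Hypothetical stand-in for illustration; not Ray's MinMaxScaler."""

    def __init__(self, columns: List[str], output_columns: Optional[List[str]] = None):
        # Assumption: ValueError is used here; the docs do not name the error type.
        if output_columns is not None and len(output_columns) != len(columns):
            raise ValueError("output_columns must have the same length as columns")
        self.columns = columns
        self.output_columns = output_columns or columns
        self.stats: Dict[str, Tuple[float, float]] = {}

    def fit(self, data: Batch) -> "TinyMinMaxScaler":
        # Record (min, max) for each selected column.
        for col in self.columns:
            self.stats[col] = (min(data[col]), max(data[col]))
        return self

    def transform_batch(self, batch: Batch) -> Batch:
        # Copy the batch so the caller's data is left unchanged.
        out = {name: list(vals) for name, vals in batch.items()}
        for col, out_col in zip(self.columns, self.output_columns):
            lo, hi = self.stats[col]
            span = hi - lo
            out[out_col] = [0.0 if span == 0 else (v - lo) / span for v in batch[col]]
        return out


data = {"X1": [-2, 0, 2], "X2": [-3, -3, 3], "X3": [1, 1, 1]}
scaler = TinyMinMaxScaler(["X1", "X2"], output_columns=["X1_scaled", "X2_scaled"]).fit(data)
scaled = scaler.transform_batch(data)
print(scaled["X1_scaled"], scaled["X2_scaled"])
```

Keeping the statistics on the fitted object is what allows the same minimum and maximum to be reused on later batches.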