ray.data.preprocessors.Chain
- class ray.data.preprocessors.Chain(*preprocessors: Preprocessor)
Bases: Preprocessor
Combine multiple preprocessors into a single Preprocessor.
When you call fit, each preprocessor is fit on the dataset produced by the preceding preprocessor’s fit_transform.
Example
```python
>>> import pandas as pd
>>> import ray
>>> from ray.data.preprocessors import *
>>>
>>> df = pd.DataFrame({
...     "X0": [0, 1, 2],
...     "X1": [3, 4, 5],
...     "Y": ["orange", "blue", "orange"],
... })
>>> ds = ray.data.from_pandas(df)
>>>
>>> preprocessor = Chain(
...     StandardScaler(columns=["X0", "X1"]),
...     Concatenator(columns=["X0", "X1"], output_column_name="X"),
...     LabelEncoder(label_column="Y")
... )
>>> preprocessor.fit_transform(ds).to_pandas()
   Y                                         X
0  1  [-1.224744871391589, -1.224744871391589]
1  0                                [0.0, 0.0]
2  1    [1.224744871391589, 1.224744871391589]
```
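The fitting rule described above (each stage is fit on what the previous stage's fit_transform produces) can be illustrated without Ray. The following is a minimal pure-Python sketch, not Ray's implementation: MiniStandardScaler, MiniLabelEncoder and MiniChain are hypothetical stand-ins that operate on a dict of lists instead of a Dataset, and the Concatenator step is left out. It assumes, as the example output suggests, that the scaler uses the population standard deviation and that labels are encoded in sorted order.

```python
import math
from typing import Dict, List

# Hypothetical "dataset": a dict mapping column name -> list of values.
Data = Dict[str, list]


class MiniStandardScaler:
    """Hypothetical stand-in for StandardScaler: z-scores columns (population std)."""

    def __init__(self, columns: List[str]):
        self.columns = columns
        self.stats: Dict[str, tuple] = {}

    def fit(self, data: Data) -> "MiniStandardScaler":
        for col in self.columns:
            values = data[col]
            mean = sum(values) / len(values)
            std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
            self.stats[col] = (mean, std)
        return self

    def transform(self, data: Data) -> Data:
        out = dict(data)  # shallow copy so the input is not mutated
        for col, (mean, std) in self.stats.items():
            out[col] = [(v - mean) / std for v in data[col]]
        return out

    def fit_transform(self, data: Data) -> Data:
        return self.fit(data).transform(data)


class MiniLabelEncoder:
    """Hypothetical stand-in for LabelEncoder: maps sorted unique labels to 0..n-1."""

    def __init__(self, label_column: str):
        self.label_column = label_column
        self.mapping: Dict[str, int] = {}

    def fit(self, data: Data) -> "MiniLabelEncoder":
        labels = sorted(set(data[self.label_column]))
        self.mapping = {label: i for i, label in enumerate(labels)}
        return self

    def transform(self, data: Data) -> Data:
        out = dict(data)
        out[self.label_column] = [self.mapping[v] for v in data[self.label_column]]
        return out

    def fit_transform(self, data: Data) -> Data:
        return self.fit(data).transform(data)


class MiniChain:
    """Hypothetical sketch of the fit/transform semantics described for Chain."""

    def __init__(self, *preprocessors):
        self.preprocessors = preprocessors

    def fit(self, data: Data) -> "MiniChain":
        # Every stage except the last is fit on the output of the previous
        # stage's fit_transform; the last stage only needs to be fit.
        for p in self.preprocessors[:-1]:
            data = p.fit_transform(data)
        self.preprocessors[-1].fit(data)
        return self

    def transform(self, data: Data) -> Data:
        # Apply the already-fitted stages in order.
        for p in self.preprocessors:
            data = p.transform(data)
        return data

    def fit_transform(self, data: Data) -> Data:
        return self.fit(data).transform(data)


data = {"X0": [0, 1, 2], "X1": [3, 4, 5], "Y": ["orange", "blue", "orange"]}
chain = MiniChain(MiniStandardScaler(["X0", "X1"]), MiniLabelEncoder("Y"))
result = chain.fit_transform(data)
print(result["Y"])  # [1, 0, 1]

# A second scaler in a chain is fit on already-scaled data,
# so its fitted statistics come out as roughly mean 0 and std 1.
first, second = MiniStandardScaler(["X0"]), MiniStandardScaler(["X0"])
MiniChain(first, second).fit(data)
print(first.stats["X0"], second.stats["X0"])
```

The second scaler never sees the raw values, which is the practical consequence of fitting each stage on its predecessor's output.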
- Parameters:
preprocessors – The preprocessors to sequentially compose.
Methods
- deserialize – Load the original preprocessor serialized via self.serialize().
- fit – Fit this Preprocessor to the Dataset.
- preferred_batch_format – Batch format hint for upstream producers to try yielding the best block format.
- serialize – Return this preprocessor serialized as a string.
- transform – Transform the given dataset.
- transform_batch – Transform a single batch of data.
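For a chain, transforming a single batch presumably means running each stage on the output of the previous one. The sketch below illustrates that idea under this assumption; AddConstant, MultiplyBy and MiniChain are hypothetical classes, not part of ray.data.preprocessors, and a plain dict of lists stands in for a real data batch.

```python
from typing import Dict, List

# Hypothetical batch: a dict mapping column name -> list of floats.
Batch = Dict[str, List[float]]


class AddConstant:
    """Hypothetical stateless stage: adds a constant to one column."""

    def __init__(self, column: str, value: float):
        self.column = column
        self.value = value

    def transform_batch(self, batch: Batch) -> Batch:
        out = dict(batch)  # shallow copy so the caller's batch is not mutated
        out[self.column] = [v + self.value for v in batch[self.column]]
        return out


class MultiplyBy:
    """Hypothetical stateless stage: multiplies one column by a factor."""

    def __init__(self, column: str, factor: float):
        self.column = column
        self.factor = factor

    def transform_batch(self, batch: Batch) -> Batch:
        out = dict(batch)
        out[self.column] = [v * self.factor for v in batch[self.column]]
        return out


class MiniChain:
    """Hypothetical sketch: transform a batch by running each stage in order."""

    def __init__(self, *preprocessors):
        self.preprocessors = preprocessors

    def transform_batch(self, batch: Batch) -> Batch:
        for p in self.preprocessors:
            batch = p.transform_batch(batch)
        return batch


batch = {"x": [1.0, 2.0]}
add_then_scale = MiniChain(AddConstant("x", 1.0), MultiplyBy("x", 10.0))
scale_then_add = MiniChain(MultiplyBy("x", 10.0), AddConstant("x", 1.0))
print(add_then_scale.transform_batch(batch))  # {'x': [20.0, 30.0]}
print(scale_then_add.transform_batch(batch))  # {'x': [11.0, 21.0]}
```

Swapping the order of the same two stages changes the result, which is why the order of the preprocessors argument matters.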