ray.data.preprocessors.Chain
- class ray.data.preprocessors.Chain(*preprocessors: SerializablePreprocessorBase)
Bases: SerializablePreprocessorBase
Combine multiple preprocessors into a single Preprocessor.
When you call fit, each preprocessor is fit on the dataset produced by the preceding preprocessor’s fit_transform.
Example
>>> import pandas as pd
>>> import ray
>>> from ray.data.preprocessors import *
>>>
>>> df = pd.DataFrame({
...     "X0": [0, 1, 2],
...     "X1": [3, 4, 5],
...     "Y": ["orange", "blue", "orange"],
... })
>>> ds = ray.data.from_pandas(df)
>>>
>>> preprocessor = Chain(
...     StandardScaler(columns=["X0", "X1"]),
...     Concatenator(columns=["X0", "X1"], output_column_name="X"),
...     LabelEncoder(label_column="Y")
... )
>>> preprocessor.fit_transform(ds).to_pandas()
   Y                                         X
0  1  [-1.224744871391589, -1.224744871391589]
1  0                                [0.0, 0.0]
2  1    [1.224744871391589, 1.224744871391589]
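The fitting order described above can be illustrated without Ray. The following is a minimal sketch in plain Python, not the library's implementation: ToyStandardScaler, ToyMaxAbsScaler, and ToyChain are hypothetical stand-ins that operate on lists of numbers rather than on a Dataset. It shows that the second step is fit on the output of the first step, not on the raw input.

```python
# Hypothetical stand-ins for illustration only; none of these classes are Ray APIs.
from statistics import mean, pstdev


class ToyStandardScaler:
    """Scale values to zero mean and unit (population) standard deviation."""

    def fit(self, data):
        self.mean_ = mean(data)
        self.std_ = pstdev(data) or 1.0  # avoid division by zero on constant input
        return self

    def transform(self, data):
        return [(x - self.mean_) / self.std_ for x in data]


class ToyMaxAbsScaler:
    """Divide values by the largest absolute value seen during fit."""

    def fit(self, data):
        self.max_abs_ = max(abs(x) for x in data) or 1.0
        return self

    def transform(self, data):
        return [x / self.max_abs_ for x in data]


class ToyChain:
    """Apply steps in order; each step is fit on the previous step's output."""

    def __init__(self, *steps):
        self.steps = steps

    def fit(self, data):
        for step in self.steps:
            # Fit the step, then pass its transformed output to the next step.
            data = step.fit(data).transform(data)
        return self

    def transform(self, data):
        for step in self.steps:
            data = step.transform(data)
        return data

    def fit_transform(self, data):
        return self.fit(data).transform(data)


chain = ToyChain(ToyStandardScaler(), ToyMaxAbsScaler())
result = chain.fit_transform([0, 1, 2])
print(result)  # [-1.0, 0.0, 1.0]
```

In this sketch the second step learns a maximum absolute value of about 1.2247 (the largest standardized value) rather than 2 (the largest raw value), which is the behavior the description attributes to Chain.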
- Parameters:
*preprocessors – The preprocessors to sequentially compose.
Methods
Deserialize a preprocessor from serialized data.
Fit this Preprocessor to the Dataset.
Get the preprocessor class identifier for this preprocessor class.
Get the version number for this preprocessor class.
Batch format hint for upstream producers to try yielding best block format.
Serialize this preprocessor to a string or bytes.
Set the preprocessor class identifier for this preprocessor class.
Set the version number for this preprocessor class.
Transform the given dataset.
Transform a single batch of data.
Attributes