ray.data.preprocessors.Chain

class ray.data.preprocessors.Chain(*preprocessors: Preprocessor)

Bases: Preprocessor

Combine multiple preprocessors into a single Preprocessor.

When you call fit, each preprocessor is fit on the dataset produced by the preceding preprocessor's fit_transform.
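The fit rule above can be illustrated with a minimal pure-Python sketch. Shift, Scale, and ToyChain are hypothetical stand-ins, not Ray classes, and the data is a plain list rather than a Dataset. The sketch only shows that the second stage is fit on the first stage's output, not on the raw input.

```python
# Hypothetical toy preprocessors operating on a list of numbers.
# This is NOT Ray's implementation; it only mirrors the documented rule
# that each stage is fit on the output of the previous stage's fit_transform.


class Shift:
    """Hypothetical stage: subtracts the mean seen during fit."""

    def fit(self, data):
        self.mean_ = sum(data) / len(data)
        return self

    def transform(self, data):
        return [x - self.mean_ for x in data]

    def fit_transform(self, data):
        return self.fit(data).transform(data)


class Scale:
    """Hypothetical stage: divides by the max absolute value seen during fit."""

    def fit(self, data):
        # Fall back to 1.0 so all-zero input does not cause division by zero.
        self.max_abs_ = max(abs(x) for x in data) or 1.0
        return self

    def transform(self, data):
        return [x / self.max_abs_ for x in data]

    def fit_transform(self, data):
        return self.fit(data).transform(data)


class ToyChain:
    """Sketch of chaining: fit stages in order, feeding each the previous output."""

    def __init__(self, *stages):
        self.stages = stages

    def fit(self, data):
        # Every stage except the last transforms the data, so the next stage
        # is fit on what it will actually receive at transform time.
        for stage in self.stages[:-1]:
            data = stage.fit_transform(data)
        # The last stage's output is not needed during fitting.
        self.stages[-1].fit(data)
        return self

    def transform(self, data):
        for stage in self.stages:
            data = stage.transform(data)
        return data


chain = ToyChain(Shift(), Scale()).fit([2.0, 4.0, 6.0])
print(chain.stages[1].max_abs_)           # 2.0, because Scale saw [-2.0, 0.0, 2.0]
print(chain.transform([2.0, 4.0, 6.0]))   # [-1.0, 0.0, 1.0]
```

Had Scale been fit on the raw input instead, it would have learned 6.0 rather than 2.0, which is the mismatch that sequential fitting avoids.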

Example

>>> import pandas as pd
>>> import ray
>>> from ray.data.preprocessors import Chain, Concatenator, LabelEncoder, StandardScaler
>>>
>>> df = pd.DataFrame({
...     "X0": [0, 1, 2],
...     "X1": [3, 4, 5],
...     "Y": ["orange", "blue", "orange"],
... })
>>> ds = ray.data.from_pandas(df)  
>>>
>>> preprocessor = Chain(
...     StandardScaler(columns=["X0", "X1"]),
...     Concatenator(columns=["X0", "X1"], output_column_name="X"),
...     LabelEncoder(label_column="Y")
... )
>>> preprocessor.fit_transform(ds).to_pandas()  
   Y                                         X
0  1  [-1.224744871391589, -1.224744871391589]
1  0                                [0.0, 0.0]
2  1    [1.224744871391589, 1.224744871391589]
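The numbers in the output above can be checked by hand. The following pure-Python sketch (no Ray required; the helper name standard_scale is hypothetical) recomputes them under two assumptions that are consistent with the printed values but not stated on this page: StandardScaler divides by the population standard deviation, and LabelEncoder assigns integer codes to the sorted unique labels.

```python
import statistics

x0 = [0, 1, 2]
x1 = [3, 4, 5]
y = ["orange", "blue", "orange"]


def standard_scale(values):
    # Assumption: population standard deviation (ddof=0). With the sample
    # standard deviation (ddof=1), the outer values would be -1.0 and 1.0
    # instead of roughly -1.2247 and 1.2247.
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


# Assumption: codes follow the sorted unique labels ("blue" -> 0, "orange" -> 1).
codes = {label: i for i, label in enumerate(sorted(set(y)))}
encoded_y = [codes[label] for label in y]

# Concatenate the scaled X0 and X1 values row by row, like the X column.
x = [list(pair) for pair in zip(standard_scale(x0), standard_scale(x1))]

print(encoded_y)  # [1, 0, 1]
for row in x:
    print(row)
```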

Parameters:

preprocessors – The preprocessors to sequentially compose.

Methods

deserialize – Load the original preprocessor serialized via self.serialize().

fit – Fit this Preprocessor to the Dataset.

preferred_batch_format – Batch format hint that upstream producers can use to try to yield the best block format.

serialize – Return this preprocessor serialized as a string.

transform – Transform the given dataset.

transform_batch – Transform a single batch of data.
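A fitted chain is expected to apply each stage's transformation in order, without refitting, whether it is given a whole dataset or a single batch. The sketch below shows that idea for one batch. The column-dictionary batch layout, the helper names (make_scaler, make_label_encoder, chain_transform_batch), and the precomputed statistics are all hypothetical and do not reflect Ray's internals.

```python
from typing import Callable, Dict, List

# Hypothetical batch layout: column name -> list of values.
Batch = Dict[str, list]


def make_scaler(column: str, mean: float, std: float) -> Callable[[Batch], Batch]:
    """Hypothetical fitted stage; mean and std would come from an earlier fit."""

    def transform_batch(batch: Batch) -> Batch:
        out = dict(batch)  # copy so the input batch is left unchanged
        out[column] = [(v - mean) / std for v in batch[column]]
        return out

    return transform_batch


def make_label_encoder(column: str, codes: Dict[str, int]) -> Callable[[Batch], Batch]:
    """Hypothetical fitted stage; the label-to-code mapping comes from fit."""

    def transform_batch(batch: Batch) -> Batch:
        out = dict(batch)
        out[column] = [codes[v] for v in batch[column]]
        return out

    return transform_batch


def chain_transform_batch(stages: List[Callable[[Batch], Batch]], batch: Batch) -> Batch:
    # Pipe one batch through each fitted stage in order; nothing is refit.
    for stage in stages:
        batch = stage(batch)
    return batch


stages = [
    make_scaler("X0", mean=1.0, std=0.5),
    make_label_encoder("Y", {"blue": 0, "orange": 1}),
]
result = chain_transform_batch(stages, {"X0": [2.0, 0.0], "Y": ["blue", "orange"]})
print(result)  # {'X0': [2.0, -2.0], 'Y': [0, 1]}
```

Because the statistics are fixed at fit time, the same batch always produces the same output, regardless of which other batches are transformed alongside it.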