ray.data.preprocessors.Chain

class ray.data.preprocessors.Chain(*preprocessors: ray.data.preprocessor.Preprocessor)

Bases: ray.data.preprocessor.Preprocessor

Combine multiple preprocessors into a single Preprocessor.

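When you call fit, the first preprocessor is fit on the input dataset, and each subsequent preprocessor is fit on the dataset produced by the preceding preprocessor's fit_transform. When you call transform, the preprocessors are applied in the same order.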
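The fitting order matters because each stage must learn its statistics from the data it will actually receive at transform time. The sketch below models this with hypothetical toy classes (ToyPreprocessor, Center, MaxAbsScale, ToyChain) operating on plain Python lists; it mirrors the behavior described above and is not Ray's implementation.

```python
from typing import List


class ToyPreprocessor:
    """Hypothetical stand-in for a preprocessor that works on a list of floats."""

    def fit(self, data: List[float]) -> "ToyPreprocessor":
        raise NotImplementedError

    def transform(self, data: List[float]) -> List[float]:
        raise NotImplementedError

    def fit_transform(self, data: List[float]) -> List[float]:
        return self.fit(data).transform(data)


class Center(ToyPreprocessor):
    """Subtracts the mean seen during fit."""

    def fit(self, data):
        self.mean_ = sum(data) / len(data)
        return self

    def transform(self, data):
        return [x - self.mean_ for x in data]


class MaxAbsScale(ToyPreprocessor):
    """Divides by the largest absolute value seen during fit."""

    def fit(self, data):
        self.max_abs_ = max(abs(x) for x in data)
        return self

    def transform(self, data):
        return [x / self.max_abs_ for x in data]


class ToyChain(ToyPreprocessor):
    """Hypothetical sketch of chained fitting: each stage sees the previous stage's output."""

    def __init__(self, *stages: ToyPreprocessor):
        self.stages = stages

    def fit(self, data):
        # Every stage except the last also transforms, so the next stage is
        # fit on exactly the data it will receive at transform time.
        for stage in self.stages[:-1]:
            data = stage.fit_transform(data)
        self.stages[-1].fit(data)
        return self

    def transform(self, data):
        # Apply the stages in the same order they were fit.
        for stage in self.stages:
            data = stage.transform(data)
        return data


chain = ToyChain(Center(), MaxAbsScale())
result = chain.fit_transform([2.0, 4.0, 6.0])
print(result)  # [-1.0, 0.0, 1.0]
```

Here MaxAbsScale is fit on the centered values [-2.0, 0.0, 2.0] and stores 2.0. Had it been fit on the raw input instead, it would store 6.0 and the output would be roughly [-0.33, 0.0, 0.33], which is inconsistent with what the first stage hands it.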

Example

>>> import pandas as pd
>>> import ray
>>> from ray.data.preprocessors import *
>>>
>>> df = pd.DataFrame({
...     "X0": [0, 1, 2],
...     "X1": [3, 4, 5],
...     "Y": ["orange", "blue", "orange"],
... })
>>> ds = ray.data.from_pandas(df)  
>>>
>>> preprocessor = Chain(
...     StandardScaler(columns=["X0", "X1"]),
...     Concatenator(include=["X0", "X1"], output_column_name="X"),
...     LabelEncoder(label_column="Y")
... )
>>> preprocessor.fit_transform(ds).to_pandas()  
   Y                                         X
0  1  [-1.224744871391589, -1.224744871391589]
1  0                                [0.0, 0.0]
2  1    [1.224744871391589, 1.224744871391589]
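
To see where the numbers in this output come from, the stdlib-only sketch below recomputes them by hand. It assumes that StandardScaler divides by the population standard deviation (ddof=0) and that LabelEncoder assigns consecutive integer ids to the sorted unique labels; both assumptions are consistent with the output above but are not stated in this reference. The helper standard_scale is hypothetical, not part of Ray.

```python
import statistics

x0 = [0, 1, 2]
x1 = [3, 4, 5]
y = ["orange", "blue", "orange"]


def standard_scale(values):
    # Hypothetical helper. Assumes population standard deviation (ddof=0),
    # which is what reproduces the +/-1.2247... values in the output above.
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


# Step 1: scale each column independently.
scaled_x0 = standard_scale(x0)
scaled_x1 = standard_scale(x1)

# Step 2: concatenate the scaled columns into one list per row.
x = [[a, b] for a, b in zip(scaled_x0, scaled_x1)]

# Step 3: encode labels, assuming ids follow the sorted unique labels.
label_ids = {label: i for i, label in enumerate(sorted(set(y)))}
encoded_y = [label_ids[label] for label in y]

for row_y, row_x in zip(encoded_y, x):
    print(row_y, row_x)
```

"blue" sorts before "orange", so it receives 0, which matches the Y column above.
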

Parameters

preprocessors – The preprocessors to sequentially compose.

PublicAPI (alpha): This API is in alpha and may change before becoming stable.

fit_transform(ds: ray.data.dataset.Dataset) → ray.data.dataset.Dataset

Fit this Preprocessor to the Dataset and then transform the Dataset.

Calling it more than once overwrites all previously fitted state: calling preprocessor.fit_transform(A) and then preprocessor.fit_transform(B) leaves the preprocessor in the same fitted state as calling preprocessor.fit_transform(B) alone.
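The refit behavior described above can be illustrated with a hypothetical toy preprocessor in plain Python (no Ray involved). MeanCenterer is an illustration only, not a Ray class; it stores its state in an attribute that every fit replaces rather than accumulates.

```python
class MeanCenterer:
    """Hypothetical toy preprocessor that remembers only its most recent fit."""

    def fit(self, data):
        # Overwrite (do not accumulate) fitted state on every call.
        self.mean_ = sum(data) / len(data)
        return self

    def transform(self, data):
        return [x - self.mean_ for x in data]

    def fit_transform(self, data):
        return self.fit(data).transform(data)


a = [0.0, 10.0]
b = [100.0, 200.0]

# Fit on A, then refit on B: the state learned from A is discarded.
refit = MeanCenterer()
refit.fit_transform(a)
out_refit = refit.fit_transform(b)

# Fit on B only.
fresh = MeanCenterer()
out_fresh = fresh.fit_transform(b)

assert refit.mean_ == fresh.mean_ == 150.0
assert out_refit == out_fresh == [-50.0, 50.0]
```

Note that fit_transform returns transformed data, not the preprocessor, so the second call is made on the preprocessor object again rather than chained onto the first call's result.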

Parameters

ds – Input Dataset.

Returns

The transformed Dataset.

Return type

ray.data.Dataset