ray.data.preprocessors.Chain

class ray.data.preprocessors.Chain(*preprocessors: SerializablePreprocessorBase)

Bases: SerializablePreprocessorBase

Combine multiple preprocessors into a single Preprocessor.

When you call fit, each preprocessor is fit on the dataset produced by the preceding preprocessor's fit_transform. When you call transform, the preprocessors are applied in the order they were passed.
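The fit behavior described above can be illustrated with a small pure-Python sketch. `ToyChain`, `MeanCenter`, and `MaxAbsScale` are hypothetical stand-ins, not Ray classes, and the data is a plain list of floats rather than a `ray.data.Dataset`. What it shows: the second step is fit on the output of the first, so the scale it learns (1.0) differs from what it would learn on the raw data (3.0).

```python
class MeanCenter:
    """Hypothetical toy step (not a Ray class): learns a mean, then subtracts it."""

    def __init__(self):
        self.mean = None

    def fit(self, data):
        self.mean = sum(data) / len(data)
        return self

    def transform(self, data):
        return [x - self.mean for x in data]


class MaxAbsScale:
    """Hypothetical toy step: learns the largest |x|, then divides by it."""

    def __init__(self):
        self.max_abs = None

    def fit(self, data):
        # Fall back to 1.0 so an all-zero input does not cause division by zero.
        self.max_abs = max(abs(x) for x in data) or 1.0
        return self

    def transform(self, data):
        return [x / self.max_abs for x in data]


class ToyChain:
    """Hypothetical chain, assumed to mirror the fit semantics described above."""

    def __init__(self, *steps):
        self.steps = steps

    def fit(self, data):
        # Every step except the last: fit, then transform (a fit_transform),
        # so the next step is fit on this step's output.
        for step in self.steps[:-1]:
            data = step.fit(data).transform(data)
        # The last step's output is not needed during fit, so only fit it.
        self.steps[-1].fit(data)
        return self

    def transform(self, data):
        # Apply every step in the order it was passed in.
        for step in self.steps:
            data = step.transform(data)
        return data


chain = ToyChain(MeanCenter(), MaxAbsScale()).fit([1.0, 2.0, 3.0])
print(chain.steps[1].max_abs)  # 1.0: learned from the centered data [-1.0, 0.0, 1.0]
print(chain.transform([1.0, 2.0, 3.0]))  # [-1.0, 0.0, 1.0]
```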

Example

>>> import pandas as pd
>>> import ray
>>> from ray.data.preprocessors import Chain, Concatenator, LabelEncoder, StandardScaler
>>>
>>> df = pd.DataFrame({
...     "X0": [0, 1, 2],
...     "X1": [3, 4, 5],
...     "Y": ["orange", "blue", "orange"],
... })
>>> ds = ray.data.from_pandas(df)  
>>>
>>> preprocessor = Chain(
...     StandardScaler(columns=["X0", "X1"]),
...     Concatenator(columns=["X0", "X1"], output_column_name="X"),
...     LabelEncoder(label_column="Y")
... )
>>> preprocessor.fit_transform(ds).to_pandas()  
   Y                                         X
0  1  [-1.224744871391589, -1.224744871391589]
1  0                                [0.0, 0.0]
2  1    [1.224744871391589, 1.224744871391589]
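As a rough check of the numbers above, the same three steps can be reproduced by hand with the standard library. This sketch assumes that StandardScaler divides by the population standard deviation and that LabelEncoder numbers labels in sorted order; both assumptions are consistent with the output shown, but the code is an illustration, not Ray's implementation.

```python
from statistics import mean, pstdev

data = {"X0": [0, 1, 2], "X1": [3, 4, 5], "Y": ["orange", "blue", "orange"]}

# StandardScaler step (assumed): (x - mean) / population standard deviation.
scaled = {}
for col in ("X0", "X1"):
    mu, sigma = mean(data[col]), pstdev(data[col])
    scaled[col] = [(x - mu) / sigma for x in data[col]]

# Concatenator step: pack the scaled X0 and X1 values into one list per row.
X = [[x0, x1] for x0, x1 in zip(scaled["X0"], scaled["X1"])]

# LabelEncoder step (assumed): code each label by its position in sorted order,
# so "blue" -> 0 and "orange" -> 1.
codes = {label: i for i, label in enumerate(sorted(set(data["Y"])))}
Y = [codes[label] for label in data["Y"]]

print(Y)  # [1, 0, 1]
for row in X:
    print(row)
```

Both X0 and X1 have a spread of one unit around their mean, so each scaled row holds two equal values, matching the vectors in the example output.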

Parameters:

*preprocessors – The preprocessors to sequentially compose.
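Because each preprocessor consumes the previous one's output, the order of the arguments matters. In the example, Concatenator drops X0 and X1 (the result contains only Y and X), so a step that needs those columns must come before it. The sketch below models steps as plain functions over a dict-of-lists batch; `scale_by`, `concatenate`, and `compose` are hypothetical helpers, not Ray APIs.

```python
from functools import reduce
from typing import Callable, Dict, List

Batch = Dict[str, list]
Step = Callable[[Batch], Batch]


def scale_by(columns: List[str], factor: float) -> Step:
    """Hypothetical step: multiply the given columns by a constant."""
    def step(batch: Batch) -> Batch:
        out = dict(batch)  # copy so the input batch is not mutated
        for col in columns:
            out[col] = [x * factor for x in batch[col]]
        return out
    return step


def concatenate(columns: List[str], output: str) -> Step:
    """Hypothetical step: pack columns into one list column and drop the inputs."""
    def step(batch: Batch) -> Batch:
        out = {k: v for k, v in batch.items() if k not in columns}
        out[output] = [list(vals) for vals in zip(*(batch[c] for c in columns))]
        return out
    return step


def compose(*steps: Step) -> Step:
    """Apply steps left to right: each step receives the previous step's output."""
    return lambda batch: reduce(lambda acc, s: s(acc), steps, batch)


batch = {"X0": [0, 1], "X1": [3, 4]}

ok = compose(scale_by(["X0", "X1"], 2.0), concatenate(["X0", "X1"], "X"))
print(ok(batch))  # {'X': [[0.0, 6.0], [2.0, 8.0]]}

# Reversed order: the scaler looks for X0 after concatenate has removed it.
bad = compose(concatenate(["X0", "X1"], "X"), scale_by(["X0", "X1"], 2.0))
try:
    bad(batch)
except KeyError as err:
    print("missing column:", err)
```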

Methods

deserialize

Deserialize a preprocessor from serialized data.

fit

Fit this Preprocessor to the Dataset.

get_preprocessor_class_id

Get the preprocessor class identifier for this preprocessor class.

get_version

Get the version number for this preprocessor class.

preferred_batch_format

Batch format hint that upstream producers can use to try to yield blocks in the most efficient format.

serialize

Serialize this preprocessor to a string or bytes.

set_preprocessor_class_id

Set the preprocessor class identifier for this preprocessor class.

set_version

Set the version number for this preprocessor class.

transform

Transform the given dataset.

transform_batch

Transform a single batch of data.

Attributes

MAGIC_CLOUDPICKLE

SERIALIZER_FORMAT_VERSION

preprocessors

stat_computation_plan