ray.data.preprocessor.Preprocessor#

class ray.data.preprocessor.Preprocessor[source]#

Bases: ABC

Implements an ML preprocessing operation.

Preprocessors are stateful objects that can be fitted against a Dataset and used to transform both local data batches and distributed data. For example, a Normalization preprocessor may calculate the mean and stdev of a field during fitting, and uses these attributes to implement its normalization transform.

Preprocessors can also be stateless and transform data without needed to be fitted. For example, a preprocessor may simply remove a column, which does not require any state to be fitted.

If you are implementing your own Preprocessor sub-class, you should override the following:

_fit if your preprocessor is stateful. Otherwise, set _is_fittable=False.
_transform_pandas and/or _transform_numpy for best performance, implement both. Otherwise, the data will be converted to the match the implemented method.

PublicAPI (beta): This API is in beta and may change before becoming stable.

Methods

`__init__`
`deserialize`	Load the original preprocessor serialized via `self.serialize()`.
`fit`	Fit this Preprocessor to the Dataset.
`fit_transform`	Fit this Preprocessor to the Dataset and then transform the Dataset.
`preferred_batch_format`	Batch format hint for upstream producers to try yielding best block format.
`serialize`	Return this preprocessor serialized as a string.
`transform`	Transform the given dataset.
`transform_batch`	Transform a single batch of data.