class ray.data.preprocessor.Preprocessor

Bases: ABC

Implements an ML preprocessing operation.

Preprocessors are stateful objects that can be fitted against a Dataset and then used to transform both local data batches and distributed data. For example, a Normalization preprocessor may calculate the mean and standard deviation of a field during fitting, and then use these attributes to implement its normalization transform.

Preprocessors can also be stateless and transform data without needing to be fitted. For example, a preprocessor may simply remove a column, which does not require any fitted state.

If you are implementing your own Preprocessor subclass, you should override the following:

  • _fit if your preprocessor is stateful. Otherwise, set _is_fittable=False.

  • _transform_pandas and/or _transform_numpy. For best performance, implement both. Otherwise, the data will be converted to match the implemented method.
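
The stateful versus stateless contract described above can be sketched in plain Python. This is only an illustration, and it does not import Ray: MiniPreprocessor, _transform_rows, Normalizer, and DropColumn are hypothetical stand-ins invented for this sketch, and only the _fit and _is_fittable names mirror the hooks listed above. A real subclass would extend Preprocessor and implement _transform_pandas and/or _transform_numpy instead.

```python
from statistics import mean, pstdev


class MiniPreprocessor:
    """Hypothetical stand-in for the base class. NOT Ray's implementation."""

    # Subclasses that need no fitted state set this to False.
    _is_fittable = True

    def fit(self, rows):
        # Only fittable preprocessors compute state from the data.
        if self._is_fittable:
            self._fit(rows)
            self._fitted = True
        return self

    def transform_batch(self, rows):
        # A stateful preprocessor must be fitted before it can transform.
        if self._is_fittable and not getattr(self, "_fitted", False):
            raise RuntimeError("fit() must be called before transform_batch()")
        return self._transform_rows(rows)

    def _fit(self, rows):
        raise NotImplementedError

    def _transform_rows(self, rows):
        # Hypothetical hook standing in for _transform_pandas / _transform_numpy.
        raise NotImplementedError


class Normalizer(MiniPreprocessor):
    """Stateful: learns the mean and standard deviation of one column."""

    def __init__(self, column):
        self.column = column

    def _fit(self, rows):
        values = [row[self.column] for row in rows]
        self.mean_ = mean(values)
        # Population standard deviation; assumes the column is not constant.
        self.std_ = pstdev(values)

    def _transform_rows(self, rows):
        return [
            {**row, self.column: (row[self.column] - self.mean_) / self.std_}
            for row in rows
        ]


class DropColumn(MiniPreprocessor):
    """Stateless: removes a column, so there is nothing to fit."""

    _is_fittable = False

    def __init__(self, column):
        self.column = column

    def _transform_rows(self, rows):
        return [
            {key: value for key, value in row.items() if key != self.column}
            for row in rows
        ]


rows = [{"x": 1.0, "id": "a"}, {"x": 2.0, "id": "b"}, {"x": 3.0, "id": "c"}]
normalized = Normalizer("x").fit(rows).transform_batch(rows)
dropped = DropColumn("id").transform_batch(rows)
```

Here the Normalizer has to be fitted before it can transform, while DropColumn transforms immediately because it carries no state. The rows are plain dictionaries to keep the sketch dependency-free; they are not the batch formats Ray uses.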

PublicAPI (beta): This API is in beta and may change before becoming stable.




Load the original preprocessor serialized via self.serialize().


Fit this Preprocessor to the Dataset.


Fit this Preprocessor to the Dataset and then transform the Dataset.


Batch format hint for upstream producers to try yielding best block format.


Return this preprocessor serialized as a string.


Transform the given dataset.


Transform a single batch of data.