class ray.data.preprocessor.Preprocessor[source]#

Bases: abc.ABC

Implements an ML preprocessing operation.

Preprocessors are stateful objects that can be fitted against a Dataset and then used to transform both local data batches and distributed datasets. For example, a normalization preprocessor may calculate the mean and standard deviation of a field during fitting and use these values to implement its normalization transform.
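
The fit-then-transform idea can be sketched without Ray at all. The class below is a hypothetical, standard-library-only stand-in (the name MeanStdNormalizer and its attributes are assumptions for illustration, not part of the Ray API): fitting records the mean and standard deviation of one field, and transforming a batch reuses those stored values.

```python
import statistics


class MeanStdNormalizer:
    """Toy stand-in for a stateful preprocessor; NOT a Ray class."""

    def __init__(self, column):
        self.column = column
        self.mean_ = None  # learned during fit()
        self.std_ = None   # learned during fit()

    def fit(self, rows):
        # Compute the statistics once, from the fitting data only.
        values = [row[self.column] for row in rows]
        self.mean_ = statistics.fmean(values)
        self.std_ = statistics.pstdev(values)  # population standard deviation
        return self

    def transform_batch(self, rows):
        if self.mean_ is None:
            raise RuntimeError("call fit() before transform_batch()")
        # Reuse the stored statistics; assumes std_ is non-zero.
        return [
            {**row, self.column: (row[self.column] - self.mean_) / self.std_}
            for row in rows
        ]


train = [{"x": v} for v in (2, 4, 4, 4, 5, 5, 7, 9)]
normalizer = MeanStdNormalizer("x").fit(train)
scaled = normalizer.transform_batch([{"x": 5}, {"x": 9}])
```

The same fitted object can be applied to any later batch, which is the property that makes the preprocessor stateful.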

Preprocessors can also be stateless and transform data without needing to be fitted. For example, a preprocessor may simply remove a column, which requires no fitted state.

If you are implementing your own Preprocessor subclass, you should override the following:

  • _fit if your preprocessor is stateful. Otherwise, set _is_fittable=False.

  • _transform_pandas and/or _transform_numpy. For best performance, implement both. Otherwise, the data will be converted to match the implemented method.
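
The override contract above follows a template-method pattern, which the following library-independent sketch tries to illustrate. Everything here is hypothetical: ToyPreprocessor is not Ray's base class, and _transform_rows is an assumed stand-in for _transform_pandas / _transform_numpy so that the example runs with the standard library only. Only the hook names _fit and _is_fittable are taken from the text.

```python
class ToyPreprocessor:
    """Hypothetical base class mirroring the hook names above; NOT Ray's code."""

    _is_fittable = True  # subclasses set this to False when stateless

    def fit(self, rows):
        # Public entry point: delegate to the subclass hook only if stateful.
        if self._is_fittable:
            self._fit(rows)
            self._fitted = True
        return self

    def transform_batch(self, rows):
        if self._is_fittable and not getattr(self, "_fitted", False):
            raise RuntimeError("fit() must be called first")
        return self._transform_rows(rows)

    def _fit(self, rows):
        raise NotImplementedError

    def _transform_rows(self, rows):
        # Stand-in for _transform_pandas / _transform_numpy.
        raise NotImplementedError


class MaxAbsScaler(ToyPreprocessor):
    """Stateful: learns the largest absolute value of a column."""

    def __init__(self, column):
        self.column = column

    def _fit(self, rows):
        self.max_abs_ = max(abs(row[self.column]) for row in rows)

    def _transform_rows(self, rows):
        return [{**row, self.column: row[self.column] / self.max_abs_} for row in rows]


class DropColumn(ToyPreprocessor):
    """Stateless: nothing to learn, so fitting is skipped."""

    _is_fittable = False

    def __init__(self, column):
        self.column = column

    def _transform_rows(self, rows):
        return [{k: v for k, v in row.items() if k != self.column} for row in rows]


scaler = MaxAbsScaler("x").fit([{"x": -4}, {"x": 2}])
scaled = scaler.transform_batch([{"x": 2}])
dropped = DropColumn("y").transform_batch([{"x": 1, "y": 2}])
```

In this sketch the stateless subclass can transform immediately, while the stateful one refuses to transform until it has been fitted. How the real base class handles an unfitted preprocessor is not stated in this section, so that behavior is an assumption of the sketch.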

PublicAPI (beta): This API is in beta and may change before becoming stable.




Fit this Preprocessor to the Dataset.


Fit this Preprocessor to the Dataset and then transform the Dataset.


Transform the given dataset.


Transform a single batch of data.


Return Dataset stats for the most recent transform call, if any.