ray.data.preprocessor.Preprocessor
class ray.data.preprocessor.Preprocessor
Bases: ABC
Implements an ML preprocessing operation.
Preprocessors are stateful objects that can be fitted against a Dataset and used to transform both local data batches and distributed data. For example, a Normalization preprocessor may calculate the mean and standard deviation of a field during fitting, and then use these attributes to implement its normalization transform.
Preprocessors can also be stateless and transform data without needing to be fitted. For example, a preprocessor may simply remove a column, which does not require any fitted state.
If you are implementing your own Preprocessor subclass, you should override the following:
- _fit, if your preprocessor is stateful. Otherwise, set _is_fittable=False.
- _transform_pandas and/or _transform_numpy. For best performance, implement both. Otherwise, the data will be converted to match the implemented method.
PublicAPI (beta): This API is in beta and may change before becoming stable.
Methods
- deserialize: Load the original preprocessor serialized via self.serialize().
- fit: Fit this Preprocessor to the Dataset.
- fit_transform: Fit this Preprocessor to the Dataset and then transform the Dataset.
- preferred_batch_format: Batch format hint for upstream producers to try yielding best block format.
- serialize: Return this preprocessor serialized as a string.
- transform: Transform the given dataset.
- transform_batch: Transform a single batch of data.