ray.data.preprocessors.PowerTransformer

class ray.data.preprocessors.PowerTransformer(columns: List[str], power: float, method: str = 'yeo-johnson')

Bases: Preprocessor

Apply a power transform to make your data more normally distributed.

Some models expect data to be normally distributed. By making your data more Gaussian-like, you might be able to improve your model’s performance.

This preprocessor supports the following transformations:

  • Yeo-Johnson

  • Box-Cox

Box-Cox requires all data to be positive.
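For reference, these are the standard textbook definitions of the two transforms, applied element-wise and writing \(\lambda\) for the `power` argument. They are the usual formulas, not a transcription of Ray's source, so check the implementation if edge-case behavior matters.

```latex
\text{Box-Cox } (x > 0):\quad
\psi(x;\lambda) =
\begin{cases}
\dfrac{x^{\lambda} - 1}{\lambda}, & \lambda \neq 0,\\[4pt]
\ln x, & \lambda = 0.
\end{cases}

\text{Yeo-Johnson}:\quad
\psi(x;\lambda) =
\begin{cases}
\dfrac{(x + 1)^{\lambda} - 1}{\lambda}, & \lambda \neq 0,\ x \ge 0,\\[4pt]
\ln(x + 1), & \lambda = 0,\ x \ge 0,\\[4pt]
-\dfrac{(1 - x)^{2 - \lambda} - 1}{2 - \lambda}, & \lambda \neq 2,\ x < 0,\\[4pt]
-\ln(1 - x), & \lambda = 2,\ x < 0.
\end{cases}
```

For non-negative inputs, Yeo-Johnson is Box-Cox applied to \(x + 1\), which is why it also accepts zero and negative values while Box-Cox does not.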

Warning

You must specify the transform’s power parameter manually; this preprocessor does not estimate it from your data. If you choose a poor value, the transformed data might not end up close to normally distributed.
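Because the power is not chosen for you, one way to explore candidate values is to try a small grid and keep the one whose output is least skewed. The sketch below does this with plain NumPy for Box-Cox on synthetic log-normal data. `box_cox`, `skewness`, and `pick_power` are hypothetical helpers, not part of Ray, and minimizing absolute skewness is our own heuristic; maximum-likelihood estimation (as done by `scipy.stats.boxcox` when `lmbda` is omitted) is another common criterion.

```python
import numpy as np


def box_cox(x: np.ndarray, power: float) -> np.ndarray:
    """Element-wise Box-Cox transform with a fixed power (lambda); x must be > 0."""
    x = np.asarray(x, dtype=np.float64)
    if power == 0:
        return np.log(x)
    return (np.power(x, power) - 1) / power


def skewness(x: np.ndarray) -> float:
    """Biased sample skewness: mean of cubed z-scores (near 0 for symmetric data)."""
    z = (x - x.mean()) / x.std()
    return float(np.mean(z ** 3))


def pick_power(x: np.ndarray, candidates) -> float:
    """Return the candidate power whose transformed output is least skewed."""
    return min(candidates, key=lambda p: abs(skewness(box_cox(x, p))))


rng = np.random.default_rng(0)
# Log-normal data is strongly right-skewed; taking the log (power 0) makes it normal.
data = rng.lognormal(mean=0.0, sigma=1.0, size=10_000)
# Candidate grid over the typical range -2.5 ... 2.5 in steps of 0.5.
candidates = [round(-2.5 + 0.5 * i, 1) for i in range(11)]
best = pick_power(data, candidates)
print(f"best power={best}, skew before={skewness(data):.2f}, "
      f"after={skewness(box_cox(data, best)):.3f}")
```

You would then pass the selected value as `power` when constructing the preprocessor.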

Parameters:
  • columns – The columns to separately transform.

  • power – The exponent (often written \(\lambda\)) that determines how your data is transformed. Practitioners typically set power between \(-2.5\) and \(2.5\), although you may need to try several values to find one that works well.

  • method – A string representing which transformation to apply. Supports "yeo-johnson" and "box-cox". If you choose "box-cox", your data needs to be positive. Defaults to "yeo-johnson".
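To make the parameter semantics concrete, here is a minimal pandas/NumPy sketch of what applying this preprocessor to one batch amounts to, assuming the standard Yeo-Johnson and Box-Cox formulas. `power_transform_batch` and its helpers are hypothetical illustrations, not Ray API. Raising an error on non-positive Box-Cox input is our own choice; Ray may handle that case differently.

```python
import numpy as np
import pandas as pd


def _yeo_johnson(x: np.ndarray, power: float) -> np.ndarray:
    out = np.empty_like(x)
    pos = x >= 0
    # Non-negative branch: Box-Cox applied to x + 1.
    if power != 0:
        out[pos] = (np.power(x[pos] + 1, power) - 1) / power
    else:
        out[pos] = np.log1p(x[pos])
    # Negative branch: mirrored transform using exponent 2 - power.
    if power != 2:
        out[~pos] = -(np.power(1 - x[~pos], 2 - power) - 1) / (2 - power)
    else:
        out[~pos] = -np.log1p(-x[~pos])
    return out


def _box_cox(x: np.ndarray, power: float) -> np.ndarray:
    # Assumption: fail loudly instead of silently producing NaN or -inf.
    if np.any(x <= 0):
        raise ValueError("box-cox requires strictly positive data")
    return np.log(x) if power == 0 else (np.power(x, power) - 1) / power


def power_transform_batch(df: pd.DataFrame, columns, power: float,
                          method: str = "yeo-johnson") -> pd.DataFrame:
    """Hypothetical stand-in for a power transform applied to one pandas batch."""
    transforms = {"yeo-johnson": _yeo_johnson, "box-cox": _box_cox}
    if method not in transforms:
        raise ValueError(f"unsupported method: {method!r}")
    out = df.copy()
    for col in columns:  # each listed column is transformed independently
        x = df[col].to_numpy(dtype=np.float64)
        out[col] = transforms[method](x, power)
    return out


batch = pd.DataFrame({"X1": [0.0, 1.0, 3.0], "X2": [-1.0, 0.0, 1.0], "id": [1, 2, 3]})
print(power_transform_batch(batch, columns=["X1", "X2"], power=0.5))
```

Columns not listed in `columns` (here `id`) pass through unchanged, and `X2` shows Yeo-Johnson handling a negative value that Box-Cox would reject.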

PublicAPI (alpha): This API is in alpha and may change before becoming stable.

Methods

deserialize

Load the original preprocessor serialized via self.serialize().

fit

Fit this Preprocessor to the Dataset.

fit_transform

Fit this Preprocessor to the Dataset and then transform the Dataset.

preferred_batch_format

A batch format hint that tells upstream producers which block format to try to yield.

serialize

Return this preprocessor serialized as a string.

transform

Transform the given dataset.

transform_batch

Transform a single batch of data.

transform_stats

Return Dataset stats for the most recent transform call, if any.