ray.data.preprocessors.PowerTransformer#

class ray.data.preprocessors.PowerTransformer(columns: List[str], power: float, method: str = 'yeo-johnson', *, output_columns: List[str] | None = None)[source]#

Bases: SerializablePreprocessorBase

Apply a power transform to make your data more normally distributed.

Some models expect data to be normally distributed. By making your data more Gaussian-like, you might be able to improve your model’s performance.

This preprocessor supports the following transformations:

Box-Cox requires all data to be positive.

Warning

You need to manually specify the transform’s power parameter. If you choose a bad value, the transformation might not work well.

Parameters:
  • columns – The columns to separately transform.

  • power – A parameter that determines how your data is transformed. Practioners typically set power between \(-2.5\) and \(2.5\), although you may need to try different values to find one that works well.

  • method – A string representing which transformation to apply. Supports "yeo-johnson" and "box-cox". If you choose "box-cox", your data needs to be positive. Defaults to "yeo-johnson".

  • output_columns – The names of the transformed columns. If None, the transformed columns will be the same as the input columns. If not None, the length of output_columns must match the length of columns, othwerwise an error will be raised.

PublicAPI (alpha): This API is in alpha and may change before becoming stable.

Methods

deserialize

Deserialize a preprocessor from serialized data.

fit

Fit this Preprocessor to the Dataset.

fit_transform

Fit this Preprocessor to the Dataset and then transform the Dataset.

get_preprocessor_class_id

Get the preprocessor class identifier for this preprocessor class.

get_version

Get the version number for this preprocessor class.

preferred_batch_format

Batch format hint for upstream producers to try yielding best block format.

serialize

Serialize this preprocessor to a string or bytes.

set_preprocessor_class_id

Set the preprocessor class identifier for this preprocessor class.

set_version

Set the version number for this preprocessor class.

transform

Transform the given dataset.

transform_batch

Transform a single batch of data.

Attributes

MAGIC_CLOUDPICKLE

SERIALIZER_FORMAT_VERSION

columns

method

output_columns

power

stat_computation_plan