ray.data.preprocessors.LabelEncoder
- class ray.data.preprocessors.LabelEncoder(label_column: str, *, output_column: str | None = None)
Bases: Preprocessor
Encode labels as integer targets.
LabelEncoder encodes labels as integer targets that range from \(0\) to \(n - 1\), where \(n\) is the number of unique labels. If you transform a label that isn’t in the fitted dataset, then the label is encoded as float("nan").
Examples
```python
>>> import pandas as pd
>>> import ray
>>> df = pd.DataFrame({
...     "sepal_width": [5.1, 7, 4.9, 6.2],
...     "sepal_height": [3.5, 3.2, 3, 3.4],
...     "species": ["setosa", "versicolor", "setosa", "virginica"]
... })
>>> ds = ray.data.from_pandas(df)
>>>
>>> from ray.data.preprocessors import LabelEncoder
>>> encoder = LabelEncoder(label_column="species")
>>> encoder.fit_transform(ds).to_pandas()
   sepal_width  sepal_height  species
0          5.1           3.5        0
1          7.0           3.2        1
2          4.9           3.0        0
3          6.2           3.4        2
```
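Conceptually, fitting amounts to building a dictionary from each unique label to an integer index. The following is a minimal standard-library sketch, not Ray's implementation. The helper names fit_label_mapping and encode_label are hypothetical, and the sketch assumes indices are assigned in sorted label order, which is consistent with the output above.

```python
def fit_label_mapping(labels):
    """Map each unique label to an integer in [0, n - 1].

    Assumption: indices follow sorted label order (consistent with the
    example output, but not confirmed as Ray's exact rule).
    """
    return {label: index for index, label in enumerate(sorted(set(labels)))}


def encode_label(label, mapping):
    # Labels not seen during fitting become float("nan"), as documented.
    return mapping.get(label, float("nan"))


species = ["setosa", "versicolor", "setosa", "virginica"]
mapping = fit_label_mapping(species)
encoded = [encode_label(label, mapping) for label in species]
print(mapping)  # {'setosa': 0, 'versicolor': 1, 'virginica': 2}
print(encoded)  # [0, 1, 0, 2]
```

With three unique labels, \(n = 3\), so the encoded values fall in the range 0 to 2.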
You can also provide the name of the output column that should hold the encoded labels if you want to use LabelEncoder in append mode.
```python
>>> encoder = LabelEncoder(label_column="species", output_column="species_encoded")
>>> encoder.fit_transform(ds).to_pandas()
   sepal_width  sepal_height     species  species_encoded
0          5.1           3.5      setosa                0
1          7.0           3.2  versicolor                1
2          4.9           3.0      setosa                0
3          6.2           3.4   virginica                2
```
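The difference between the default behavior and append mode comes down to which column receives the encoded value. Below is a minimal sketch over rows represented as plain dictionaries, assuming an already-fitted mapping. The function encode_rows is hypothetical and only mirrors the documented semantics of output_column.

```python
def encode_rows(rows, mapping, label_column, output_column=None):
    """Return new rows with encoded labels.

    If output_column is None, the label column is overwritten (the default).
    Otherwise the encoded value is appended under output_column and the
    original labels are kept.
    """
    target = label_column if output_column is None else output_column
    encoded_rows = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are not mutated
        new_row[target] = mapping.get(row[label_column], float("nan"))
        encoded_rows.append(new_row)
    return encoded_rows


mapping = {"setosa": 0, "versicolor": 1, "virginica": 2}  # assumed fitted mapping
rows = [{"species": "setosa"}, {"species": "virginica"}]

print(encode_rows(rows, mapping, "species"))
# [{'species': 0}, {'species': 2}]
print(encode_rows(rows, mapping, "species", output_column="species_encoded"))
# [{'species': 'setosa', 'species_encoded': 0}, {'species': 'virginica', 'species_encoded': 2}]
```

Appending keeps the raw labels next to their codes, which can be handy for inspecting the mapping.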
If you transform a label not present in the original dataset, then the new label is encoded as float("nan"). Because encoder was last fitted in append mode, the NaN appears in species_encoded and the original label is kept.
```python
>>> df = pd.DataFrame({
...     "sepal_width": [4.2],
...     "sepal_height": [2.7],
...     "species": ["bracteata"]
... })
>>> ds = ray.data.from_pandas(df)
>>> encoder.transform(ds).to_pandas()
   sepal_width  sepal_height    species  species_encoded
0          4.2           2.7  bracteata              NaN
```
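Because unseen labels become float("nan"), and NaN compares unequal to everything including itself, an equality check such as value == float("nan") never matches them. The sketch below locates unseen labels after encoding by using math.isnan instead. The helper find_unseen and the fitted mapping are hypothetical illustrations.

```python
import math


def find_unseen(labels, encoded):
    """Return the labels whose encoded value is NaN (unseen at fit time)."""
    # NaN != NaN, so use math.isnan rather than == to detect missing encodings.
    return [
        label
        for label, value in zip(labels, encoded)
        if isinstance(value, float) and math.isnan(value)
    ]


mapping = {"setosa": 0, "versicolor": 1, "virginica": 2}  # assumed fitted mapping
labels = ["virginica", "bracteata", "setosa"]
encoded = [mapping.get(label, float("nan")) for label in labels]

print(float("nan") == float("nan"))  # False
print(find_unseen(labels, encoded))  # ['bracteata']
```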
- Parameters:
label_column – A column containing labels that you want to encode.
output_column – The name of the column that will contain the encoded labels. If None, the output column will have the same name as the input column.
See also
OrdinalEncoder
If you’re encoding ordered features, use OrdinalEncoder instead of LabelEncoder.
PublicAPI (alpha): This API is in alpha and may change before becoming stable.
Methods
Load the original preprocessor serialized via self.serialize().
Fit this Preprocessor to the Dataset.
Fit this Preprocessor to the Dataset and then transform the Dataset.
Inverse transform the given dataset.
Batch format hint for upstream producers to try yielding the best block format.
Return this preprocessor serialized as a string.
Transform the given dataset.
Transform a single batch of data.
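The inverse transform listed above can be understood as reversing the fitted mapping. The sketch below is a hypothetical standard-library illustration, not Ray's inverse_transform. It relies on each label having a distinct index, so the reversal is lossless, and, as an assumption of this sketch only, decodes NaN (an unseen label) to None.

```python
import math


def invert_mapping(mapping):
    # Each label has a distinct index, so the reversed dict loses nothing.
    return {index: label for label, index in mapping.items()}


def decode_label(value, inverse):
    # Assumption for this sketch: NaN (an unseen label) decodes to None.
    if isinstance(value, float) and math.isnan(value):
        return None
    # int() also accepts float codes such as 1.0, which appear when NaN
    # forces a column to a floating-point type.
    return inverse[int(value)]


mapping = {"setosa": 0, "versicolor": 1, "virginica": 2}  # assumed fitted mapping
inverse = invert_mapping(mapping)
print([decode_label(v, inverse) for v in [0, 2, float("nan"), 1.0]])
# ['setosa', 'virginica', None, 'versicolor']
```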