ray.data.preprocessors.LabelEncoder
- class ray.data.preprocessors.LabelEncoder(label_column: str)
Bases: ray.data.preprocessor.Preprocessor
Encode labels as integer targets.
LabelEncoder encodes labels as integer targets that range from \(0\) to \(n - 1\), where \(n\) is the number of unique labels.

If you transform a label that isn't in the fitted dataset, then the label is encoded as float("nan").

Examples
>>> import pandas as pd
>>> import ray
>>> df = pd.DataFrame({
...     "sepal_width": [5.1, 7, 4.9, 6.2],
...     "sepal_height": [3.5, 3.2, 3, 3.4],
...     "species": ["setosa", "versicolor", "setosa", "virginica"]
... })
>>> ds = ray.data.from_pandas(df)
>>>
>>> from ray.data.preprocessors import LabelEncoder
>>> encoder = LabelEncoder(label_column="species")
>>> encoder.fit_transform(ds).to_pandas()
   sepal_width  sepal_height  species
0          5.1           3.5        0
1          7.0           3.2        1
2          4.9           3.0        0
3          6.2           3.4        2
If you transform a label not present in the original dataset, then the new label is encoded as float("nan").

>>> df = pd.DataFrame({
...     "sepal_width": [4.2],
...     "sepal_height": [2.7],
...     "species": ["bracteata"]
... })
>>> ds = ray.data.from_pandas(df)
>>> encoder.transform(ds).to_pandas()
   sepal_width  sepal_height  species
0          4.2           2.7      NaN
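The encoding behavior shown in the examples above can be approximated in plain Python. This is only an illustrative sketch, not Ray's implementation: the helper names fit_label_mapping and encode_labels are hypothetical, and assigning indices in sorted order of the unique labels is an assumption that happens to agree with the example output.

```python
import math


def fit_label_mapping(labels):
    """Map each unique label to an integer in 0..n-1.

    Assumption: indices follow the sorted order of the unique labels.
    """
    return {label: index for index, label in enumerate(sorted(set(labels)))}


def encode_labels(labels, mapping):
    """Encode labels with a fitted mapping; unseen labels become NaN."""
    return [mapping.get(label, float("nan")) for label in labels]


# "Fit" on the species column from the first example.
mapping = fit_label_mapping(["setosa", "versicolor", "setosa", "virginica"])

# Labels seen during fitting map to integers.
encoded = encode_labels(["setosa", "versicolor", "setosa", "virginica"], mapping)

# A label that was not seen during fitting maps to NaN.
unseen = encode_labels(["bracteata"], mapping)

print(encoded)
print(math.isnan(unseen[0]))
```

Because NaN is a float, a column that contains an unseen label can no longer be treated as purely integer-valued, so it may be worth checking for NaN before passing encoded labels to a trainer.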
- Parameters
label_column – A column containing labels that you want to encode.
See also
OrdinalEncoder
If you're encoding ordered features, use OrdinalEncoder instead of LabelEncoder.
PublicAPI (alpha): This API is in alpha and may change before becoming stable.