ray.data.preprocessors.LabelEncoder
- class ray.data.preprocessors.LabelEncoder(label_column: str)
Bases: Preprocessor
Encode labels as integer targets.
LabelEncoder encodes labels as integer targets that range from \(0\) to \(n - 1\), where \(n\) is the number of unique labels.
If you transform a label that isn’t in the fitted dataset, then the label is encoded as float("nan").
Examples
>>> import pandas as pd
>>> import ray
>>> df = pd.DataFrame({
...     "sepal_width": [5.1, 7, 4.9, 6.2],
...     "sepal_height": [3.5, 3.2, 3, 3.4],
...     "species": ["setosa", "versicolor", "setosa", "virginica"]
... })
>>> ds = ray.data.from_pandas(df)
>>>
>>> from ray.data.preprocessors import LabelEncoder
>>> encoder = LabelEncoder(label_column="species")
>>> encoder.fit_transform(ds).to_pandas()
   sepal_width  sepal_height  species
0          5.1           3.5        0
1          7.0           3.2        1
2          4.9           3.0        0
3          6.2           3.4        2
If you transform a label not present in the original dataset, then the new label is encoded as float("nan").
>>> df = pd.DataFrame({
...     "sepal_width": [4.2],
...     "sepal_height": [2.7],
...     "species": ["bracteata"]
... })
>>> ds = ray.data.from_pandas(df)
>>> encoder.transform(ds).to_pandas()
   sepal_width  sepal_height  species
0          4.2           2.7      NaN
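The encoding rule shown in these examples can be pictured with a small pure-Python sketch that does not use Ray at all. The helper names `fit_label_mapping` and `encode_labels` are hypothetical and exist only for illustration. The sketch assumes that integer codes follow the sorted order of the unique labels, which is consistent with the output above but is not stated on this page.

```python
import math


def fit_label_mapping(labels):
    """Map each unique label to an integer in the range 0 to n - 1."""
    # Assumption: codes are assigned in sorted label order.
    return {label: index for index, label in enumerate(sorted(set(labels)))}


def encode_labels(labels, mapping):
    """Encode labels with a fitted mapping; unseen labels become NaN."""
    return [mapping.get(label, float("nan")) for label in labels]


species = ["setosa", "versicolor", "setosa", "virginica"]
mapping = fit_label_mapping(species)

encoded = encode_labels(species, mapping)
unseen = encode_labels(["bracteata"], mapping)

print(encoded)                # codes for the fitted labels
print(math.isnan(unseen[0]))  # True when the label was not seen during fitting
```

Because an unseen label becomes NaN rather than raising an error, it is worth checking the transformed label column for missing values before training on it.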
- Parameters:
label_column – A column containing labels that you want to encode.
See also
OrdinalEncoder
If you’re encoding ordered features, use OrdinalEncoder instead of LabelEncoder.
PublicAPI (alpha): This API is in alpha and may change before becoming stable.
Methods
Load the original preprocessor serialized via self.serialize().
Fit this Preprocessor to the Dataset.
Fit this Preprocessor to the Dataset and then transform the Dataset.
Inverse transform the given dataset.
Batch format hint for upstream producers to try yielding best block format.
Return this preprocessor serialized as a string.
Transform the given dataset.
Transform a single batch of data.
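As a rough illustration of what an inverse transform has to do, the sketch below reverses a fitted label-to-integer mapping in plain Python. It is not Ray's implementation. The names `invert_mapping` and `decode_labels` are hypothetical, and returning None for a code with no matching label is an assumption made for this sketch, since this page does not say how such values are handled.

```python
def invert_mapping(mapping):
    """Turn a label -> code mapping into a code -> label mapping."""
    return {code: label for label, code in mapping.items()}


def decode_labels(codes, mapping):
    """Recover the original labels from integer codes."""
    inverse = invert_mapping(mapping)
    # Assumption: codes without a fitted label decode to None.
    return [inverse.get(code) for code in codes]


# Mapping consistent with the species example on this page.
fitted = {"setosa": 0, "versicolor": 1, "virginica": 2}

decoded = decode_labels([0, 1, 0, 2], fitted)
unknown = decode_labels([7], fitted)

print(decoded)
print(unknown)
```

The round trip recovers the original labels only for codes produced from labels that were present at fit time.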