ray.data.preprocessors.LabelEncoder

class ray.data.preprocessors.LabelEncoder(label_column: str)

Bases: Preprocessor

Encode labels as integer targets.

LabelEncoder encodes labels as integer targets that range from \(0\) to \(n - 1\), where \(n\) is the number of unique labels.

If you transform a label that isn’t in the fitted dataset, the label is encoded as float("nan").
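
The encoding rule described above can be modeled in a few lines of plain Python. This is only a sketch of the behavior, not Ray's implementation: the helper names `fit_label_mapping` and `encode_labels` are hypothetical, and numbering the labels in sorted order is an assumption.

```python
import math


def fit_label_mapping(labels):
    """Map each unique label to an integer in 0..n-1.

    Assumption: labels are numbered in sorted order.
    """
    return {label: index for index, label in enumerate(sorted(set(labels)))}


def encode_labels(labels, mapping):
    """Encode labels; a label missing from the mapping becomes float("nan")."""
    return [mapping.get(label, float("nan")) for label in labels]


fitted_labels = ["setosa", "versicolor", "setosa", "virginica"]
mapping = fit_label_mapping(fitted_labels)
encoded = encode_labels(fitted_labels, mapping)
unseen = encode_labels(["bracteata"], mapping)

print(mapping)                 # three unique labels, so n = 3
print(encoded)                 # integers in the range 0..2
print(math.isnan(unseen[0]))   # the unseen label has no integer code
```

Because NaN is a float, a column that contains an unseen label can no longer hold plain integers, so check for NaN before passing encoded labels to code that expects integer targets.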

Examples

>>> import pandas as pd
>>> import ray
>>> df = pd.DataFrame({
...     "sepal_width": [5.1, 7, 4.9, 6.2],
...     "sepal_height": [3.5, 3.2, 3, 3.4],
...     "species": ["setosa", "versicolor", "setosa", "virginica"]
... })
>>> ds = ray.data.from_pandas(df)  
>>>
>>> from ray.data.preprocessors import LabelEncoder
>>> encoder = LabelEncoder(label_column="species")
>>> encoder.fit_transform(ds).to_pandas()  
   sepal_width  sepal_height  species
0          5.1           3.5        0
1          7.0           3.2        1
2          4.9           3.0        0
3          6.2           3.4        2

The following example transforms a label that isn’t present in the fitted dataset, so the new label is encoded as float("nan").

>>> df = pd.DataFrame({
...     "sepal_width": [4.2],
...     "sepal_height": [2.7],
...     "species": ["bracteata"]
... })
>>> ds = ray.data.from_pandas(df)  
>>> encoder.transform(ds).to_pandas()  
   sepal_width  sepal_height  species
0          4.2           2.7      NaN

Parameters:

label_column – A column containing labels that you want to encode.

See also

OrdinalEncoder

If you’re encoding ordered features, use OrdinalEncoder instead of LabelEncoder.

PublicAPI (alpha): This API is in alpha and may change before becoming stable.

Methods

deserialize

Load the original preprocessor serialized via self.serialize().

fit

Fit this Preprocessor to the Dataset.

fit_transform

Fit this Preprocessor to the Dataset and then transform the Dataset.

inverse_transform

Inverse transform the given dataset.

preferred_batch_format

Batch format hint that lets upstream producers try to yield the best block format.

serialize

Return this preprocessor serialized as a string.

transform

Transform the given dataset.

transform_batch

Transform a single batch of data.
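
As a rough illustration of what inverse_transform conceptually undoes, the sketch below reverses a label-to-integer mapping in plain Python. It is not Ray's implementation: `decode_labels` is a hypothetical helper, the mapping is written out by hand, and returning None for codes without a matching label (such as NaN) is an assumption, since this page doesn't say how such values are handled.

```python
def decode_labels(codes, mapping):
    """Turn integer codes back into labels using a label -> integer mapping.

    Assumption: codes with no matching label (for example NaN) decode to None.
    """
    inverse = {index: label for label, index in mapping.items()}
    return [inverse.get(code) for code in codes]


# Hand-written mapping for illustration only.
mapping = {"setosa": 0, "versicolor": 1, "virginica": 2}

decoded = decode_labels([0, 1, 0, 2], mapping)
undecodable = decode_labels([float("nan")], mapping)

print(decoded)
print(undecodable)
```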