ray.data.preprocessors.LabelEncoder

class ray.data.preprocessors.LabelEncoder(label_column: str)

Bases: Preprocessor

Encode labels as integer targets.

LabelEncoder encodes labels as integer targets that range from \(0\) to \(n - 1\), where \(n\) is the number of unique labels.
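To make the \(0\) to \(n - 1\) range concrete, here is a minimal pure-Python sketch of the fitting step. It is not Ray's implementation: the helper `fit_label_mapping` is hypothetical, and it assumes unique labels are numbered in sorted order, which is consistent with the example output below but is not stated on this page.

```python
def fit_label_mapping(labels):
    """Map each unique label to an integer in the range [0, n - 1].

    Hypothetical helper. Assumption: unique labels are numbered in
    sorted order; Ray's actual ordering is not documented here.
    """
    unique_labels = sorted(set(labels))
    return {label: index for index, label in enumerate(unique_labels)}


species = ["setosa", "versicolor", "setosa", "virginica"]
mapping = fit_label_mapping(species)

# Encode each label with its integer target.
encoded = [mapping[label] for label in species]
print(mapping)  # {'setosa': 0, 'versicolor': 1, 'virginica': 2}
print(encoded)  # [0, 1, 0, 2]
```

With three unique labels, \(n = 3\), so the targets are 0, 1, and 2.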

If you transform a label that isn’t in the fitted dataset, the label is encoded as float("nan").
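A minimal sketch of how unseen labels can fall back to NaN during transformation, assuming a mapping that was already fitted. `encode_labels` is a hypothetical helper for illustration, not part of Ray's API.

```python
import math


def encode_labels(labels, mapping):
    # Look up each label; labels absent from the fitted mapping become NaN.
    return [mapping.get(label, float("nan")) for label in labels]


mapping = {"setosa": 0, "versicolor": 1, "virginica": 2}
encoded = encode_labels(["virginica", "bracteata"], mapping)
print(encoded)  # [2, nan]
print(math.isnan(encoded[1]))  # True
```

Because NaN is a float, a pandas column that contains it is stored as floating point rather than integer.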

Examples

>>> import pandas as pd
>>> import ray
>>> df = pd.DataFrame({
...     "sepal_width": [5.1, 7, 4.9, 6.2],
...     "sepal_height": [3.5, 3.2, 3, 3.4],
...     "species": ["setosa", "versicolor", "setosa", "virginica"]
... })
>>> ds = ray.data.from_pandas(df)  
>>>
>>> from ray.data.preprocessors import LabelEncoder
>>> encoder = LabelEncoder(label_column="species")
>>> encoder.fit_transform(ds).to_pandas()  
   sepal_width  sepal_height  species
0          5.1           3.5        0
1          7.0           3.2        1
2          4.9           3.0        0
3          6.2           3.4        2

For example, a label that wasn't present in the fitted dataset is encoded as float("nan"):

>>> df = pd.DataFrame({
...     "sepal_width": [4.2],
...     "sepal_height": [2.7],
...     "species": ["bracteata"]
... })
>>> ds = ray.data.from_pandas(df)  
>>> encoder.transform(ds).to_pandas()  
   sepal_width  sepal_height  species
0          4.2           2.7      NaN

Parameters: label_column – The name of the column containing the labels to encode.

See also

OrdinalEncoder: If you’re encoding ordered features, use OrdinalEncoder instead of LabelEncoder.

PublicAPI (alpha): This API is in alpha and may change before becoming stable.

Methods

`deserialize`	Load the original preprocessor serialized via `self.serialize()`.
`fit`	Fit this Preprocessor to the Dataset.
`fit_transform`	Fit this Preprocessor to the Dataset and then transform the Dataset.
`inverse_transform`	Inverse transform the given dataset.
`preferred_batch_format`	Batch format hint for upstream producers to try yielding best block format.
`serialize`	Return this preprocessor serialized as a string.
`transform`	Transform the given dataset.
`transform_batch`	Transform a single batch of data.
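`inverse_transform` maps integer targets back to the original labels. The sketch below illustrates that idea in pure Python by inverting a fitted mapping. It is not Ray's implementation: `decode_labels` is a hypothetical helper, and it assumes every code is a valid fitted index (how Ray handles NaN codes is not described on this page).

```python
def decode_labels(codes, mapping):
    # Invert the fitted mapping: integer code -> original label.
    inverse = {index: label for label, index in mapping.items()}
    # Assumption: every code is a valid index from the fitted mapping.
    return [inverse[code] for code in codes]


mapping = {"setosa": 0, "versicolor": 1, "virginica": 2}
decoded = decode_labels([0, 1, 0, 2], mapping)
print(decoded)  # ['setosa', 'versicolor', 'setosa', 'virginica']
```

Inversion is well defined because each label maps to a distinct integer.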