ray.data.preprocessors.LabelEncoder
- class ray.data.preprocessors.LabelEncoder(label_column: str, *, output_column: str | None = None)
- Bases: Preprocessor

Encode labels as integer targets.

LabelEncoder encodes labels as integer targets that range from \(0\) to \(n - 1\), where \(n\) is the number of unique labels.

If you transform a label that isn’t in the fitted dataset, then the label is encoded as float("nan").

Examples

>>> import pandas as pd
>>> import ray
>>> df = pd.DataFrame({
...     "sepal_width": [5.1, 7, 4.9, 6.2],
...     "sepal_height": [3.5, 3.2, 3, 3.4],
...     "species": ["setosa", "versicolor", "setosa", "virginica"]
... })
>>> ds = ray.data.from_pandas(df)
>>>
>>> from ray.data.preprocessors import LabelEncoder
>>> encoder = LabelEncoder(label_column="species")
>>> encoder.fit_transform(ds).to_pandas()
   sepal_width  sepal_height  species
0          5.1           3.5        0
1          7.0           3.2        1
2          4.9           3.0        0
3          6.2           3.4        2

You can also provide the name of the output column that should hold the encoded labels if you want to use LabelEncoder in append mode. In append mode, the original label column is kept and the encoded labels are written to the new column.

>>> encoder = LabelEncoder(label_column="species", output_column="species_encoded")
>>> encoder.fit_transform(ds).to_pandas()
   sepal_width  sepal_height     species  species_encoded
0          5.1           3.5      setosa                0
1          7.0           3.2  versicolor                1
2          4.9           3.0      setosa                0
3          6.2           3.4   virginica                2

If you transform a label not present in the original dataset, then the new label is encoded as float("nan").

>>> # Refit an in-place encoder so the output below has a single "species" column.
>>> encoder = LabelEncoder(label_column="species").fit(ds)
>>> df = pd.DataFrame({
...     "sepal_width": [4.2],
...     "sepal_height": [2.7],
...     "species": ["bracteata"]
... })
>>> ds = ray.data.from_pandas(df)
>>> encoder.transform(ds).to_pandas()
   sepal_width  sepal_height  species
0          4.2           2.7      NaN

- Parameters:
- label_column – A column containing labels that you want to encode. 
- output_column – The name of the column that will contain the encoded labels. If None, the output column will have the same name as the input column. 
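The encoding rule described above can be illustrated without Ray. The following is a minimal pure-Python sketch, not the library's implementation: the helper names `fit_label_mapping` and `encode_rows` are hypothetical, and the sketch assumes indices are assigned in sorted label order, which is consistent with the example output above but is not stated by this page.

```python
import math


def fit_label_mapping(labels):
    """Map each unique label to an integer in the range 0 to n - 1.

    Assumption: indices follow sorted label order. This matches the
    documented example output but is not guaranteed by the documentation.
    """
    return {label: index for index, label in enumerate(sorted(set(labels)))}


def encode_rows(rows, mapping, label_column, output_column=None):
    """Return copies of `rows` with `label_column` encoded as integers.

    If `output_column` is None, the label column is overwritten in place;
    otherwise the encoded value is written to `output_column` (append mode).
    Labels missing from `mapping` are encoded as float("nan").
    """
    target = label_column if output_column is None else output_column
    encoded = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are left unchanged
        new_row[target] = mapping.get(row[label_column], math.nan)
        encoded.append(new_row)
    return encoded


rows = [
    {"species": "setosa"},
    {"species": "versicolor"},
    {"species": "setosa"},
    {"species": "virginica"},
]
mapping = fit_label_mapping(row["species"] for row in rows)

in_place = encode_rows(rows, mapping, "species")
appended = encode_rows(rows, mapping, "species", output_column="species_encoded")
unseen = encode_rows([{"species": "bracteata"}], mapping, "species")
```

Here `in_place` overwrites `species` with integers, `appended` keeps `species` and adds `species_encoded`, and `unseen` holds NaN for the label that was not part of the fitted data.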
 
See also
- OrdinalEncoder: If you’re encoding ordered features, use OrdinalEncoder instead of LabelEncoder.
PublicAPI (alpha): This API is in alpha and may change before becoming stable.

Methods
- Load the original preprocessor serialized via self.serialize().
- Fit this Preprocessor to the Dataset.
- Fit this Preprocessor to the Dataset and then transform the Dataset.
- Inverse transform the given dataset.
- Batch format hint for upstream producers to try yielding best block format.
- Return this preprocessor serialized as a string.
- Transform the given dataset.
- Transform a single batch of data.
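For the inverse transform listed among the methods, the idea of reversing a label encoding can be sketched in plain Python. This is an illustration only, not the library's implementation: the helper `decode_labels` is hypothetical, the mapping is copied from the species example, and returning None for NaN inputs is an assumption, since this page does not say how the real method treats labels that were unseen during fitting.

```python
import math


def decode_labels(values, mapping):
    """Map integer codes back to their original labels.

    `mapping` is a label -> index dict such as one learned during fitting.
    Assumption: NaN codes (labels unseen during fitting) decode to None.
    """
    inverse = {index: label for label, index in mapping.items()}
    decoded = []
    for value in values:
        if isinstance(value, float) and math.isnan(value):
            decoded.append(None)  # no original label can be recovered
        else:
            decoded.append(inverse[int(value)])
    return decoded


# Mapping taken from the species example; hard-coded here for illustration.
mapping = {"setosa": 0, "versicolor": 1, "virginica": 2}
decoded = decode_labels([0, 1, 0, 2, math.nan], mapping)
```

Because each label receives a distinct integer, the mapping is one-to-one and can be reversed exactly for every label seen during fitting.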