ray.data.preprocessors.OrdinalEncoder
- class ray.data.preprocessors.OrdinalEncoder(columns: List[str], *, encode_lists: bool = True, output_columns: List[str] | None = None)
Bases: Preprocessor

Encode values within columns as ordered integer values.

OrdinalEncoder encodes categorical features as integers that range from \(0\) to \(n - 1\), where \(n\) is the number of categories.

If you transform a value that isn't in the fitted dataset, then the value is encoded as float("nan").

Columns must contain either hashable values or lists of hashable values. A single column can't mix scalars and lists.

Examples

Use OrdinalEncoder to encode categorical features as integers.

>>> import pandas as pd
>>> import ray
>>> from ray.data.preprocessors import OrdinalEncoder
>>> df = pd.DataFrame({
...     "sex": ["male", "female", "male", "female"],
...     "level": ["L4", "L5", "L3", "L4"],
... })
>>> ds = ray.data.from_pandas(df)
>>> encoder = OrdinalEncoder(columns=["sex", "level"])
>>> encoder.fit_transform(ds).to_pandas()
   sex  level
0    1      1
1    0      2
2    1      0
3    0      1

OrdinalEncoder can also be used in append mode by providing the names of the output_columns that should hold the encoded values.

>>> encoder = OrdinalEncoder(columns=["sex", "level"], output_columns=["sex_encoded", "level_encoded"])
>>> encoder.fit_transform(ds).to_pandas()
      sex level  sex_encoded  level_encoded
0    male    L4            1              1
1  female    L5            0              2
2    male    L3            1              0
3  female    L4            0              1

If you transform a value not present in the original dataset, then the value is encoded as float("nan").

>>> df = pd.DataFrame({"sex": ["female"], "level": ["L6"]})
>>> ds = ray.data.from_pandas(df)
>>> encoder.transform(ds).to_pandas()
   sex  level
0    0    NaN

OrdinalEncoder can also encode categories in a list.

>>> df = pd.DataFrame({
...     "name": ["Shaolin Soccer", "Moana", "The Smartest Guys in the Room"],
...     "genre": [
...         ["comedy", "action", "sports"],
...         ["animation", "comedy", "action"],
...         ["documentary"],
...     ],
... })
>>> ds = ray.data.from_pandas(df)
>>> encoder = OrdinalEncoder(columns=["genre"])
>>> encoder.fit_transform(ds).to_pandas()
                            name      genre
0                 Shaolin Soccer  [2, 0, 4]
1                          Moana  [1, 2, 0]
2  The Smartest Guys in the Room        [3]

Parameters:
- columns – The columns to separately encode. 
- encode_lists – If True, encode list elements. If False, encode whole lists (i.e., replace each list with an integer). True by default.
- output_columns – The names of the transformed columns. If None, the transformed columns keep the same names as the input columns. If not None, the length of output_columns must match the length of columns; otherwise, an error is raised.
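
The difference between the two encode_lists settings can be illustrated with a small pure-Python sketch. It does not call Ray. The helper names (fit_elements, fit_whole_lists) are hypothetical, and two things are assumptions rather than documented behavior: that categories are indexed in sorted order (inferred from the example outputs above), and that whole lists are turned into tuples so they can be used as dictionary keys.

```python
def fit_elements(column):
    """Index every distinct list element (assumed: sorted order)."""
    unique = sorted({item for row in column for item in row})
    return {value: index for index, value in enumerate(unique)}


def fit_whole_lists(column):
    """Index every distinct list as a unit; tuples make lists hashable."""
    unique = sorted({tuple(row) for row in column})
    return {value: index for index, value in enumerate(unique)}


genres = [
    ["comedy", "action", "sports"],
    ["animation", "comedy", "action"],
    ["documentary"],
]

# Analogous to encode_lists=True: each element gets its own integer.
element_index = fit_elements(genres)
per_element = [[element_index[item] for item in row] for row in genres]

# Analogous to encode_lists=False: each whole list becomes one integer.
list_index = fit_whole_lists(genres)
per_list = [list_index[tuple(row)] for row in genres]

print(per_element)
print(per_list)
```

With element-wise encoding, the result has the same shape as the input column. With whole-list encoding, each row collapses to a single integer.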
 
See also

- OneHotEncoder: Another preprocessor that encodes categorical data.
PublicAPI (alpha): This API is in alpha and may change before becoming stable.

Methods

- Load the original preprocessor serialized via self.serialize().
- Fit this Preprocessor to the Dataset.
- Fit this Preprocessor to the Dataset and then transform the Dataset.
- Batch format hint for upstream producers to try yielding best block format.
- Return this preprocessor serialized as a string.
- Transform the given dataset.
- Transform a single batch of data.
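
As a rough illustration of how fitting and transforming relate, here is a minimal pure-Python sketch. SketchOrdinalEncoder is a hypothetical stand-in, not Ray's implementation: it works on a list of dicts instead of a Dataset, and it assumes categories are indexed in sorted order, which is consistent with the documented example outputs but not confirmed by this page.

```python
import math


class SketchOrdinalEncoder:
    """Hypothetical stand-in for illustration; not Ray's implementation."""

    def __init__(self, columns):
        self.columns = columns
        self.stats_ = {}  # column name -> {category: integer index}

    def fit(self, rows):
        # Learn one category-to-index mapping per column.
        for column in self.columns:
            unique = sorted({row[column] for row in rows})
            self.stats_[column] = {v: i for i, v in enumerate(unique)}
        return self

    def transform(self, rows):
        # Replace each value by its index; unseen values become NaN.
        encoded = []
        for row in rows:
            new_row = dict(row)
            for column in self.columns:
                mapping = self.stats_[column]
                new_row[column] = mapping.get(row[column], float("nan"))
            encoded.append(new_row)
        return encoded


train = [
    {"sex": "male", "level": "L4"},
    {"sex": "female", "level": "L5"},
    {"sex": "male", "level": "L3"},
    {"sex": "female", "level": "L4"},
]

encoder = SketchOrdinalEncoder(columns=["sex", "level"]).fit(train)
fitted = encoder.transform(train)
unseen = encoder.transform([{"sex": "female", "level": "L6"}])

print(fitted)
print(math.isnan(unseen[0]["level"]))
```

The mapping is learned once during fitting and reused on every later transform, which is why a category that first appears at transform time ("L6" here) has no index and is encoded as NaN.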