ray.data.preprocessors.OrdinalEncoder
class ray.data.preprocessors.OrdinalEncoder(columns: List[str], *, encode_lists: bool = True)
Bases: ray.data.preprocessor.Preprocessor
Encode values within columns as ordered integer values.
OrdinalEncoder encodes categorical features as integers that range from \(0\) to \(n - 1\), where \(n\) is the number of categories. If you transform a value that isn’t in the fitted dataset, then the value is encoded as float("nan").
Columns must contain either hashable values or lists of hashable values. Also, you can’t have both scalars and lists in the same column.
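The fit/transform behavior described above can be sketched in plain Python. This is not Ray's implementation. It assumes that categories are ranked in sorted order, which is consistent with the example outputs below. The helpers `fit_ordinal`, `transform_ordinal`, and `check_not_mixed` are hypothetical names used only for illustration, and Ray's actual error type and message for mixed columns may differ.

```python
import math
from typing import Dict, Hashable, List


def check_not_mixed(values: List[object]) -> None:
    """Hypothetical validation: reject a column that mixes scalars and lists."""
    kinds = {isinstance(value, list) for value in values}
    if len(kinds) > 1:
        raise ValueError("a column must hold either scalars or lists, not both")


def fit_ordinal(values: List[Hashable]) -> Dict[Hashable, int]:
    """Map each distinct value to its rank in sorted order, giving 0 .. n-1."""
    check_not_mixed(values)
    # Assumption: categories are ranked by sorting them.
    return {value: index for index, value in enumerate(sorted(set(values)))}


def transform_ordinal(values: List[Hashable], mapping: Dict[Hashable, int]) -> List[float]:
    """Encode values with a fitted mapping; unseen values become float("nan")."""
    return [mapping.get(value, float("nan")) for value in values]


mapping = fit_ordinal(["L4", "L5", "L3", "L4"])
print(mapping)  # {'L3': 0, 'L4': 1, 'L5': 2}
encoded = transform_ordinal(["L4", "L6"], mapping)
print(encoded)  # [1, nan]
print(math.isnan(encoded[1]))  # True
```

Because an unseen value maps to a float NaN, an encoded column that contains one can no longer be a pure integer column.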
Examples
Use OrdinalEncoder to encode categorical features as integers.
>>> import pandas as pd
>>> import ray
>>> from ray.data.preprocessors import OrdinalEncoder
>>> df = pd.DataFrame({
...     "sex": ["male", "female", "male", "female"],
...     "level": ["L4", "L5", "L3", "L4"],
... })
>>> ds = ray.data.from_pandas(df)
>>> encoder = OrdinalEncoder(columns=["sex", "level"])
>>> encoder.fit_transform(ds).to_pandas()
   sex  level
0    1      1
1    0      2
2    1      0
3    0      1
If you transform a value not present in the original dataset, then the value is encoded as float("nan").
>>> df = pd.DataFrame({"sex": ["female"], "level": ["L6"]})
>>> ds = ray.data.from_pandas(df)
>>> encoder.transform(ds).to_pandas()
   sex  level
0    0    NaN
OrdinalEncoder can also encode categories in a list.
>>> df = pd.DataFrame({
...     "name": ["Shaolin Soccer", "Moana", "The Smartest Guys in the Room"],
...     "genre": [
...         ["comedy", "action", "sports"],
...         ["animation", "comedy", "action"],
...         ["documentary"],
...     ],
... })
>>> ds = ray.data.from_pandas(df)
>>> encoder = OrdinalEncoder(columns=["genre"])
>>> encoder.fit_transform(ds).to_pandas()
                            name      genre
0                 Shaolin Soccer  [2, 0, 4]
1                          Moana  [1, 2, 0]
2  The Smartest Guys in the Room        [3]
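With the default encode_lists=True, each list element is treated as its own category. The plain-Python sketch below reproduces the genre codes from the example above under two assumptions: elements are ranked in sorted order, and an element not seen during fitting becomes float("nan"). It is not Ray's implementation, and `fit_list_elements` and `transform_list_elements` are hypothetical names.

```python
from typing import Dict, Hashable, List


def fit_list_elements(rows: List[List[Hashable]]) -> Dict[Hashable, int]:
    """Rank every distinct element that appears in any list (0 .. n-1)."""
    # Assumption: elements are ranked in sorted order.
    elements = {element for row in rows for element in row}
    return {element: index for index, element in enumerate(sorted(elements))}


def transform_list_elements(
    rows: List[List[Hashable]], mapping: Dict[Hashable, int]
) -> List[List[float]]:
    """Encode each element independently, preserving each list's length and order."""
    # Assumption: an element not seen during fitting becomes float("nan").
    return [[mapping.get(element, float("nan")) for element in row] for row in rows]


genres = [
    ["comedy", "action", "sports"],
    ["animation", "comedy", "action"],
    ["documentary"],
]
genre_mapping = fit_list_elements(genres)
print(transform_list_elements(genres, genre_mapping))  # [[2, 0, 4], [1, 2, 0], [3]]
```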
Parameters
columns – The columns to separately encode.
encode_lists – If True, encode list elements. If False, encode whole lists (i.e., replace each list with an integer). True by default.
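With encode_lists=False, each entire list is one category. A rough plain-Python sketch of that idea follows. Lists are not hashable, so the sketch converts each list to a tuple before ranking. The sorted ranking of those tuples is an assumption; Ray may order whole-list categories differently. The helper names are hypothetical.

```python
from typing import Dict, Hashable, List, Tuple


def fit_whole_lists(rows: List[List[Hashable]]) -> Dict[Tuple[Hashable, ...], int]:
    """Treat each entire list as one category and rank the distinct lists."""
    # Lists are unhashable, so each one is converted to a tuple first.
    # Assumption: distinct tuples are ranked by sorting; Ray's ordering may differ.
    distinct = sorted({tuple(row) for row in rows})
    return {row: index for index, row in enumerate(distinct)}


def transform_whole_lists(
    rows: List[List[Hashable]], mapping: Dict[Tuple[Hashable, ...], int]
) -> List[float]:
    """Replace each list with a single integer; unseen lists become float("nan")."""
    return [mapping.get(tuple(row), float("nan")) for row in rows]


rows = [["a", "b"], ["b"], ["a", "b"], ["b", "a"]]
list_mapping = fit_whole_lists(rows)
print(transform_whole_lists(rows, list_mapping))  # [0, 1, 0, 2]
```

In this sketch, element order matters: ["a", "b"] and ["b", "a"] are different categories and receive different codes.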
See also
OneHotEncoder
Another preprocessor that encodes categorical data.
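To show how the two encodings differ in shape, here is a small plain-Python comparison on the "sex" column from the examples. It does not reproduce Ray's OneHotEncoder output format (for example, its column names). It only contrasts the schemes, and both helper names are hypothetical.

```python
from typing import Hashable, List


def ordinal_encode(values: List[Hashable]) -> List[int]:
    """One integer per row: each value becomes its sorted rank."""
    mapping = {value: index for index, value in enumerate(sorted(set(values)))}
    return [mapping[value] for value in values]


def one_hot_encode(values: List[Hashable]) -> List[List[int]]:
    """One 0/1 indicator per category: each row has a single 1."""
    categories = sorted(set(values))
    return [[1 if value == category else 0 for category in categories] for value in values]


sex = ["male", "female", "male", "female"]
print(ordinal_encode(sex))  # [1, 0, 1, 0]
print(one_hot_encode(sex))  # [[0, 1], [1, 0], [0, 1], [1, 0]]
```

Ordinal codes imply an order (here, female < male) that a model may treat as meaningful. One-hot encoding avoids that implied order at the cost of one column per category.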
PublicAPI (alpha): This API is in alpha and may change before becoming stable.