ray.train.lightgbm.normalize_pandas_for_lightgbm
- ray.train.lightgbm.normalize_pandas_for_lightgbm(df: pd.DataFrame) -> pd.DataFrame
Map Arrow-backed pandas dtypes to NumPy-nullable equivalents.
LightGBM’s pandas input validation rejects Arrow-backed dtypes such as int64[pyarrow]. Since Ray Data 2.56, Dataset.to_pandas() preserves Arrow-backed dtypes when the source was Arrow, so callers that pass the resulting frame to lightgbm.Dataset must normalize it first.
This helper is a faster alternative to df.convert_dtypes(dtype_backend="numpy_nullable"):
- It maps dtypes mechanically rather than scanning every value.
- It only touches pd.ArrowDtype columns. NumPy-backed columns (e.g. from ray.data.from_pandas shards) keep their original buffers.
- Only numeric and boolean Arrow dtypes are remapped. Other Arrow dtypes (string, decimal, timestamp) are left as-is; LightGBM doesn’t accept them as features anyway.
- Parameters:
df – The pandas DataFrame to normalize.
- Returns:
A DataFrame with Arrow-backed numeric/boolean columns replaced by NumPy-nullable equivalents. Other columns are returned unchanged.
PublicAPI (alpha): This API is in alpha and may change before becoming stable.