ray.train.lightgbm.normalize_pandas_for_lightgbm

ray.train.lightgbm.normalize_pandas_for_lightgbm(df: pd.DataFrame) → pd.DataFrame

Map Arrow-backed pandas dtypes to NumPy-nullable equivalents.

LightGBM’s pandas input validation rejects Arrow-backed dtypes like int64[pyarrow]. Since Ray Data 2.56, Dataset.to_pandas() preserves Arrow-backed dtypes when the source was Arrow, so callers passing the resulting frame to lightgbm.Dataset must normalize first.

This helper is a faster alternative to df.convert_dtypes(dtype_backend="numpy_nullable"):

  • It maps dtypes mechanically rather than scanning every value.

  • It only touches pd.ArrowDtype columns. NumPy-backed columns (e.g. from ray.data.from_pandas shards) keep their original buffers.

Only numeric and boolean Arrow dtypes are remapped. Other Arrow dtypes (string, decimal, timestamp) are left as-is; LightGBM doesn’t accept them as features anyway.

Parameters:

df – The pandas DataFrame to normalize.

Returns:

A DataFrame with Arrow-backed numeric/boolean columns replaced by NumPy-nullable equivalents. Other columns are returned unchanged.

PublicAPI (alpha): This API is in alpha and may change before becoming stable.