ray.data.aggregate.MissingValuePercentage
- class ray.data.aggregate.MissingValuePercentage(on: str, alias_name: str | None = None)
Bases: AggregateFnV2
Calculates the percentage of null values in a column.
This aggregation computes the percentage of null (missing) values in a dataset column. It treats both None values and NaN values as null. The result is a percentage value between 0.0 and 100.0, where 0.0 means no missing values and 100.0 means all values are missing.
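The rule described above can be sketched in plain Python, independent of Ray. This is only an illustration of the arithmetic, not Ray's implementation: the helper names `is_missing` and `missing_value_percentage` are hypothetical, and returning `None` for an empty input is an assumption made here, not documented behavior of the aggregation.

```python
import math
from typing import Any, Iterable, Optional


def is_missing(value: Any) -> bool:
    """Return True for None and for float NaN (hypothetical helper)."""
    if value is None:
        return True
    return isinstance(value, float) and math.isnan(value)


def missing_value_percentage(values: Iterable[Any]) -> Optional[float]:
    """Percentage (0.0 to 100.0) of values that are None or NaN.

    Assumption: an empty input yields None, since the percentage is
    undefined when there are no rows.
    """
    total = 0
    missing = 0
    for value in values:
        total += 1
        if is_missing(value):
            missing += 1
    if total == 0:
        return None
    return 100.0 * missing / total


# Same data as a five-row column with two missing entries.
print(missing_value_percentage([1, None, 3, None, 5]))
```

Counting missing and total rows separately, rather than computing a percentage per block, is what makes a calculation like this easy to combine across partitions: two partial results merge by adding their counts.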
Example
import ray
from ray.data.aggregate import MissingValuePercentage

# Create a dataset with some missing values
ds = ray.data.from_items([
    {"value": 1},
    {"value": None},
    {"value": 3},
    {"value": None},
    {"value": 5}
])

# Calculate missing value percentage
result = ds.aggregate(MissingValuePercentage(on="value"))
# result: 40.0 (2 out of 5 values are missing)

# Using with groupby
ds = ray.data.from_items([
    {"group": "A", "value": 1},
    {"group": "A", "value": None},
    {"group": "B", "value": 3},
    {"group": "B", "value": None}
])
result = ds.groupby("group").aggregate(MissingValuePercentage(on="value")).take_all()
# result: [{'group': 'A', 'missing_pct(value)': 50.0},
#          {'group': 'B', 'missing_pct(value)': 50.0}]
- Parameters:
on – The name of the column to calculate missing value percentage on.
alias_name – Optional name for the resulting column. If not provided, defaults to “missing_pct({column_name})”.
PublicAPI (alpha): This API is in alpha and may change before becoming stable.
Methods