ray.data.aggregate.MissingValuePercentage

class ray.data.aggregate.MissingValuePercentage(on: str, alias_name: str | None = None)

Bases: AggregateFnV2

Calculates the percentage of null values in a column.

This aggregation computes the percentage of null (missing) values in a dataset column. It treats both None and NaN values as null. The result is the number of null values divided by the total number of values (null or not), multiplied by 100, so it always lies between 0.0 and 100.0: 0.0 means no values are missing and 100.0 means all values are missing.
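The counting rule can be sketched in plain Python. This is not Ray's implementation: `missing_value_percentage` and `is_missing` are hypothetical helpers that only restate the documented semantics, where `None` and float `NaN` both count as missing and the denominator includes every value. Returning `None` for an empty input is an assumption of this sketch, not documented behavior of the Ray aggregation.

```python
import math
from typing import Iterable, Optional


def is_missing(value: object) -> bool:
    """Treat None and float NaN as missing, per the documented semantics."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def missing_value_percentage(values: Iterable[object]) -> Optional[float]:
    """Hypothetical helper: percentage (0.0 to 100.0) of missing values."""
    total = 0
    missing = 0
    for value in values:
        total += 1
        if is_missing(value):
            missing += 1
    if total == 0:
        # Assumption: an empty input has no defined percentage.
        return None
    return 100.0 * missing / total


print(missing_value_percentage([1, None, 3, None, 5]))            # 40.0
print(missing_value_percentage([1.0, float("nan"), None, 4.0]))  # 50.0
```

The second call shows that a `NaN` is counted the same way as a `None`, which the Ray example below does not exercise.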

Example

import ray
from ray.data.aggregate import MissingValuePercentage

# Create a dataset with some missing values
ds = ray.data.from_items([
    {"value": 1}, {"value": None}, {"value": 3},
    {"value": None}, {"value": 5}
])

# Calculate missing value percentage
result = ds.aggregate(MissingValuePercentage(on="value"))
# result: {'missing_pct(value)': 40.0} (2 out of 5 values are missing)

# Using with groupby
ds = ray.data.from_items([
    {"group": "A", "value": 1}, {"group": "A", "value": None},
    {"group": "B", "value": 3}, {"group": "B", "value": None}
])
result = ds.groupby("group").aggregate(MissingValuePercentage(on="value")).take_all()
# result: [{'group': 'A', 'missing_pct(value)': 50.0},
#          {'group': 'B', 'missing_pct(value)': 50.0}]
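A Ray dataset is processed in separate blocks, and when blocks differ in size a percentage cannot simply be averaged across them. The sketch below assumes a count-based decomposition: each block yields a partial `(missing, total)` pair, the pairs are merged by summing, and the percentage is computed once at the end. The names `partial_counts`, `merge_counts`, and `finalize` are hypothetical and are not Ray's internal methods; the sketch only shows why a count-based approach gives 40.0 for the first example no matter how its five rows are split.

```python
import math
from functools import reduce
from typing import List, Optional, Tuple

Counts = Tuple[int, int]  # (missing, total)


def partial_counts(block: List[object]) -> Counts:
    """Count missing (None or NaN) and total values in one block."""
    missing = sum(
        1 for v in block if v is None or (isinstance(v, float) and math.isnan(v))
    )
    return missing, len(block)


def merge_counts(a: Counts, b: Counts) -> Counts:
    """Combine two partial results by summing their counts."""
    return a[0] + b[0], a[1] + b[1]


def finalize(counts: Counts) -> Optional[float]:
    """Convert merged counts into a percentage (None if there are no rows)."""
    missing, total = counts
    return None if total == 0 else 100.0 * missing / total


# One possible split of the example rows [1, None, 3, None, 5] into blocks.
blocks = [[1, None], [3], [None, 5]]
merged = reduce(merge_counts, (partial_counts(b) for b in blocks), (0, 0))
print(merged)            # (2, 5)
print(finalize(merged))  # 40.0

# Averaging per-block percentages instead gives the wrong answer.
per_block = [finalize(partial_counts(b)) for b in blocks]  # [50.0, 0.0, 50.0]
print(sum(per_block) / len(per_block))  # about 33.3, not 40.0
```

Keeping raw counts until the final step is what makes the merge order and block sizes irrelevant.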

Parameters:
  • on – The name of the column to calculate missing value percentage on.

  • alias_name – Optional name for the resulting column. If not provided, the column is named “missing_pct(<on>)”, for example “missing_pct(value)” when on="value".
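The naming rule for the output column can be sketched as follows. `result_column_name` is a hypothetical helper, not part of Ray; it only restates the documented default. With the real class, the corresponding call would be `MissingValuePercentage(on="value", alias_name="value_missing_pct")`, which should label the result `value_missing_pct` instead of `missing_pct(value)`.

```python
from typing import Optional


def result_column_name(on: str, alias_name: Optional[str] = None) -> str:
    """Hypothetical helper mirroring the documented naming rule."""
    # An explicit alias wins; otherwise fall back to "missing_pct(<on>)".
    return alias_name if alias_name is not None else f"missing_pct({on})"


print(result_column_name("value"))                       # missing_pct(value)
print(result_column_name("value", "value_missing_pct"))  # value_missing_pct
```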

PublicAPI (alpha): This API is in alpha and may change before becoming stable.

Methods