ray.data.aggregate.Count

class ray.data.aggregate.Count(on: str | None = None, ignore_nulls: bool = False, alias_name: str | None = None)

Bases: AggregateFnV2

Defines a count aggregation that counts rows, or the values of a single column when on is specified.

Example

import ray
from ray.data.aggregate import Count

ds = ray.data.range(100)
# Schema: {'id': int64}
# add_column passes a batch to the function, so select the column explicitly.
ds = ds.add_column("group_key", lambda batch: batch["id"] % 3)
# Schema: {'id': int64, 'group_key': int64}

# Counting all rows:
result = ds.aggregate(Count())
# result: {'count()': 100}


# Counting the values of the 'id' column per group:
result = ds.groupby("group_key").aggregate(Count(on="id")).take_all()
# result: [{'group_key': 0, 'count(id)': 34},
#          {'group_key': 1, 'count(id)': 33},
#          {'group_key': 2, 'count(id)': 33}]

Parameters:

on – Optional name of the column to count values on. If None, counts rows.
ignore_nulls – Whether to ignore null values when counting. Only applies if on is specified. Defaults to False, so Count() on a column includes nulls in the count. To match the pandas default of not counting nulls, set ignore_nulls=True.
alias_name – Optional name for the resulting column.
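
The interaction between on and ignore_nulls can be illustrated without a Ray cluster. The following is a plain-Python sketch of the semantics described above, not Ray's implementation: count_rows is a hypothetical helper, rows are modeled as dictionaries, and None stands in for a null value.

```python
from typing import Any, Dict, List, Optional


def count_rows(
    rows: List[Dict[str, Any]],
    on: Optional[str] = None,
    ignore_nulls: bool = False,
) -> int:
    """Hypothetical helper mirroring the documented Count semantics."""
    if on is None:
        # No column given: every row counts and ignore_nulls has no effect.
        return len(rows)
    if ignore_nulls:
        # Count only the rows whose value in the column is not null.
        return sum(1 for row in rows if row[on] is not None)
    # Default: nulls in the column are counted like any other value.
    return len(rows)


rows = [
    {"id": 0, "score": 1.5},
    {"id": 1, "score": None},
    {"id": 2, "score": 3.0},
]

total = count_rows(rows)                                         # all rows
with_nulls = count_rows(rows, on="score")                        # nulls included
without_nulls = count_rows(rows, on="score", ignore_nulls=True)  # nulls skipped
print(total, with_nulls, without_nulls)  # 3 3 2
```

With the default ignore_nulls=False, counting on a column gives the same number as counting rows; the two only diverge once nulls are excluded.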

Methods

finalize – Transforms the final accumulated state into the desired output.
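
To show where a finalize step sits in a distributed count, here is a minimal plain-Python sketch of the general pattern: each block yields a partial count, the partials are merged into one accumulated state, and finalize turns that state into the output. The function names and the list-based blocks are assumptions made for illustration and do not reproduce Ray's internals.

```python
from functools import reduce
from typing import List


def aggregate_block(block: List[int]) -> int:
    # Partial state for one block: the number of rows it holds.
    return len(block)


def combine(current: int, new: int) -> int:
    # Merge two partial counts into one accumulated state.
    return current + new


def finalize(accumulator: int) -> int:
    # For a count, the accumulated state is already the desired output.
    return accumulator


# Three illustrative blocks that together hold the ids 0..99.
blocks = [list(range(0, 40)), list(range(40, 75)), list(range(75, 100))]

partials = [aggregate_block(block) for block in blocks]
accumulated = reduce(combine, partials, 0)
result = finalize(accumulated)
print(partials, result)  # [40, 35, 25] 100
```

For a count, finalize is an identity step; other aggregations, such as a mean that accumulates a sum and a count, would do real work there.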