ray.data.aggregate.Count
- class ray.data.aggregate.Count(on: str | None = None, ignore_nulls: bool = False, alias_name: str | None = None)
Bases: AggregateFnV2
Defines count aggregation.
Example
import ray
from ray.data.aggregate import Count

ds = ray.data.range(100)
# Schema: {'id': int64}
ds = ds.add_column("group_key", lambda x: x % 3)
# Schema: {'id': int64, 'group_key': int64}

# Counting all rows:
result = ds.aggregate(Count())
# result: {'count()': 100}

# Counting all rows per group:
result = ds.groupby("group_key").aggregate(Count(on="id")).take_all()
# result: [{'group_key': 0, 'count(id)': 34},
#          {'group_key': 1, 'count(id)': 33},
#          {'group_key': 2, 'count(id)': 33}]
- Parameters:
on – Optional name of the column to count values on. If None, counts rows.
ignore_nulls – Whether to ignore null values when counting. Only applies if on is specified. Default is False, which means Count() on a column counts nulls by default. To match the pandas default behavior of not counting nulls, set ignore_nulls=True.
alias_name – Optional name for the resulting column.
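How the three parameters interact can be easier to see in a small plain-Python model. The sketch below does not use Ray and is not Ray's implementation: count_values is a hypothetical helper, the sample rows are made up, and the default output key format ('count()' or 'count(<column>)') is assumed from the example above.

```python
from typing import Any, Dict, List, Optional


def count_values(
    rows: List[Dict[str, Any]],
    on: Optional[str] = None,
    ignore_nulls: bool = False,
    alias_name: Optional[str] = None,
) -> Dict[str, int]:
    """Plain-Python model of the documented parameters (hypothetical helper)."""
    if on is None:
        # No column given: every row counts, and ignore_nulls has no effect.
        n = len(rows)
    elif ignore_nulls:
        # Column given and nulls ignored: count only non-None values.
        n = sum(1 for row in rows if row.get(on) is not None)
    else:
        # Column given, default behavior: null values are counted too.
        n = len(rows)
    # Assumed key format, taken from the example output: 'count()' / 'count(id)'.
    key = alias_name if alias_name is not None else f"count({on or ''})"
    return {key: n}


# Made-up sample data with one null in the "score" column.
rows = [
    {"id": 0, "score": 1.5},
    {"id": 1, "score": None},
    {"id": 2, "score": 0.3},
]

print(count_values(rows))                                  # {'count()': 3}
print(count_values(rows, on="score"))                      # {'count(score)': 3}
print(count_values(rows, on="score", ignore_nulls=True))   # {'count(score)': 2}
print(count_values(rows, on="score", ignore_nulls=True,
                   alias_name="n_scores"))                 # {'n_scores': 2}
```

With ignore_nulls=True the model behaves like the pandas default, which skips nulls when counting a column.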
Methods
Transforms the final accumulated state into the desired output.
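To illustrate what "transforms the final accumulated state into the desired output" can mean for a count, here is a minimal plain-Python sketch of a block-wise aggregation. It is only a model under stated assumptions: the function names aggregate_block, combine, and finalize are chosen for illustration and the two-block split is invented; none of it is taken from Ray's source.

```python
from functools import reduce
from typing import Any, Dict, List

Block = List[Dict[str, Any]]


def aggregate_block(block: Block) -> int:
    # Partial state for one block: the number of rows it holds.
    return len(block)


def combine(left: int, right: int) -> int:
    # Merge two partial states by adding them.
    return left + right


def finalize(state: int) -> int:
    # For a count, the accumulated state already is the desired output.
    return state


# Invented split of 100 rows into two blocks of 40 and 60 rows.
blocks = [
    [{"id": i} for i in range(0, 40)],
    [{"id": i} for i in range(40, 100)],
]

partials = [aggregate_block(b) for b in blocks]
total = finalize(reduce(combine, partials, 0))
print(partials, total)  # [40, 60] 100
```

For aggregations such as a mean, the final step would do real work (for example dividing a running sum by a running count); for a count it can simply pass the state through.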