ray.data.aggregate.ValueCounter
- class ray.data.aggregate.ValueCounter(on: str, alias_name: str | None = None)
Bases: AggregateFnV2
Counts the number of times each value appears in a column.
This aggregation computes value counts for a specified column, similar to the pandas value_counts() method. It returns a dictionary with two lists: “values”, containing the unique values found in the column, and “counts”, containing the corresponding count for each value.
Example
import ray
from ray.data.aggregate import ValueCounter

# Create a dataset with repeated values
ds = ray.data.from_items([
    {"category": "A"},
    {"category": "B"},
    {"category": "A"},
    {"category": "C"},
    {"category": "A"},
    {"category": "B"},
])

# Count occurrences of each category
result = ds.aggregate(ValueCounter(on="category"))
# result: {'value_counter(category)': {'values': ['A', 'B', 'C'], 'counts': [3, 2, 1]}}

# Using with groupby
ds = ray.data.from_items([
    {"group": "X", "category": "A"},
    {"group": "X", "category": "B"},
    {"group": "Y", "category": "A"},
    {"group": "Y", "category": "A"},
])
result = ds.groupby("group").aggregate(ValueCounter(on="category")).take_all()
# result: [{'group': 'X', 'value_counter(category)': {'values': ['A', 'B'], 'counts': [1, 1]}},
#          {'group': 'Y', 'value_counter(category)': {'values': ['A'], 'counts': [2]}}]
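To make the grouped result above concrete, here is a minimal pure-Python emulation of the groupby example on plain dicts. It is not Ray code: the function `grouped_value_counts` is hypothetical, values within a group are listed in order of first appearance, and groups are sorted by key only so that the output lines up with the documented example. The sketch does not establish that Ray guarantees either ordering.

```python
from typing import Any, Dict, List


def grouped_value_counts(
    rows: List[Dict[str, Any]], key: str, on: str
) -> List[Dict[str, Any]]:
    """Hypothetical emulation of groupby(key).aggregate(ValueCounter(on=on))."""
    # group key -> {value -> count}; dicts keep first-seen insertion order
    groups: Dict[Any, Dict[Any, int]] = {}
    for row in rows:
        per_group = groups.setdefault(row[key], {})
        per_group[row[on]] = per_group.get(row[on], 0) + 1

    output: List[Dict[str, Any]] = []
    # Sort by group key to match the documented example output
    for group_key in sorted(groups):
        counts = groups[group_key]
        output.append(
            {
                key: group_key,
                f"value_counter({on})": {
                    "values": list(counts),
                    "counts": list(counts.values()),
                },
            }
        )
    return output


rows = [
    {"group": "X", "category": "A"},
    {"group": "X", "category": "B"},
    {"group": "Y", "category": "A"},
    {"group": "Y", "category": "A"},
]
for record in grouped_value_counts(rows, "group", "category"):
    print(record)
```

Each output record pairs a group key with its own values/counts dictionary, which is why the "Y" group reports only 'A' with a count of 2.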
- Parameters:
on – The name of the column to count values in. Must be provided.
alias_name – Optional name for the resulting column. If not provided, defaults to “value_counter({on})”, where {on} is the name of the counted column.
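The naming rule for the result column can be stated as a one-line function. This is a hypothetical helper (`output_column_name` is not part of Ray) that only restates the documented default; it assumes that passing `None` is the same as not providing `alias_name`.

```python
from typing import Optional


def output_column_name(on: str, alias_name: Optional[str] = None) -> str:
    """Hypothetical helper: alias_name if given, else the documented default."""
    # Fall back to "value_counter(<column>)" when no alias is supplied
    return alias_name if alias_name is not None else f"value_counter({on})"
```

For example, `output_column_name("category")` yields the key `value_counter(category)` seen in the example results, while `output_column_name("category", "category_counts")` uses the alias instead.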
Methods
Transforms the final accumulated state into the desired output.
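The method description above refers to an accumulated state that is transformed into the final output. The hypothetical class below sketches that pattern for value counting in pure Python: each block of the column is counted independently, partial results are merged by summing counts for matching values, and a final step produces the output. The class and its method names (`zero`, `accumulate_block`, `combine`, `finalize`) are illustrative assumptions and do not reproduce the signatures of Ray's `AggregateFnV2`.

```python
from typing import Any, Dict, Hashable, Iterable, List

State = Dict[str, List[Any]]


class ValueCounterSketch:
    """Hypothetical block-wise value counter; not Ray's AggregateFnV2 API."""

    def zero(self) -> State:
        # Empty accumulator in the same {"values", "counts"} shape as the output
        return {"values": [], "counts": []}

    def accumulate_block(self, block: Iterable[Hashable]) -> State:
        # Count one block (partition) of the column independently
        counts: Dict[Hashable, int] = {}
        for value in block:
            counts[value] = counts.get(value, 0) + 1
        return {"values": list(counts), "counts": list(counts.values())}

    def combine(self, left: State, right: State) -> State:
        # Merge two partial results, summing counts for values seen in both
        merged: Dict[Hashable, int] = dict(zip(left["values"], left["counts"]))
        for value, count in zip(right["values"], right["counts"]):
            merged[value] = merged.get(value, 0) + count
        return {"values": list(merged), "counts": list(merged.values())}

    def finalize(self, state: State) -> State:
        # The accumulated state already has the output shape, so pass it through
        return state


agg = ValueCounterSketch()
blocks = [["A", "B", "A"], ["C", "A", "B"]]
state = agg.zero()
for block in blocks:
    state = agg.combine(state, agg.accumulate_block(block))
print(agg.finalize(state))
# {'values': ['A', 'B', 'C'], 'counts': [3, 2, 1]}
```

Because merging only sums counts per value, the two blocks here produce the same totals as counting the whole column at once, matching the non-grouped example result.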