ray.data.aggregate.Unique

class ray.data.aggregate.Unique(on: str | None = None, ignore_nulls: bool = True, alias_name: str | None = None)

Bases: AggregateFnV2

Defines an aggregation that collects the unique (distinct) values of a column.

Example

import ray
from ray.data.aggregate import Unique

ds = ray.data.range(100)
# The function passed to add_column receives a batch, so select the "id" column explicitly.
ds = ds.add_column("group_key", lambda x: x["id"] % 3)

# Calculating the unique values per group:
result = ds.groupby("group_key").aggregate(Unique(on="id")).take_all()
# result: [{'group_key': 0, 'unique(id)': ...},
#          {'group_key': 1, 'unique(id)': ...},
#          {'group_key': 2, 'unique(id)': ...}]
Parameters:
  • on – The name of the column from which to collect unique values.

  • ignore_nulls – Whether to ignore null values when collecting unique items. Default is True (nulls are excluded).

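  • alias_name – Optional name for the resulting column. If omitted, the column is named after the aggregation and its input column, as with unique(id) in the example above.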
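
To make the parameters concrete, here is a minimal pure-Python sketch of what a per-group unique aggregation computes. It does not use Ray and is not Ray's implementation: the helper `unique_per_group` and its list-of-dicts row format are hypothetical, and keeping `None` as one of the collected values when `ignore_nulls=False` is an assumption about the semantics.

```python
from collections import defaultdict


def unique_per_group(rows, key, on, ignore_nulls=True, alias_name=None):
    """Hypothetical helper: collect the distinct values of `on` for each `key`."""
    # Mirror the naming seen in the example output, e.g. 'unique(id)'.
    out_col = alias_name or f"unique({on})"
    seen = defaultdict(list)  # group key -> distinct values, in first-seen order
    for row in rows:
        bucket = seen[row[key]]  # create the group even if all its values are null
        value = row[on]
        if value is None and ignore_nulls:
            continue  # nulls are excluded by default
        if value not in bucket:
            bucket.append(value)
    return [{key: k, out_col: v} for k, v in sorted(seen.items(), key=lambda kv: kv[0])]


# Six rows spread over three groups, plus one null value in group 0.
rows = [{"id": i, "group_key": i % 3} for i in range(6)]
rows.append({"id": None, "group_key": 0})

default_result = unique_per_group(rows, "group_key", "id")
with_nulls = unique_per_group(rows, "group_key", "id", ignore_nulls=False)
renamed = unique_per_group(rows, "group_key", "id", alias_name="distinct_ids")
```

In this sketch, `default_result` drops the null, `with_nulls` keeps it as an extra value in group 0, and `renamed` differs from `default_result` only in the output column name.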

Methods

finalize

Transforms the final accumulated state into the desired output.
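
The role of a finalize step can be illustrated with a small pure-Python sketch: partial states are built independently over chunks of data and merged, and only the merged state is converted into the output. The class `UniqueSketch` and its `accumulate` and `combine` methods are hypothetical names chosen for illustration. Only `finalize` is named on this page, and nothing here is meant to reflect Ray's internal implementation.

```python
class UniqueSketch:
    """Hypothetical accumulator illustrating accumulate -> combine -> finalize."""

    def __init__(self, ignore_nulls=True):
        self.ignore_nulls = ignore_nulls

    def accumulate(self, values):
        # Build a partial state (a set) from one chunk of values.
        return {v for v in values if not (v is None and self.ignore_nulls)}

    def combine(self, left, right):
        # Merge two partial states; set union keeps each value once.
        return left | right

    def finalize(self, state):
        # Convert the final accumulated state into the output form:
        # a list sorted by value, with None (if present) placed last.
        return sorted(state, key=lambda v: (v is None, v))


agg = UniqueSketch()
# Two chunks processed independently, as if on different workers.
partials = [agg.accumulate(chunk) for chunk in ([3, 1, 3], [1, None, 2])]

state = partials[0]
for partial in partials[1:]:
    state = agg.combine(state, partial)

result = agg.finalize(state)
```

Keeping the intermediate state as a set makes merging cheap and order-independent; the conversion to a list happens once, at the end.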