GroupedData.sum(on: str | List[str] = None, ignore_nulls: bool = True) → Dataset

Compute a grouped sum aggregation.

Examples

>>> import ray
>>> ray.data.from_items([ 
...     (i % 3, i, i**2) 
...     for i in range(100)]) \ 
...     .groupby(lambda x: x[0] % 3) \ 
...     .sum(lambda x: x[2]) 
>>> ray.data.range(100).groupby("id").sum() 
>>> ray.data.from_items([ 
...     {"A": i % 3, "B": i, "C": i**2} 
...     for i in range(100)]) \ 
...     .groupby("A") \ 
...     .sum(["B", "C"]) 

Parameters:

  • on – A column name or a list of column names to aggregate.

  • ignore_nulls – Whether to ignore null values. If True, null values are ignored when computing the sum; if False, the output is null as soon as a null value is encountered. np.nan, None, and pd.NaT are treated as null values. Default is True.
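
The null handling described for ignore_nulls can be illustrated with a small pure-Python sketch. This is not Ray's implementation: `is_null` and `grouped_sum` are hypothetical helpers, only None and float NaN are checked (pd.NaT is left out to avoid a pandas dependency), and applying the null result per group is an assumption. The text does not say what happens when every value in a group is null; here that case simply sums to 0.

```python
import math
from collections import defaultdict


def is_null(value):
    """Treat None and float NaN as null (hypothetical helper, pd.NaT omitted)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def grouped_sum(rows, key, on, ignore_nulls=True):
    """Sum column `on` per distinct value of `key`, mimicking the documented null handling."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[on])

    result = {}
    for group_key, values in groups.items():
        if ignore_nulls:
            # Drop nulls before summing (an all-null group sums to 0 in this sketch).
            result[group_key] = sum(v for v in values if not is_null(v))
        elif any(is_null(v) for v in values):
            # Assumption: a single null makes that group's sum null.
            result[group_key] = None
        else:
            result[group_key] = sum(values)
    return result


rows = [
    {"A": 0, "B": 1},
    {"A": 0, "B": None},
    {"A": 1, "B": 2},
    {"A": 1, "B": 3},
]
print(grouped_sum(rows, "A", "B", ignore_nulls=True))   # {0: 1, 1: 5}
print(grouped_sum(rows, "A", "B", ignore_nulls=False))  # {0: None, 1: 5}
```

With ignore_nulls=True the null in group 0 is skipped; with ignore_nulls=False it turns that group's result into a null.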


Returns:

The sum result.

The shape of the result depends on the value of on:

  • on=None: a dataset containing a groupby key column, "k", and a column-wise sum column for each original column in the dataset.

  • on=["col_1", ..., "col_n"]: a dataset of n + 1 columns, where the first column is the groupby key and columns 2 through n + 1 hold the results of the aggregations.

If the groupby key is None, the key column is omitted from the result.
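
The return shapes above can be sketched in plain Python, without Ray. `grouped_sum_table` is a hypothetical helper, the `sum(<column>)` output names are an assumed naming scheme, and the sketch assumes that on=None sums every column except the groupby key; none of these is confirmed by the text above.

```python
from collections import defaultdict


def grouped_sum_table(rows, key, on=None):
    """Return one row per group: the key column first, then one sum per aggregated column."""
    if on is None:
        # Assumption: on=None aggregates every column except the groupby key.
        on = [col for col in rows[0] if col != key]
    elif isinstance(on, str):
        on = [on]

    totals = defaultdict(lambda: dict.fromkeys(on, 0))
    for row in rows:
        group_key = None if key is None else row[key]
        for col in on:
            totals[group_key][col] += row[col]

    table = []
    for group_key in sorted(totals):
        # With key=None there is a single global group and no key column.
        out = {} if key is None else {key: group_key}
        for col in on:
            out[f"sum({col})"] = totals[group_key][col]  # assumed column naming
        table.append(out)
    return table


rows = [{"A": i % 3, "B": i, "C": i**2} for i in range(6)]

# on=["B", "C"]: n + 1 = 3 columns per row, key first.
print(grouped_sum_table(rows, "A", ["B", "C"]))
# on=None: one sum column per non-key column.
print(grouped_sum_table(rows, "A"))
# key=None: the key column is omitted.
print(grouped_sum_table(rows, None, "B"))
```

For this input, the first two calls produce the same three-row table, and the last call produces a single row containing only the sum of B.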