ray.data.grouped_data.GroupedData.sum
- GroupedData.sum(on: str | List[str] = None, ignore_nulls: bool = True) → Dataset
Compute grouped sum aggregation.
Examples
>>> import ray
>>> ray.data.from_items([
...     (i % 3, i, i**2)
...     for i in range(100)]) \
...     .groupby(lambda x: x[0] % 3) \
...     .sum(lambda x: x[2])
>>> ray.data.range(100).groupby("id").sum()
>>> ray.data.from_items([
...     {"A": i % 3, "B": i, "C": i**2}
...     for i in range(100)]) \
...     .groupby("A") \
...     .sum(["B", "C"])
- Parameters:
on – a column name or a list of column names to aggregate.
ignore_nulls – Whether to ignore null values. If True, null values are ignored when computing the sum; if False, the output is null as soon as a null value is encountered. np.nan, None, and pd.NaT are all treated as null values. Default is True.
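The null-handling rule can be modeled in plain Python. The sketch below is a hypothetical helper, `null_aware_sum`, not part of Ray: it treats None and float NaN (np.nan is a Python float) as null, and leaves out pd.NaT to stay free of a pandas dependency. Returning None when no non-null value remains is an assumption of this sketch, not documented behavior.

```python
import math
from typing import Iterable, Optional, Union

Number = Union[int, float]


def is_null(value: object) -> bool:
    # None and float NaN count as null; pd.NaT is left out to avoid a pandas dependency.
    return value is None or (isinstance(value, float) and math.isnan(value))


def null_aware_sum(
    values: Iterable[Optional[Number]], ignore_nulls: bool = True
) -> Optional[Number]:
    """Hypothetical helper mirroring the documented ignore_nulls rule (not Ray's code)."""
    total: Number = 0
    saw_value = False
    for value in values:
        if is_null(value):
            if ignore_nulls:
                continue  # skip the null and keep summing
            return None  # with ignore_nulls=False, one null makes the result null
        total += value
        saw_value = True
    # Assumption: if no non-null value was seen, the result is null (None).
    return total if saw_value else None


print(null_aware_sum([1, None, 2.5]))                      # 3.5
print(null_aware_sum([1, None, 2.5], ignore_nulls=False))  # None
```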
- Returns:
The sum result.
For different values of on, the return varies:
  - on=None: a dataset containing a groupby key column, "k", and a column-wise sum column for each original column in the dataset.
  - on=["col_1", ..., "col_n"]: a dataset of n + 1 columns where the first column is the groupby key and the second through (n + 1)-th columns are the results of the aggregations.
If the groupby key is None, the key part of the return is omitted.
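To make the documented output layout concrete, here is a minimal pure-Python model under stated assumptions: `grouped_sum` is a hypothetical helper (not a Ray API), rows are plain dicts, output groups are sorted by key, and the `sum(col)` output column names are chosen for illustration. It reproduces the dict-based groupby("A") example on in-memory data without Ray, including the on=None case and the key=None case where the key column is dropped.

```python
from collections import defaultdict
from typing import Any, Dict, List, Optional, Sequence, Union

Row = Dict[str, Any]


def grouped_sum(
    rows: Sequence[Row],
    key: Optional[str],
    on: Union[str, List[str], None] = None,
) -> List[Row]:
    """Hypothetical model of the documented output layout (not Ray's code)."""
    # Normalize `on` to a list of column names.
    if isinstance(on, str):
        on = [on]
    elif on is None:
        # on=None: sum every column except the groupby key.
        on = [col for col in rows[0] if col != key] if rows else []
    columns = list(on)

    # One running total per (group, column); key=None puts all rows in one group.
    totals: Dict[Any, Dict[str, Any]] = defaultdict(lambda: {c: 0 for c in columns})
    for row in rows:
        group = row[key] if key is not None else None
        for col in columns:
            totals[group][col] += row[col]

    out: List[Row] = []
    groups = sorted(totals) if key is not None else list(totals)
    for group in groups:
        # Key column first (omitted when key is None), then one sum per column.
        result: Row = {} if key is None else {key: group}
        # The "sum(col)" naming is an illustrative assumption.
        result.update({f"sum({col})": totals[group][col] for col in columns})
        out.append(result)
    return out


rows = [{"A": i % 3, "B": i, "C": i**2} for i in range(100)]
print(grouped_sum(rows, "A", ["B", "C"])[0])  # {'A': 0, 'sum(B)': 1683, 'sum(C)': 112761}
print(grouped_sum(rows, None, "B"))           # [{'sum(B)': 4950}]
```

With key "A" and on=["B", "C"], each output row has n + 1 = 3 columns: the key followed by one sum per aggregated column.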