ray.data.grouped_dataset.GroupedDataset.sum
- GroupedDataset.sum(on: Union[None, str, Callable[[ray.data.block.T], Any], List[Union[None, str, Callable[[ray.data.block.T], Any]]]] = None, ignore_nulls: bool = True) -> ray.data.dataset.Dataset[ray.data.block.U]
Compute grouped sum aggregation.
This is a blocking operation.
Examples
>>> import ray
>>> ray.data.range(100).groupby(lambda x: x % 3).sum()
>>> ray.data.from_items([
...     (i % 3, i, i**2)
...     for i in range(100)]) \
...     .groupby(lambda x: x[0] % 3) \
...     .sum(lambda x: x[2])
>>> ray.data.range_table(100).groupby("value").sum()
>>> ray.data.from_items([
...     {"A": i % 3, "B": i, "C": i**2}
...     for i in range(100)]) \
...     .groupby("A") \
...     .sum(["B", "C"])
- Parameters
on – The data subset on which to compute the sum.
For a simple dataset: it can be a callable or a list thereof, and the default is to take a sum of all rows.
For an Arrow dataset: it can be a column name or a list thereof, and the default is to do a column-wise sum of all columns.
ignore_nulls – Whether to ignore null values. If True, null values are ignored when computing the sum; if False and a null value is encountered, the output is null. np.nan, None, and pd.NaT are considered null values. Default is True.
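The null handling described for ignore_nulls can be illustrated with a minimal pure-Python sketch. It does not call Ray; `is_null` and `grouped_sum` are hypothetical helpers written only for this illustration, and pd.NaT is left out so the example needs nothing beyond the standard library.

```python
import math
from collections import defaultdict


def is_null(value):
    """Treat None and float NaN as null (pd.NaT is omitted to stay stdlib-only)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def grouped_sum(rows, key, on, ignore_nulls=True):
    """Sum on(row) per key(row), mimicking the documented null handling."""
    groups = defaultdict(list)
    for row in rows:
        groups[key(row)].append(on(row))

    result = {}
    for k, values in groups.items():
        if ignore_nulls:
            # Nulls are dropped before summing. An all-null group yields 0 here;
            # the text above does not say what the library returns in that case.
            result[k] = sum(v for v in values if not is_null(v))
        elif any(is_null(v) for v in values):
            # A single null makes the whole group's output null.
            result[k] = None
        else:
            result[k] = sum(values)
    return result


rows = [("a", 1), ("a", None), ("b", 2), ("b", 3)]
skipped = grouped_sum(rows, key=lambda r: r[0], on=lambda r: r[1])
propagated = grouped_sum(rows, key=lambda r: r[0], on=lambda r: r[1], ignore_nulls=False)
```

In this sketch, group "a" sums to 1 when nulls are ignored and becomes None when they are not, while group "b" sums to 5 in both cases.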
- Returns
The sum result.
For a simple dataset, the output is:
- on=None: a simple dataset of (k, sum) tuples where k is the groupby key and sum is the sum of all rows in that group.
- on=[callable_1, ..., callable_n]: a simple dataset of (k, sum_1, ..., sum_n) tuples where k is the groupby key and sum_i is the sum of the outputs of the ith callable called on each row in that group.
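The (k, sum_1, ..., sum_n) tuple shape for a list of callables can be sketched in plain Python as follows. This is an illustration of the output layout only, not Ray's implementation; `grouped_sum_tuples` is a hypothetical helper, and the result is sorted by key here purely to make the output deterministic.

```python
from collections import defaultdict


def grouped_sum_tuples(rows, key, on):
    """Return (k, sum_1, ..., sum_n) tuples, one per group, sorted by key."""
    totals = defaultdict(lambda: [0] * len(on))
    for row in rows:
        acc = totals[key(row)]
        for i, fn in enumerate(on):
            # sum_i accumulates the output of the ith callable on each row.
            acc[i] += fn(row)
    return [(k, *sums) for k, sums in sorted(totals.items())]


rows = [(i % 3, i, i**2) for i in range(10)]
result = grouped_sum_tuples(
    rows,
    key=lambda r: r[0],
    on=[lambda r: r[1], lambda r: r[2]],
)
```

Each tuple has one leading key followed by one sum per callable, so two callables produce 3-tuples.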
For an Arrow dataset, the output is:
- on=None: an Arrow dataset containing a groupby key column, "k", and a column-wise sum column for each original column in the dataset.
- on=["col_1", ..., "col_n"]: an Arrow dataset of n + 1 columns where the first column is the groupby key and the second through n + 1 columns are the results of the aggregations.
If the groupby key is None, the key part of the return value is omitted.
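The column-oriented output, including the case where the groupby key is None, can be sketched with plain dictionaries standing in for table rows. This is a rough illustration under stated assumptions rather than Ray's implementation: `grouped_sum_columns` is a hypothetical helper, and the "sum(<col>)" output names are placeholders chosen for this example, not taken from the text above.

```python
from collections import defaultdict


def grouped_sum_columns(rows, key, on):
    """Sum the `on` columns per value of `key`; key=None means one global group."""
    totals = defaultdict(lambda: {col: 0 for col in on})
    for row in rows:
        group = None if key is None else row[key]
        for col in on:
            totals[group][col] += row[col]

    out = []
    for group, sums in totals.items():
        # With key=None the key part of each output record is omitted.
        record = {} if key is None else {key: group}
        # "sum(<col>)" is an illustrative output name (an assumption).
        record.update({f"sum({col})": total for col, total in sums.items()})
        out.append(record)
    return out


rows = [{"A": i % 3, "B": i, "C": i**2} for i in range(10)]
by_key = grouped_sum_columns(rows, key="A", on=["B", "C"])
no_key = grouped_sum_columns(rows, key=None, on=["B", "C"])
```

With key="A" and two columns in on, each record has n + 1 = 3 entries; with key=None there is a single record holding only the two sums.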