ray.data.Dataset.sum
- Dataset.sum(on: str | List[str] | None = None, ignore_nulls: bool = True) → Any | Dict[str, Any]
Compute the sum of one or more columns.
Note
This operation will trigger execution of the lazy transformations performed on this dataset.
Note
This operation requires all inputs to be materialized in the object store before it can execute.
Examples
>>> import ray
>>> ray.data.range(100).sum("id")
4950
>>> ray.data.from_items([
...     {"A": i, "B": i**2}
...     for i in range(100)
... ]).sum(["A", "B"])
{'sum(A)': 4950, 'sum(B)': 328350}
- Parameters:
on – A column name or a list of column names to aggregate. If None, all columns are aggregated.
ignore_nulls – Whether to ignore null values. If True, null values are ignored when computing the sum. If False, when a null value is encountered, the output is None. Ray Data considers np.nan, None, and pd.NaT to be null values. Default is True.
- Returns:
The sum result.
For different values of on, the return value varies:
on=None: a dict containing the column-wise sum of all columns,
on="col": a scalar representing the sum of all items in column "col",
on=["col_1", ..., "col_n"]: an n-column dict containing the column-wise sum of the provided columns.
If the dataset is empty, all values are null. If ignore_nulls is False and any value is null, then the output is None.
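The return shapes and null handling described above can be illustrated with a small pure-Python model. This is a sketch of the documented semantics only, not Ray's implementation: the names model_sum, _column_sum, and _is_null are hypothetical, rows are plain dicts rather than a Dataset, and only None and float NaN are treated as null (pd.NaT is left out to avoid a pandas dependency).

```python
import math
from typing import Any, Dict, List, Optional, Union

Row = Dict[str, Any]


def _is_null(value: Any) -> bool:
    # Hypothetical helper: models None and float NaN only (pd.NaT is omitted).
    return value is None or (isinstance(value, float) and math.isnan(value))


def _column_sum(rows: List[Row], column: str, ignore_nulls: bool) -> Any:
    # Returns None for an empty input, or when a null is met and
    # ignore_nulls is False; otherwise returns the sum of non-null values.
    total = None
    for row in rows:
        value = row.get(column)
        if _is_null(value):
            if ignore_nulls:
                continue
            return None
        total = value if total is None else total + value
    return total


def model_sum(
    rows: List[Row],
    on: Optional[Union[str, List[str]]] = None,
    ignore_nulls: bool = True,
) -> Any:
    # on="col": scalar result for that column.
    if isinstance(on, str):
        return _column_sum(rows, on, ignore_nulls)
    # on=None: all columns. Simplification: columns are taken from the
    # first row, so an empty input yields an empty dict in this model.
    if on is None:
        columns = list(rows[0].keys()) if rows else []
    else:
        columns = list(on)
    # Keys follow the 'sum(<column>)' pattern shown in the Examples.
    return {f"sum({c})": _column_sum(rows, c, ignore_nulls) for c in columns}


rows = [{"A": i, "B": i**2} for i in range(100)]
with_null = [{"A": 1}, {"A": None}, {"A": 3}]

print(model_sum(rows, "A"))
print(model_sum(rows, ["A", "B"]))
print(model_sum(with_null, "A"))
print(model_sum(with_null, "A", ignore_nulls=False))
```

With the default ignore_nulls=True the null row is skipped. With ignore_nulls=False a single null makes the whole result None, so callers who need to detect missing data should check for None before using the value.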