ray.data.Dataset.sum
- Dataset.sum(on: Optional[Union[str, List[str]]] = None, ignore_nulls: bool = True) → Union[Any, Dict[str, Any]]
Compute the sum of one or more columns.
Note
This operation will trigger execution of the lazy transformations performed on this dataset.
Examples
>>> import ray
>>> ray.data.range(100).sum("id")
4950
>>> ray.data.from_items([
...     {"A": i, "B": i**2}
...     for i in range(100)
... ]).sum(["A", "B"])
{'sum(A)': 4950, 'sum(B)': 328350}
- Parameters
on – a column name or a list of column names to aggregate.
ignore_nulls – Whether to ignore null values. If True, null values are ignored when computing the sum. If False, the output is None as soon as a null value is encountered. Ray Data considers np.nan, None, and pd.NaT to be null values. Default is True.
- Returns
The sum result.
For different values of on, the return varies:
  - on=None: a dict containing the column-wise sum of all columns.
  - on="col": a scalar representing the sum of all items in column "col".
  - on=["col_1", ..., "col_n"]: an n-column dict containing the column-wise sum of the provided columns.
If the dataset is empty, all values are null. If ignore_nulls is False and any value is null, then the output is None.
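The return-shape rules and the ignore_nulls behavior described above can be illustrated with a small plain-Python sketch over column-oriented data (a dict mapping column names to lists). This is not Ray's implementation. The functions sketch_sum, _is_null, and _column_sum are hypothetical, pd.NaT is left out so the code needs only the standard library, and applying the null rule separately to each column in the multi-column case is an assumption.

```python
import math
from typing import Any, Dict, List, Optional, Union


def _is_null(v: Any) -> bool:
    # None and float NaN (including np.nan, which is a Python float) count as null.
    # pd.NaT is omitted here to keep the sketch stdlib-only.
    return v is None or (isinstance(v, float) and math.isnan(v))


def _column_sum(values: List[Any], ignore_nulls: bool) -> Optional[Any]:
    total = None  # an empty column (or one with only skipped nulls) yields None
    for v in values:
        if _is_null(v):
            if not ignore_nulls:
                return None  # any null makes this column's result None
            continue  # skip the null and keep summing
        total = v if total is None else total + v
    return total


def sketch_sum(
    data: Dict[str, List[Any]],
    on: Optional[Union[str, List[str]]] = None,
    ignore_nulls: bool = True,
) -> Union[Any, Dict[str, Any]]:
    """Hypothetical mimic of the documented return shape of Dataset.sum."""
    if isinstance(on, str):
        # A single column name returns a bare scalar.
        return _column_sum(data[on], ignore_nulls)
    # None means "every column"; a list selects specific columns.
    columns = list(data) if on is None else on
    # Multiple columns return a dict keyed like the example output, e.g. "sum(A)".
    return {f"sum({c})": _column_sum(data[c], ignore_nulls) for c in columns}


data = {"A": list(range(100)), "B": [i**2 for i in range(100)]}
print(sketch_sum(data, "A"))           # 4950
print(sketch_sum(data, ["A", "B"]))    # {'sum(A)': 4950, 'sum(B)': 328350}
print(sketch_sum({"A": [], "B": []}))  # {'sum(A)': None, 'sum(B)': None}
print(sketch_sum({"x": [1.0, float("nan"), 2.0]}, "x"))                      # 3.0
print(sketch_sum({"x": [1.0, float("nan"), 2.0]}, "x", ignore_nulls=False))  # None
```

The string-versus-list check comes first because a bare column name is the only case that returns a scalar; both None and a list fall through to the same dict-building path.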