ray.data.Dataset.sum
- Dataset.sum(on: str | List[str] | None = None, ignore_nulls: bool = True) → Any | Dict[str, Any]
- Compute the sum of one or more columns.
- Note: This operation will trigger execution of the lazy transformations performed on this dataset.
- Note: This operation requires all inputs to be materialized in the object store for it to execute.
- Examples:

    >>> import ray
    >>> ray.data.range(100).sum("id")
    4950
    >>> ray.data.from_items([
    ...     {"A": i, "B": i**2}
    ...     for i in range(100)
    ... ]).sum(["A", "B"])
    {'sum(A)': 4950, 'sum(B)': 328350}

- Parameters:
- on – A column name or a list of column names to aggregate. If None, all columns are aggregated.
- ignore_nulls – Whether to ignore null values. If True, null values are ignored when computing the sum. If False, the output is None as soon as a null value is encountered. Ray Data considers np.nan, None, and pd.NaT to be null values. Default is True.
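To make the ignore_nulls rule concrete, here is a minimal pure-Python sketch of the documented behavior for a single column. It does not call Ray: `column_sum` and `_is_null` are hypothetical helpers written for illustration. The sketch treats only None and float NaN (which includes np.nan) as null and leaves out pd.NaT to stay dependency-free. Returning None when every value is null is an assumption; the documentation only states that an empty dataset yields null results.

```python
import math
from typing import Any, Iterable, Optional


def _is_null(value: Any) -> bool:
    # Hypothetical helper: None and float NaN (np.nan is a Python float) count as null.
    # pd.NaT is also null in Ray Data but is omitted here to avoid a pandas import.
    if value is None:
        return True
    return isinstance(value, float) and math.isnan(value)


def column_sum(values: Iterable[Any], ignore_nulls: bool = True) -> Optional[Any]:
    """Hypothetical sketch: sum one column using the documented null rules."""
    total = None
    for value in values:
        if _is_null(value):
            if ignore_nulls:
                continue  # ignore_nulls=True: skip the null value
            return None  # ignore_nulls=False: any null makes the result None
        total = value if total is None else total + value
    # Empty input yields None (documented); all-null input also yields None (assumption).
    return total


print(column_sum([1, 2, None, 4]))                      # 7
print(column_sum([1, 2, None, 4], ignore_nulls=False))  # None
```

Starting the accumulator at None rather than 0 is what lets the sketch distinguish "no non-null values seen" from a genuine sum of zero.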
 
- Returns:
- The sum result. The return type depends on the value of on:
- on=None: a dict containing the column-wise sum of all columns,
- on="col": a scalar representing the sum of all items in column "col",
- on=["col_1", ..., "col_n"]: an n-column dict containing the column-wise sum of the provided columns.
 - If the dataset is empty, all values are null. If ignore_nulls is False and any value is null, then the output is None.
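The shape of the result depends on on. The following hypothetical stand-in, `dataset_sum`, models only that shape over in-memory rows stored as dicts; it does not reproduce Ray's lazy, distributed execution or its null handling. It reproduces the values from the examples above. Using the `sum(<col>)` key format for on=None as well as for a list of columns, and reading column names from the first row, are assumptions made for this sketch.

```python
from typing import Any, Dict, List, Optional, Union


def _sum_or_none(values: List[Any]) -> Optional[Any]:
    # An empty dataset produces null results, so return None instead of
    # Python's sum() default of 0.
    return sum(values) if values else None


def dataset_sum(
    rows: List[Dict[str, Any]],
    on: Union[str, List[str], None] = None,
) -> Union[Any, Dict[str, Any]]:
    """Hypothetical stand-in for Dataset.sum that models only the return shape."""
    if isinstance(on, str):
        # on="col": return a bare scalar.
        return _sum_or_none([row[on] for row in rows])
    if on is None:
        # on=None: aggregate every column (names taken from the first row; assumption).
        on = list(rows[0]) if rows else []
    # on=[...] or None: a dict keyed "sum(<col>)", as in the documented example output.
    return {f"sum({col})": _sum_or_none([row[col] for row in rows]) for col in on}


rows = [{"A": i, "B": i**2} for i in range(100)]
print(dataset_sum(rows, "A"))         # 4950
print(dataset_sum(rows, ["A", "B"]))  # {'sum(A)': 4950, 'sum(B)': 328350}
```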