ray.data.grouped_dataset.GroupedDataset.mean

GroupedDataset.mean(on: Union[None, str, Callable[[ray.data.block.T], Any], List[Union[None, str, Callable[[ray.data.block.T], Any]]]] = None, ignore_nulls: bool = True) -> ray.data.dataset.Dataset[ray.data.block.U]

Compute grouped mean aggregation.

Examples

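```
>>> import ray
>>> # Simple dataset: mean of all rows in each group.
>>> ray.data.range(100).groupby(lambda x: x % 3).mean()
>>> # Simple dataset of tuples: mean of the third element per group.
>>> ray.data.from_items([
...     (i % 3, i, i**2)
...     for i in range(100)]) \
...     .groupby(lambda x: x[0] % 3) \
...     .mean(lambda x: x[2])
>>> # Arrow dataset: column-wise mean of all columns, grouped by "value".
>>> ray.data.range_table(100).groupby("value").mean()
>>> # Arrow dataset: means of columns "B" and "C", grouped by "A".
>>> ray.data.from_items([
...     {"A": i % 3, "B": i, "C": i**2}
...     for i in range(100)]) \
...     .groupby("A") \
...     .mean(["B", "C"])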
```
Parameters
• on – The data subset on which to compute the mean.

• For a simple dataset: it can be a callable or a list thereof, and the default is to take the mean of all rows.

• For an Arrow dataset: it can be a column name or a list thereof, and the default is to compute a column-wise mean of all columns.

• ignore_nulls – Whether to ignore null values. If `True`, null values are ignored when computing the mean; if `False`, the output is null whenever a null value is encountered. `np.nan`, `None`, and `pd.NaT` are treated as null values. Default is `True`.
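
The `ignore_nulls` behaviour described above can be illustrated with a small plain-Python sketch. This is not Ray's implementation: `is_null` and `mean_with_nulls` are hypothetical helpers, only `None` and float NaN are treated as null (the sketch avoids NumPy and pandas, so `pd.NaT` is not handled), and returning `None` when every value is null is an assumption of the sketch.

```python
import math
from typing import Iterable, Optional


def is_null(value) -> bool:
    # Hypothetical helper: treats None and float NaN as null.
    return value is None or (isinstance(value, float) and math.isnan(value))


def mean_with_nulls(values: Iterable, ignore_nulls: bool = True) -> Optional[float]:
    total = 0.0
    count = 0
    for value in values:
        if is_null(value):
            if not ignore_nulls:
                # A single null makes the whole result null.
                return None
            # Otherwise skip the null and keep accumulating.
            continue
        total += value
        count += 1
    # Assumption of this sketch: no non-null values yields None.
    return total / count if count else None


skipped = mean_with_nulls([1, None, 3])                         # 2.0
propagated = mean_with_nulls([1, None, 3], ignore_nulls=False)  # None
```

With `ignore_nulls=True` the null is dropped from both the sum and the count, so the mean is taken over the two remaining values.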

Returns

The mean result.

For a simple dataset, the output is:

• `on=None`: a simple dataset of `(k, mean)` tuples where `k` is the groupby key and `mean` is the mean of all rows in that group.

• `on=[callable_1, ..., callable_n]`: a simple dataset of `(k, mean_1, ..., mean_n)` tuples where `k` is the groupby key and `mean_i` is the mean of the outputs of the ith callable called on each row in that group.
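
To make the `(k, mean_1, ..., mean_n)` shape concrete, here is a minimal plain-Python emulation. It does not call Ray; `grouped_mean` is a hypothetical function, and sorting the output by key is an assumption of the sketch rather than documented behaviour.

```python
from collections import defaultdict
from statistics import fmean


def grouped_mean(rows, key, on=None):
    # Normalise `on` to a list of callables; None means "the row itself".
    if on is None:
        on = [lambda row: row]
    elif callable(on):
        on = [on]

    # Bucket rows by their groupby key.
    groups = defaultdict(list)
    for row in rows:
        groups[key(row)].append(row)

    # One (k, mean_1, ..., mean_n) tuple per group.
    return [
        (k, *(fmean(fn(row) for row in members) for fn in on))
        for k, members in sorted(groups.items())
    ]


rows = [(i % 3, i, i**2) for i in range(6)]
result = grouped_mean(
    rows,
    key=lambda row: row[0],
    on=[lambda row: row[1], lambda row: row[2]],
)
# result == [(0, 1.5, 4.5), (1, 2.5, 8.5), (2, 3.5, 14.5)]
```

Each tuple has `n + 1` entries: the key followed by one mean per callable, in the order the callables were passed.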

For an Arrow dataset, the output is:

• `on=None`: an Arrow dataset containing a groupby key column, `"k"`, and a column-wise mean column for each original column in the dataset.

• `on=["col_1", ..., "col_n"]`: an Arrow dataset of `n + 1` columns where the first column is the groupby key and columns 2 through `n + 1` hold the results of the aggregations.

If the groupby key is `None`, the key part of the return value is omitted.
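
The column layout for the Arrow case, including the omitted key when the groupby key is `None`, can be sketched with plain dictionaries standing in for table rows. This is an illustration only: `grouped_column_mean` is hypothetical, and the `mean(col)` output column names are an assumption of the sketch, not taken from the text above.

```python
from collections import defaultdict
from statistics import fmean


def grouped_column_mean(records, key, on):
    # Bucket records by the key column; with key=None everything is one group.
    groups = defaultdict(list)
    for record in records:
        groups[None if key is None else record[key]].append(record)

    result = []
    for k in (sorted(groups) if key is not None else [None]):
        # The key column comes first and is omitted when key is None.
        row = {} if key is None else {key: k}
        for col in on:
            # Assumed naming scheme for the aggregated columns.
            row[f"mean({col})"] = fmean(r[col] for r in groups[k])
        result.append(row)
    return result


records = [{"A": i % 3, "B": i, "C": i**2} for i in range(6)]
by_a = grouped_column_mean(records, "A", ["B", "C"])
overall = grouped_column_mean(records, None, ["B", "C"])
# by_a[0] == {"A": 0, "mean(B)": 1.5, "mean(C)": 4.5}
# overall has a single row with no key column.
```

With two columns in `on`, each grouped row has three entries (key plus two means), while the ungrouped result has only the two means.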