ray.data.grouped_dataset.GroupedDataset.mean
- GroupedDataset.mean(on: Union[None, str, Callable[[ray.data.block.T], Any], List[Union[None, str, Callable[[ray.data.block.T], Any]]]] = None, ignore_nulls: bool = True) → ray.data.dataset.Dataset[ray.data.block.U]
Compute grouped mean aggregation.
Examples
>>> import ray
>>> ray.data.range(100).groupby(lambda x: x % 3).mean()
>>> ray.data.from_items([
...     (i % 3, i, i**2)
...     for i in range(100)]) \
...     .groupby(lambda x: x[0] % 3) \
...     .mean(lambda x: x[2])
>>> ray.data.range_table(100).groupby("value").mean()
>>> ray.data.from_items([
...     {"A": i % 3, "B": i, "C": i**2}
...     for i in range(100)]) \
...     .groupby("A") \
...     .mean(["B", "C"])
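The doctests above do not show their results. As a hedged cross-check, the plain-Python snippet below (no Ray involved) computes the per-group means that the first two examples aggregate: x for x in range(100) grouped by x % 3, and i**2 grouped by i % 3. The helper name expected_group_means is hypothetical and only mirrors the arithmetic, not Ray's output type.

```python
from collections import defaultdict
from statistics import fmean


def expected_group_means(rows, key_fn, value_fn):
    """Plain-Python reference for a grouped mean (hypothetical helper, not Ray)."""
    groups = defaultdict(list)
    for row in rows:
        groups[key_fn(row)].append(value_fn(row))
    # Sort by key so the result order is deterministic.
    return {k: fmean(vs) for k, vs in sorted(groups.items())}


# Example 1: ray.data.range(100).groupby(lambda x: x % 3).mean()
ex1 = expected_group_means(range(100), lambda x: x % 3, lambda x: x)

# Example 2: tuples (i % 3, i, i**2) grouped by x[0] % 3, mean of x[2]
items = [(i % 3, i, i**2) for i in range(100)]
ex2 = expected_group_means(items, lambda x: x[0] % 3, lambda x: x[2])

print(ex1)  # {0: 49.5, 1: 49.0, 2: 50.0}
```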
- Parameters
on – The data subset on which to compute the mean.
For a simple dataset, it can be a callable or a list of callables; the default is to take the mean of all rows.
For an Arrow dataset, it can be a column name or a list of column names; the default is to take a column-wise mean of all columns.
ignore_nulls – Whether to ignore null values. If True, null values are ignored when computing the mean; if False, the output is null as soon as a null value is encountered. np.nan, None, and pd.NaT are all treated as null values. Default is True.
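To make the ignore_nulls rule concrete, here is a minimal plain-Python sketch of the documented semantics; it is not Ray's implementation. null_aware_mean and is_null are hypothetical names. The sketch treats None and float NaN as null (pd.NaT is left out so no pandas import is needed), and returning None for a group with no non-null values is an assumption of this sketch, not documented behavior.

```python
import math
from typing import Iterable, Optional


def is_null(value) -> bool:
    """True for None or a float NaN (pd.NaT omitted to avoid a pandas dependency)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def null_aware_mean(values: Iterable, ignore_nulls: bool = True) -> Optional[float]:
    """Hypothetical helper: mean of values following the ignore_nulls rule."""
    total, count = 0.0, 0
    for v in values:
        if is_null(v):
            if ignore_nulls:
                continue  # Skip nulls entirely: they affect neither the sum nor the count.
            return None   # A single null makes the whole result null.
        total += v
        count += 1
    # Assumption: an empty or all-null group yields None rather than raising.
    return total / count if count else None


print(null_aware_mean([1, None, 3]))                      # 2.0
print(null_aware_mean([1, None, 3], ignore_nulls=False))  # None
```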
- Returns
The mean result.
For a simple dataset, the output is:
on=None: a simple dataset of (k, mean) tuples, where k is the groupby key and mean is the mean of all rows in that group.
on=[callable_1, ..., callable_n]: a simple dataset of (k, mean_1, ..., mean_n) tuples, where k is the groupby key and mean_i is the mean of the outputs of the ith callable applied to each row in that group.
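The simple-dataset tuple layouts can be sketched in plain Python. simple_grouped_mean is a hypothetical stand-in that only mimics the documented output shape, (k, mean) for on=None and (k, mean_1, ..., mean_n) for a list of callables; it is not Ray code, and it skips null handling. Wrapping a single callable into a one-element list is an assumption of this sketch.

```python
from collections import defaultdict
from statistics import fmean
from typing import Any, Callable, List, Union


def simple_grouped_mean(rows, key: Callable[[Any], Any],
                        on: Union[None, Callable, List[Callable]] = None):
    """Hypothetical helper mirroring the simple-dataset output layout."""
    # Normalize `on` to a list of callables; None means "mean of the rows themselves".
    if on is None:
        fns = [lambda r: r]
    elif callable(on):
        fns = [on]
    else:
        fns = list(on)
    groups = defaultdict(list)
    for row in rows:
        groups[key(row)].append(row)
    # One tuple per group: the key followed by one mean per callable.
    return [(k, *(fmean(fn(r) for r in grp) for fn in fns))
            for k, grp in sorted(groups.items())]


rows = [(i % 3, i, i**2) for i in range(6)]
result = simple_grouped_mean(rows, key=lambda x: x[0],
                             on=[lambda x: x[1], lambda x: x[2]])
print(result)  # [(0, 1.5, 4.5), (1, 2.5, 8.5), (2, 3.5, 14.5)]
```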
For an Arrow dataset, the output is:
on=None: an Arrow dataset containing a groupby key column, "k", and a column-wise mean column for each original column in the dataset.
on=["col_1", ..., "col_n"]: an Arrow dataset of n + 1 columns, where the first column is the groupby key and the second through (n + 1)th columns are the results of the aggregations.
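The n + 1 column layout for an Arrow dataset can be illustrated with plain Python dicts standing in for table rows. tabular_grouped_mean is hypothetical, and the output column names it produces (the groupby column's own name for the key and "mean(<col>)" for each aggregate) are assumptions of this sketch rather than a statement of Ray's naming. Treating on=None as "every column except the key" is also an assumption.

```python
from collections import defaultdict
from statistics import fmean
from typing import Any, Dict, List, Union


def tabular_grouped_mean(rows: List[Dict[str, Any]], key: str,
                         on: Union[None, str, List[str]] = None) -> List[Dict[str, Any]]:
    """Hypothetical helper: key column first, then one mean column per aggregated column."""
    if on is None:
        # on=None: aggregate every column except the groupby key.
        cols = [c for c in (rows[0] if rows else {}) if c != key]
    elif isinstance(on, str):
        cols = [on]
    else:
        cols = list(on)
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    out = []
    for k, grp in sorted(groups.items()):
        record = {key: k}  # First column: the groupby key.
        for c in cols:
            record[f"mean({c})"] = fmean(r[c] for r in grp)
        out.append(record)
    return out


rows = [{"A": i % 3, "B": i, "C": i**2} for i in range(6)]
result = tabular_grouped_mean(rows, "A", ["B", "C"])
print(result[0])  # {'A': 0, 'mean(B)': 1.5, 'mean(C)': 4.5}
```

With two aggregated columns, every output record has 2 + 1 = 3 columns, matching the n + 1 rule above.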
If the groupby key is None, the key part of the return value is omitted.
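A minimal sketch of the None-key rule, in plain Python and under the assumption that a None key means "aggregate the whole dataset as one group": grouped_mean_rows is a hypothetical helper that drops the key element from each output tuple when key is None. It illustrates only the shape described in the note above and does not claim to reproduce Ray's exact return type.

```python
from statistics import fmean
from typing import Any, Callable, List, Optional, Tuple


def grouped_mean_rows(rows: List[Any],
                      key: Optional[Callable[[Any], Any]],
                      fns: List[Callable[[Any], float]]) -> List[Tuple]:
    """Hypothetical helper: per-group means, with the key part dropped when key is None."""
    if key is None:
        # No groupby key: a single output row over the whole dataset, with no key element.
        return [tuple(fmean(fn(r) for r in rows) for fn in fns)]
    groups = {}
    for r in rows:
        groups.setdefault(key(r), []).append(r)
    # With a key: each output row starts with the key, followed by the means.
    return [(k, *(fmean(fn(r) for r in g) for fn in fns))
            for k, g in sorted(groups.items())]


rows = list(range(6))
print(grouped_mean_rows(rows, None, [lambda r: r]))            # [(2.5,)]
print(grouped_mean_rows(rows, lambda r: r % 2, [lambda r: r]))  # [(0, 2.0), (1, 3.0)]
```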