GroupedData.max(on: str | List[str] = None, ignore_nulls: bool = True) -> Dataset

Compute grouped max aggregation.

Examples:

>>> import ray
>>> ds = ray.data.from_items([
...     {"A": i % 3, "B": i, "C": i**2}
...     for i in range(100)])
>>> ds.groupby("A").max()
>>> ds.groupby("A").max(["B", "C"])
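As a sanity check on the grouped example, the sketch below recomputes the per-group maxima of "B" and "C" in plain Python, without Ray. It only shows which values the aggregation should produce. It does not reproduce Ray's output format or column names.

```python
from collections import defaultdict

# Same rows as the from_items example: key "A" cycles through 0, 1, 2.
rows = [{"A": i % 3, "B": i, "C": i**2} for i in range(100)]

# Track the running max of "B" and "C" for each value of the key "A".
maxima = defaultdict(dict)
for row in rows:
    group = maxima[row["A"]]
    for col in ("B", "C"):
        group[col] = max(group.get(col, row[col]), row[col])

# Collect (max B, max C) per group, ordered by key.
expected = {k: (v["B"], v["C"]) for k, v in sorted(maxima.items())}
print(expected)  # {0: (99, 9801), 1: (97, 9409), 2: (98, 9604)}
```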
Parameters:

  • on – a column name or a list of column names to aggregate.

  • ignore_nulls – Whether to ignore null values. If True, null values are ignored when computing the max; if False, the output is null as soon as a null value is encountered. We consider np.nan, None, and pd.NaT to be null values. Default is True.
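The ignore_nulls rule can be illustrated with a minimal pure-Python sketch over a single column. The helpers `is_null` and `column_max` are hypothetical and are not part of Ray. The sketch treats None and float NaN as null but leaves out pd.NaT to avoid a pandas dependency.

```python
import math


def is_null(value):
    # None and float NaN count as null here; pd.NaT is also null per the
    # docs but is omitted to keep this sketch free of pandas.
    return value is None or (isinstance(value, float) and math.isnan(value))


def column_max(values, ignore_nulls=True):
    """Hypothetical helper mirroring the documented ignore_nulls semantics."""
    result = None
    for value in values:
        if is_null(value):
            if ignore_nulls:
                continue  # skip nulls entirely
            return None  # a single null makes the whole result null
        result = value if result is None else max(result, value)
    return result


print(column_max([3, None, 7, float("nan")]))                      # 7
print(column_max([3, None, 7, float("nan")], ignore_nulls=False))  # None
```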


Returns: The max result.

For different values of on, the return varies:

  • on=None: a dataset containing the groupby key column (for example, "k") and a column-wise max column for each other original column in the dataset.

  • on=["col_1", ..., "col_n"]: a dataset of n + 1 columns where the first column is the groupby key and the second through n + 1 columns are the results of the aggregations.

If the groupby key is None, the key column is omitted from the result.
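The output shapes described above can be modeled with a small pure-Python sketch. `grouped_max` is a hypothetical helper, not Ray's implementation. It does no null handling, and the `max(col)` column naming is an assumption made for illustration. It covers the three cases: on=None aggregates every non-key column, an explicit `on` list yields one result column per entry after the key, and a key of None drops the key column.

```python
from collections import defaultdict


def grouped_max(rows, key=None, on=None):
    """Hypothetical model of the documented output shape (no null handling)."""
    if on is None:
        # on=None: aggregate every column except the groupby key.
        on = [c for c in rows[0] if c != key]
    elif isinstance(on, str):
        on = [on]
    # With key=None, all rows fall into a single group.
    groups = defaultdict(list)
    for row in rows:
        groups[row[key] if key is not None else None].append(row)
    out = []
    for k in sorted(groups):
        # Key column first (omitted when key is None), then one max per column.
        record = {} if key is None else {key: k}
        for col in on:
            record[f"max({col})"] = max(r[col] for r in groups[k])
        out.append(record)
    return out


rows = [{"A": i % 3, "B": i, "C": i**2} for i in range(6)]
print(list(grouped_max(rows, key="A")[0]))  # ['A', 'max(B)', 'max(C)']
print(grouped_max(rows, key="A", on="B"))
# [{'A': 0, 'max(B)': 3}, {'A': 1, 'max(B)': 4}, {'A': 2, 'max(B)': 5}]
print(grouped_max(rows, on=["B"]))  # [{'max(B)': 5}]
```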