Dataset.max(on: str | List[str] | None = None, ignore_nulls: bool = True) → Any | Dict[str, Any]

Return the maximum of one or more columns.


This operation will trigger execution of the lazy transformations performed on this dataset.


This operation requires all inputs to be materialized in the object store before it can execute.


>>> import ray
>>> ray.data.range(100).max("id")
99
>>> ray.data.from_items([
...     {"A": i, "B": i**2}
...     for i in range(100)
... ]).max(["A", "B"])
{'max(A)': 99, 'max(B)': 9801}

Parameters:

  • on – a column name or a list of column names to aggregate.

  • ignore_nulls – Whether to ignore null values. If True, null values are ignored when computing the max; if False, when a null value is encountered, the output is None. This method considers np.nan, None, and pd.NaT to be null values. Default is True.
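The null handling described for ignore_nulls can be sketched in plain Python. This is only an illustration of the documented semantics, not Ray's implementation: is_null and column_max are hypothetical helpers, pd.NaT is left out to avoid a pandas dependency, and returning None for an all-null column is an assumption.

```python
import math


def is_null(value):
    """Treat None and float NaN as null (pd.NaT is omitted in this sketch)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def column_max(values, ignore_nulls=True):
    """Hypothetical helper mirroring the documented null handling."""
    non_null = [v for v in values if not is_null(v)]
    if not ignore_nulls and len(non_null) != len(values):
        # A null was encountered and nulls are not ignored.
        return None
    # Assumption: an empty or all-null column yields None.
    return max(non_null) if non_null else None


print(column_max([3, None, 7]))                      # 7
print(column_max([3, None, 7], ignore_nulls=False))  # None
print(column_max([]))                                # None
```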


Returns: The max result.

For different values of on, the return varies:

  • on=None: a dict containing the column-wise max of all columns,

  • on="col": a scalar representing the max of all items in column "col",

  • on=["col_1", ..., "col_n"]: an n-column dict containing the column-wise max of the provided columns.

If the dataset is empty, all values are null. If ignore_nulls is False and any value is null, then the output is None.
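The three return shapes can be illustrated without a Ray cluster. The sketch below uses a hypothetical table_max function over a list of row dicts; it assumes non-empty input with no nulls, and it assumes the max(col) key naming shown in the example output also applies when on=None.

```python
def table_max(rows, on=None):
    """Hypothetical stand-in that reproduces the documented return shapes."""
    if isinstance(on, str):
        # Single column name: return a scalar.
        return max(row[on] for row in rows)
    # on=None selects every column; a list selects the named columns.
    selected = list(rows[0]) if on is None else on
    return {f"max({col})": max(row[col] for row in rows) for col in selected}


rows = [{"A": i, "B": i**2} for i in range(100)]
print(table_max(rows, "A"))         # 99
print(table_max(rows, ["A", "B"]))  # {'max(A)': 99, 'max(B)': 9801}
print(table_max(rows))              # {'max(A)': 99, 'max(B)': 9801}
```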