ray.data.Dataset.max

Dataset.max(on: Union[None, str, Callable[[ray.data.block.T], Any], List[Union[None, str, Callable[[ray.data.block.T], Any]]]] = None, ignore_nulls: bool = True) → ray.data.block.U

Compute the maximum over the entire dataset.

This is a blocking operation.

Examples

>>> import ray
>>> ray.data.range(100).max()
99
>>> ray.data.from_items([
...     (i, i**2)
...     for i in range(100)]).max(lambda x: x[1])
9801
>>> ray.data.range_table(100).max("value")
99
>>> ray.data.from_items([
...     {"A": i, "B": i**2}
...     for i in range(100)]).max(["A", "B"])
{'max(A)': 99, 'max(B)': 9801}

Parameters
  • on – The data subset on which to compute the max.

    • For a simple dataset: it can be a callable or a list thereof, and the default is to return a scalar max of all rows.

    • For an Arrow dataset: it can be a column name or a list thereof, and the default is to return an ArrowRow containing the column-wise max of all columns.

  • ignore_nulls – Whether to ignore null values. If True, null values are skipped when computing the max. If False, the output is None as soon as a null value is encountered. np.nan, None, and pd.NaT are treated as null values. Default is True.

Returns

The max result.

For a simple dataset, the output is:

  • on=None: a scalar representing the max of all rows,

  • on=callable: a scalar representing the max of the outputs of the callable called on each row,

  • on=[callable_1, ..., callable_n]: a tuple of (max_1, ..., max_n) representing the max of the outputs of the corresponding callables called on each row.
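To make the three simple-dataset cases concrete, here is a minimal pure-Python sketch that reproduces the result shapes for a dataset held as an in-memory list. `simple_max` is a hypothetical helper, not part of Ray; it deliberately leaves out null handling and distributed execution, and only illustrates how the `on` argument changes the shape of the result.

```python
from typing import Any, Callable, List, Sequence, Union

# Hypothetical stand-in for Dataset.max() on a simple (non-tabular) dataset.
# Null handling and distributed execution are intentionally omitted.
OnArg = Union[None, Callable[[Any], Any], Sequence[Callable[[Any], Any]]]


def simple_max(rows: List[Any], on: OnArg = None) -> Any:
    if not rows:
        return None  # an empty dataset yields None
    if on is None:
        return max(rows)  # on=None: scalar max of all rows
    if callable(on):
        return max(on(row) for row in rows)  # on=callable: max of its outputs
    # on=[callable_1, ..., callable_n]: one max per callable, as a tuple
    return tuple(max(fn(row) for row in rows) for fn in on)


pairs = [(i, i**2) for i in range(100)]
print(simple_max(list(range(100))))                        # 99
print(simple_max(pairs, lambda x: x[1]))                   # 9801
print(simple_max(pairs, [lambda x: x[0], lambda x: x[1]]))  # (99, 9801)
```

The first two calls mirror the `ray.data.range(100).max()` and `max(lambda x: x[1])` examples; the third shows the tuple returned for a list of callables.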

For an Arrow dataset, the output is:

  • on=None: an ArrowRow containing the column-wise max of all columns,

  • on="col": a scalar representing the max of all items in column "col",

  • on=["col_1", ..., "col_n"]: an n-column ArrowRow containing the column-wise max of the provided columns.
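A comparable sketch for the tabular case follows, assuming rows are plain dicts standing in for Arrow records and a plain dict stands in for the returned ArrowRow. The `max(<col>)` key format follows the `["A", "B"]` example output above; applying the same key format to `on=None` is an assumption. `tabular_max` is hypothetical and not part of Ray.

```python
from typing import Any, Dict, List, Union

# Hypothetical stand-in for Dataset.max() on a tabular (Arrow) dataset.
# Rows are dicts; a dict keyed "max(<col>)" plays the role of the ArrowRow.


def tabular_max(rows: List[Dict[str, Any]],
                on: Union[None, str, List[str]] = None) -> Any:
    if not rows:
        return None  # an empty dataset yields None
    if isinstance(on, str):
        return max(row[on] for row in rows)  # on="col": a scalar
    # on=None: every column; on=[...]: only the listed columns
    columns = list(rows[0].keys()) if on is None else on
    return {f"max({col})": max(row[col] for row in rows) for col in columns}


table = [{"A": i, "B": i**2} for i in range(100)]
print(tabular_max(table, "A"))         # 99
print(tabular_max(table, ["A", "B"]))  # {'max(A)': 99, 'max(B)': 9801}
```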

If the dataset is empty, all values are null, or any value is null AND ignore_nulls is False, then the output will be None.
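The null rules can be illustrated with a small, standard-library-only sketch. `max_with_nulls` is a hypothetical helper, not Ray's implementation. It treats `None` and float NaN as null (this covers `np.nan`, which is a Python float) and omits `pd.NaT` to avoid a pandas dependency.

```python
import math
from typing import Any, Iterable, Optional


def _is_null(value: Any) -> bool:
    # None and float NaN count as null; pd.NaT is not handled in this sketch.
    return value is None or (isinstance(value, float) and math.isnan(value))


def max_with_nulls(values: Iterable[Any], ignore_nulls: bool = True) -> Optional[Any]:
    result = None
    for value in values:
        if _is_null(value):
            if not ignore_nulls:
                return None  # any null makes the whole result None
            continue  # ignore_nulls=True: skip the null
        if result is None or value > result:
            result = value
    return result  # None if the input was empty or contained only nulls


print(max_with_nulls([3, None, 7, float("nan")]))         # 7
print(max_with_nulls([3, None, 7], ignore_nulls=False))   # None
print(max_with_nulls([]))                                 # None
```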