ray.data.grouped_dataset.GroupedDataset.min
- GroupedDataset.min(on: Union[None, str, Callable[[ray.data.block.T], Any], List[Union[None, str, Callable[[ray.data.block.T], Any]]]] = None, ignore_nulls: bool = True) → ray.data.dataset.Dataset[ray.data.block.U]
Compute grouped min aggregation.
Examples
>>> import ray
>>> ray.data.range(100).groupby(lambda x: x % 3).min()
>>> ray.data.from_items([
...     (i % 3, i, i**2)
...     for i in range(100)]) \
...     .groupby(lambda x: x[0] % 3) \
...     .min(lambda x: x[2])
>>> ray.data.range_table(100).groupby("value").min()
>>> ray.data.from_items([
...     {"A": i % 3, "B": i, "C": i**2}
...     for i in range(100)]) \
...     .groupby("A") \
...     .min(["B", "C"])
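As a rough mental model (not Ray's implementation), the first example groups the integers 0..99 by x % 3 and keeps the smallest value in each group. The plain-Python sketch below reproduces that result without Ray; `group_min` is a hypothetical helper introduced only for illustration, and the sorted output order is a choice made by the sketch.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Hashable, Iterable, List, Tuple


def group_min(
    rows: Iterable[Any], key_fn: Callable[[Any], Hashable]
) -> List[Tuple[Hashable, Any]]:
    """Hypothetical helper: per-group min of whole rows, as (k, min) tuples."""
    groups: Dict[Hashable, List[Any]] = defaultdict(list)
    for row in rows:
        groups[key_fn(row)].append(row)
    # Sort by key so the sketch's output order is deterministic.
    return [(k, min(vals)) for k, vals in sorted(groups.items())]


result = group_min(range(100), lambda x: x % 3)
print(result)  # [(0, 0), (1, 1), (2, 2)]
```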
- Parameters
on – The data subset on which to compute the min.
For a simple dataset: it can be a callable or a list thereof, and the default is to compute the min of all rows.
For an Arrow dataset: it can be a column name or a list thereof, and the default is to compute a column-wise min of all columns.
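The accepted forms of on can be summarized as "normalize everything into a list of aggregation targets". The sketch below is a hypothetical helper, `normalize_on`, written only to illustrate those rules; it is not part of Ray's API. It assumes the caller passes the non-key column names for Arrow-style data, and treats the simple-dataset default as "aggregate the row itself".

```python
from typing import Any, Callable, List, Optional, Sequence, Union

# Hypothetical alias for this sketch: one aggregation target.
Target = Union[str, Callable[[Any], Any]]


def normalize_on(
    on: Union[None, Target, Sequence[Target]],
    value_columns: Optional[Sequence[str]] = None,
) -> List[Target]:
    """Hypothetical helper: turn `on` into a list of aggregation targets.

    - Arrow-style data (value_columns given): None means every value column.
    - Simple data (value_columns is None): None means the whole row.
    """
    if on is None:
        if value_columns is not None:
            return list(value_columns)
        return [lambda row: row]  # take the min over the rows themselves
    if isinstance(on, str) or callable(on):
        return [on]  # a single column name or a single callable
    return list(on)  # already a list of names or callables
```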
ignore_nulls – Whether to ignore null values. If True, null values will be ignored when computing the min; if False, the output will be null as soon as a null value is encountered. We consider np.nan, None, and pd.NaT to be null values. Default is True.
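The ignore_nulls rule can be sketched in plain Python for scalar values. This is an illustration, not Ray's code: `is_null` and `null_aware_min` are hypothetical helpers, and returning None when every value in a group is null is an assumption of the sketch. It relies on NaN-like scalars comparing unequal to themselves (true for float NaN and, for scalars, pandas' NaT), so it needs neither numpy nor pandas.

```python
from typing import Any, Iterable, Optional


def is_null(x: Any) -> bool:
    """Treat None and NaN-like scalars (x != x) as null."""
    return x is None or x != x


def null_aware_min(values: Iterable[Any], ignore_nulls: bool = True) -> Optional[Any]:
    """Hypothetical helper mirroring the documented ignore_nulls semantics."""
    vals = list(values)
    if ignore_nulls:
        vals = [v for v in vals if not is_null(v)]
        # Assumption: a group whose values are all null yields None.
        return min(vals) if vals else None
    if any(is_null(v) for v in vals):
        return None  # a single null makes the whole result null
    return min(vals) if vals else None


print(null_aware_min([3, None, 1.0, float("nan")]))        # 1.0
print(null_aware_min([3, None], ignore_nulls=False))       # None
```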
- Returns
The min result.
For a simple dataset, the output is:
- on=None: a simple dataset of (k, min) tuples, where k is the groupby key and min is the min of all rows in that group.
- on=[callable_1, ..., callable_n]: a simple dataset of (k, min_1, ..., min_n) tuples, where k is the groupby key and min_i is the min of the outputs of the ith callable called on each row in that group.
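For the simple-dataset case with several callables, the shape of each output row can be modeled as below. `grouped_min_tuples` is a hypothetical plain-Python helper, not a Ray function; it uses the same tuples as the second example in Examples and sorts groups by key for a deterministic result.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Hashable, List, Sequence, Tuple


def grouped_min_tuples(
    rows: Sequence[Any],
    key_fn: Callable[[Any], Hashable],
    on: Sequence[Callable[[Any], Any]],
) -> List[Tuple[Any, ...]]:
    """Hypothetical helper: build (k, min_1, ..., min_n) tuples.

    min_i is the min of on[i](row) over the rows in group k.
    """
    groups: Dict[Hashable, List[Any]] = defaultdict(list)
    for row in rows:
        groups[key_fn(row)].append(row)
    out = []
    for k in sorted(groups):
        group = groups[k]
        # One min per callable, prefixed by the group key.
        out.append((k, *(min(fn(r) for r in group) for fn in on)))
    return out


rows = [(i % 3, i, i**2) for i in range(100)]
result = grouped_min_tuples(rows, lambda x: x[0] % 3, [lambda x: x[1], lambda x: x[2]])
print(result)  # [(0, 0, 0), (1, 1, 1), (2, 2, 4)]
```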
For an Arrow dataset, the output is:
- on=None: an Arrow dataset containing a groupby key column, "k", and a column-wise min column for each original column in the dataset.
- on=["col_1", ..., "col_n"]: an Arrow dataset of n + 1 columns, where the first column is the groupby key and the second through n + 1 columns are the results of the aggregations.
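The n + 1 column layout for the Arrow case can be illustrated with a dict of columns standing in for a table. `grouped_min_columns` is a hypothetical helper, and the "min(col)" output names are an illustrative choice of this sketch rather than a statement about Ray's column naming; only the ordering (key first, then one min column per entry in on) follows the description above.

```python
from collections import defaultdict
from typing import Any, Dict, List, Sequence


def grouped_min_columns(
    table: Dict[str, List[Any]], key: str, on: Sequence[str]
) -> Dict[str, List[Any]]:
    """Hypothetical helper over a dict-of-columns "table".

    Returns n + 1 columns: the groupby key first, then one min column
    per entry in `on`. The "min(col)" naming is illustrative only.
    """
    groups: Dict[Any, List[int]] = defaultdict(list)
    for i, k in enumerate(table[key]):
        groups[k].append(i)  # row indices belonging to group k
    keys = sorted(groups)
    out: Dict[str, List[Any]] = {key: keys}
    for col in on:
        out[f"min({col})"] = [min(table[col][i] for i in groups[k]) for k in keys]
    return out


table = {
    "A": [i % 3 for i in range(100)],
    "B": list(range(100)),
    "C": [i**2 for i in range(100)],
}
result = grouped_min_columns(table, "A", ["B", "C"])
print(list(result))  # ['A', 'min(B)', 'min(C)']
```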
If the groupby key is None, the key part of the return value is omitted.
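When there is no groupby key, every row falls into a single group, so each result carries only the aggregated values. The hypothetical helper `global_min` below sketches that shape in plain Python; it is an illustration under that assumption, not Ray's implementation.

```python
from typing import Any, Callable, Sequence, Tuple


def global_min(rows: Sequence[Any], on: Sequence[Callable[[Any], Any]]) -> Tuple[Any, ...]:
    """Hypothetical helper: with no groupby key, all rows form one group,
    so the result is (min_1, ..., min_n) without a leading key.
    """
    return tuple(min(fn(r) for r in rows) for fn in on)


rows = [(i % 3, i, i**2) for i in range(100)]
result = global_min(rows, [lambda x: x[1], lambda x: x[2]])
print(result)  # (0, 0)
```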