ray.data.grouped_dataset.GroupedDataset

class ray.data.grouped_dataset.GroupedDataset(dataset: ray.data.dataset.Dataset[ray.data.block.T], key: Union[None, str, Callable[[ray.data.block.T], Any]])

Represents a grouped dataset created by calling Dataset.groupby().

The actual groupby is deferred until an aggregation is applied.
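
To illustrate what "deferred" means here, the following is a minimal plain-Python sketch, not Ray's implementation. The class `LazyGrouped` and its `grouping_runs` counter are hypothetical names introduced only for this example. Constructing the object just records the inputs; the grouping pass runs when an aggregation such as `count()` is requested.

```python
from collections import defaultdict


class LazyGrouped:
    """Hypothetical stand-in for a grouped dataset (illustration only)."""

    def __init__(self, rows, key):
        # Only remember the inputs; no grouping work happens here.
        self.rows = rows
        self.key = key
        self.grouping_runs = 0  # counts how often the grouping pass ran

    def _groups(self):
        # The grouping pass runs only when an aggregation asks for it.
        self.grouping_runs += 1
        groups = defaultdict(list)
        for row in self.rows:
            groups[self.key(row)].append(row)
        return groups

    def count(self):
        # Aggregation: number of records per group, ordered by key.
        return {k: len(v) for k, v in sorted(self._groups().items())}


grouped = LazyGrouped(range(10), key=lambda x: x % 3)
runs_before = grouped.grouping_runs  # grouping has not run yet
counts = grouped.count()             # triggers the grouping pass
runs_after = grouped.grouping_runs
```

In this sketch, `runs_before` is 0 and `runs_after` is 1, showing that the work is tied to the aggregation call rather than to construction.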

PublicAPI: This API is stable across Ray releases.

__init__(dataset: ray.data.dataset.Dataset[ray.data.block.T], key: Union[None, str, Callable[[ray.data.block.T], Any]])

Construct a dataset grouped by key (internal API).

The constructor is not part of the GroupedDataset API. Use the Dataset.groupby() method to construct one.

Methods

__init__(dataset, key)

Construct a dataset grouped by key (internal API).

aggregate(*aggs)

Implements an accumulator-based aggregation.

count()

Compute grouped count aggregation (the number of records in each group).

map_groups(fn, *[, compute, batch_format])

Apply the given function to each group of records of this dataset.

max([on, ignore_nulls])

Compute grouped max aggregation.

mean([on, ignore_nulls])

Compute grouped mean aggregation.

min([on, ignore_nulls])

Compute grouped min aggregation.

std([on, ddof, ignore_nulls])

Compute grouped standard deviation aggregation.

sum([on, ignore_nulls])

Compute grouped sum aggregation.
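
The aggregation methods listed above can be pictured with a short plain-Python sketch. It is not Ray's implementation and does not use Ray's API: the helpers `aggregate_groups` and `map_groups_sketch`, and the parameter names `init`, `accumulate`, and `finalize`, are assumptions made for illustration. The first helper folds each group's rows into one accumulator, which is the general idea behind an accumulator-based aggregation; the second applies a function to each group's records as a whole. A distributed version would also need a step that merges partial accumulators from different blocks, which this sketch omits.

```python
from collections import defaultdict


def aggregate_groups(rows, key, init, accumulate, finalize):
    """Fold each group's rows into one accumulator, then finalize it."""
    accs = {}
    for row in rows:
        k = key(row)
        if k not in accs:
            accs[k] = init(k)  # fresh accumulator for a new group
        accs[k] = accumulate(accs[k], row)
    return {k: finalize(a) for k, a in sorted(accs.items())}


def map_groups_sketch(rows, key, fn):
    """Collect the rows of each group, then apply fn to each group's list."""
    groups = defaultdict(list)
    for row in rows:
        groups[key(row)].append(row)
    out = []
    for k in sorted(groups):
        out.extend(fn(groups[k]))  # fn returns a list of output rows
    return out


rows = [{"A": x % 3, "B": x} for x in range(10)]


def by_a(row):
    return row["A"]


# Grouped sum of column "B": the accumulator is a running total.
sums = aggregate_groups(
    rows, by_a,
    init=lambda k: 0,
    accumulate=lambda acc, r: acc + r["B"],
    finalize=lambda acc: acc,
)

# Grouped mean of column "B": the accumulator is (running_sum, count).
means = aggregate_groups(
    rows, by_a,
    init=lambda k: (0, 0),
    accumulate=lambda acc, r: (acc[0] + r["B"], acc[1] + 1),
    finalize=lambda acc: acc[0] / acc[1],
)

# Per-group function: keep only the row with the largest "B" in each group.
top = map_groups_sketch(rows, by_a, lambda g: [max(g, key=lambda r: r["B"])])
```

Expressing sum and mean through the same three callbacks shows why a single accumulator-style entry point can back several of the convenience methods; a per-group function is the more general tool when the result is not a single reduced value.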