Learner.update(batch: ~ray.rllib.policy.sample_batch.MultiAgentBatch, *, minibatch_size: int | None = None, num_iters: int = 1, reduce_fn: ~typing.Callable[[~typing.List[~typing.Mapping[str, ~typing.Any]]], dict] = <function _reduce_mean_results>) Mapping[str, Any] | List[Mapping[str, Any]][source]#

Do num_iters minibatch updates given the original batch.

Given a batch of episodes you can use this method to take more than one backward pass on the batch. The same minibatch_size and num_iters will be used for all module ids in MultiAgentRLModule.

  • batch – A batch of data.

  • minibatch_size – The size of the minibatch to use for each update.

  • num_iters – The number of complete passes over all the sub-batches in the input multi-agent batch.

  • reduce_fn – reduce_fn: A function to reduce the results from a list of minibatch updates. This can be any arbitrary function that takes a list of dictionaries and returns a single dictionary. For example you can either take an average (default) or concatenate the results (for example for metrics) or be more selective about you want to report back to the algorithm’s training_step. If None is passed, the results will not get reduced.


A dictionary of results, in numpy format or a list of such dictionaries in case reduce_fn is None and we have more than one minibatch pass.