Learner.update_from_episodes(episodes: List[SingleAgentEpisode | MultiAgentEpisode], *, minibatch_size: int | None = None, num_iters: int = 1, min_total_mini_batches: int = 0, reduce_fn=-1) → dict | NestedDict

Do num_iters minibatch updates given a list of episodes.

You can use this method to take more than one backward pass on the batch. The same minibatch_size and num_iters will be used for all module ids in MultiAgentRLModule.

Parameters:

  • episodes – A list of episode objects to update from.

  • minibatch_size – The size of the minibatch to use for each update.

  • num_iters – The number of complete passes over all the sub-batches in the input multi-agent batch.

  • min_total_mini_batches – The minimum number of mini-batches to loop through (across all num_iters iterations). This must be set in multi-agent, multi-GPU setups, where the MultiAgentEpisodes are sharded roughly equally across Learners but may contain SingleAgentEpisodes with very lopsided length distributions. Without this limit, one Learner can go through a different number of mini-batches than the other Learners, causing deadlocks.
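
To make the deadlock scenario concrete, the following is a minimal sketch of how the mini-batch count can diverge between Learners and how a shared minimum evens it out. It is not RLlib's implementation: the helper `count_minibatches`, the shard sizes, and the rule "run at least `min_total_mini_batches` updates" are assumptions made only for illustration.

```python
import math


def count_minibatches(num_timesteps, minibatch_size, num_iters,
                      min_total_mini_batches=0):
    """Hypothetical helper: mini-batch updates one Learner would run."""
    # One complete pass needs this many mini-batches (the last may be partial).
    per_pass = math.ceil(num_timesteps / minibatch_size)
    # Assumed rule: num_iters complete passes, but never fewer than the minimum.
    return max(num_iters * per_pass, min_total_mini_batches)


# Two Learners whose shards hold very different numbers of timesteps
# (made-up numbers standing in for lopsided SingleAgentEpisode lengths).
shard_sizes = {"learner_0": 1000, "learner_1": 300}

without_min = {
    name: count_minibatches(n, minibatch_size=100, num_iters=2)
    for name, n in shard_sizes.items()
}
with_min = {
    name: count_minibatches(n, minibatch_size=100, num_iters=2,
                            min_total_mini_batches=20)
    for name, n in shard_sizes.items()
}

print(without_min)  # the two Learners disagree on the number of updates
print(with_min)     # both Learners now run the same number of updates
```

In this sketch the counts only line up because the chosen minimum is at least as large as the count of the busiest Learner; a smaller value would still leave the Learners out of step.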


Returns:

A ResultDict object produced by a call to self.metrics.reduce(). The returned dict may be arbitrarily nested and must have Stats objects at all its leaves, allowing components further downstream (i.e. a user of this Learner) to further reduce these results (for example over n parallel Learners).
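
A call might look like the sketch below. It uses only the keyword arguments listed in the signature above; the wrapper `run_update`, the argument values, and the assumption that `learner` is an already-built Learner and `episodes` an already-collected list of episodes are illustrative, not part of the API.

```python
def run_update(learner, episodes):
    """Hypothetical wrapper: several mini-batch passes over `episodes`."""
    results = learner.update_from_episodes(
        episodes,
        minibatch_size=128,  # illustrative value
        num_iters=4,         # illustrative value
    )
    # `results` is the (possibly nested) dict described under "Returns";
    # it is handed back unchanged so the caller can reduce it further.
    return results
```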