ray.rllib.core.learner.learner.Learner.update_from_episodes

Learner.update_from_episodes(episodes: List[SingleAgentEpisode | MultiAgentEpisode], *, reduce_fn: Callable[[List[Dict[str, Any]]], dict | NestedDict] = <function _reduce_mean_results>, minibatch_size: int | None = None, num_iters: int = 1, min_total_mini_batches: int = 0) -> Dict[str, Any] | List[Dict[str, Any]]

Do num_iters minibatch updates given a list of episodes.

You can use this method to take more than one backward pass on the batch. The same minibatch_size and num_iters are used for all module IDs in the MultiAgentRLModule.

Parameters:
  • episodes – A list of episode objects to update from.

  • reduce_fn – A function to reduce the results from a list of minibatch updates. This can be any arbitrary function that takes a list of dictionaries and returns a single dictionary. For example, you can take an average (the default), concatenate the results (for example, for metrics; see the sketch after this parameter list), or be more selective about what you want to report back to the algorithm’s training_step. If None is passed, the results are not reduced.

  • minibatch_size – The size of the minibatch to use for each update.

  • num_iters – The number of complete passes over all the sub-batches in the input multi-agent batch.

  • min_total_mini_batches – The minimum number of mini-batches to loop through (across all num_sgd_iter SGD iterations). Setting this is required for multi-agent + multi-GPU setups in which the MultiAgentEpisodes themselves are sharded roughly equally across Learners, but might contain SingleAgentEpisodes with very lopsided length distributions. Without this limit, one Learner could go through a different number of mini-batches than the other Learners, causing deadlocks.
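
The concatenation case mentioned for reduce_fn is sketched below. This is a minimal, hypothetical reducer (the name concat_results is not part of RLlib) that keeps every minibatch's metrics instead of averaging them; it assumes flat result dicts whose values under the same key share the same shape.

from typing import Any, Dict, List

import numpy as np


def concat_results(results: List[Dict[str, Any]]) -> Dict[str, Any]:
    # `results` is the list of per-minibatch result dicts; the return value
    # must be a single dict, as required by reduce_fn.
    reduced: Dict[str, Any] = {}
    for key in results[0]:
        values = [r[key] for r in results]
        if all(isinstance(v, (int, float, np.number, np.ndarray)) for v in values):
            # Stack numeric metrics so each minibatch pass stays visible.
            reduced[key] = np.stack(values)
        else:
            # Anything else (e.g. nested dicts) is kept as a plain list.
            reduced[key] = values
    return reduced

Such a function can then be passed as reduce_fn=concat_results.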

Returns:

A dictionary of results in numpy format, or a list of such dictionaries if reduce_fn is None and more than one minibatch pass is made.
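
A minimal usage sketch follows, assuming an already-built Learner instance and a list of previously collected episodes (for example, SingleAgentEpisode objects returned by an EnvRunner); the variables learner and episodes are placeholders, not values produced by this method.

# `learner` is an already-built Learner; `episodes` is a list of
# SingleAgentEpisode (or MultiAgentEpisode) objects collected elsewhere.
results = learner.update_from_episodes(
    episodes,
    minibatch_size=256,  # each minibatch update uses 256 timesteps
    num_iters=4,         # four complete passes over the episode data
)
# With the default reduce_fn, `results` is a single dict of reduced
# (mean) metrics in numpy format.

# Passing reduce_fn=None returns the unreduced results instead: one
# result dict per minibatch update.
per_minibatch_results = learner.update_from_episodes(
    episodes,
    reduce_fn=None,
    minibatch_size=256,
    num_iters=4,
)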