ray.rllib.core.learner.learner.Learner.update_from_episodes#
- Learner.update_from_episodes(episodes: List[SingleAgentEpisode | MultiAgentEpisode], *, timesteps: Dict[str, Any] | None = None, minibatch_size: int | None = None, num_iters: int = 1, num_total_mini_batches: int = 0, reduce_fn=-1) → Dict [source]#
Do num_iters minibatch updates given a list of episodes. You can use this method to take more than one backward pass on the batch. The same minibatch_size and num_iters are used for all module IDs in the MultiRLModule.
- Parameters:
episodes – A list of episode objects to update from.
timesteps – Timesteps dict, which must have the key NUM_ENV_STEPS_SAMPLED_LIFETIME. # TODO (sven): Make this a more formal structure with its own type.
minibatch_size – The size of the minibatch to use for each update.
num_iters – The number of complete passes over all the sub-batches in the input multi-agent batch.
num_total_mini_batches – The total number of mini-batches to loop through (across all num_sgd_iter SGD iterations). Setting this is required in multi-agent + multi-GPU situations where the MultiAgentEpisodes themselves are sharded roughly equally across Learners, but may contain SingleAgentEpisodes with very lopsided length distributions. Without this fixed, pre-computed value, one Learner could go through a different number of mini-batches than the other Learners, causing a deadlock.
- Returns:
A ResultDict object produced by a call to self.metrics.reduce(). The returned dict may be arbitrarily nested and must have Stats objects at all its leaves, allowing components further downstream (i.e., a user of this Learner) to further reduce these results (for example over n parallel Learners).
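To illustrate how minibatch_size and num_iters interact, the following sketch (plain Python, not RLlib internals; it assumes the batch is chunked contiguously) enumerates the mini-batch updates one Learner would perform: num_iters complete passes over the data, each pass split into chunks of up to minibatch_size timesteps.

```python
# Illustrative sketch only, not RLlib's actual update loop.
from typing import Iterator, Tuple


def minibatch_schedule(
    total_timesteps: int, minibatch_size: int, num_iters: int
) -> Iterator[Tuple[int, int, int]]:
    """Yield (iteration, start, stop) slices: `num_iters` complete passes,
    each covering the batch in contiguous chunks of up to `minibatch_size`."""
    for iteration in range(num_iters):
        for start in range(0, total_timesteps, minibatch_size):
            yield iteration, start, min(start + minibatch_size, total_timesteps)


# 100 timesteps with minibatch_size=32 and num_iters=2: each pass performs
# ceil(100 / 32) = 4 mini-batch updates, so 8 updates in total.
updates = list(minibatch_schedule(100, 32, 2))
```

This also shows why num_total_mini_batches matters in the multi-learner case: if another Learner held, say, 130 timesteps, it would naturally loop through a different number of chunks, so a shared, pre-computed count is needed to keep all Learners in lockstep.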