ray.rllib.core.learner.learner.Learner.update_from_batch

Learner.update_from_batch(batch: MultiAgentBatch, *, timesteps: Dict[str, Any] | None = None, minibatch_size: int | None = None, num_iters: int = 1, reduce_fn=-1) -> Dict

Do num_iters minibatch updates given a train batch.

You can use this method to take more than one backward pass on the batch. The same minibatch_size and num_iters are used for all module IDs in the MultiRLModule.
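To make the interaction between minibatch_size and num_iters concrete, here is a minimal pure-Python sketch. It is not RLlib's implementation: the helper iter_minibatches and the plain dict-of-lists batch are hypothetical, and it follows the reading of num_iters as complete passes over the batch. RLlib's actual slicing, shuffling, and handling of a short final minibatch may differ.

```python
from typing import Dict, Iterator, List


def iter_minibatches(
    batch: Dict[str, List[float]], minibatch_size: int, num_iters: int
) -> Iterator[Dict[str, List[float]]]:
    """Yield minibatches for `num_iters` complete passes over `batch`.

    Hypothetical helper for illustration only; not part of RLlib.
    """
    # All columns are assumed to have the same length.
    num_rows = len(next(iter(batch.values())))
    for _ in range(num_iters):
        for start in range(0, num_rows, minibatch_size):
            # The last slice of a pass may be shorter than minibatch_size.
            yield {k: v[start:start + minibatch_size] for k, v in batch.items()}


batch = {
    "obs": [0.0, 1.0, 2.0, 3.0, 4.0],
    "rewards": [1.0, 0.0, 1.0, 1.0, 0.0],
}
minibatches = list(iter_minibatches(batch, minibatch_size=2, num_iters=3))
# 5 rows in chunks of 2 give 3 minibatches per pass, so 3 passes give 9 updates.
num_updates = len(minibatches)
```

In this sketch, each yielded minibatch would correspond to one forward and backward pass.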

Parameters:
  • batch – A batch of training data to update from.

  • timesteps – Timesteps dict, which must have the key NUM_ENV_STEPS_SAMPLED_LIFETIME.

  • minibatch_size – The size of the minibatch to use for each update.

  • num_iters – The number of complete passes over all the sub-batches in the input multi-agent batch.
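The parameters above could be passed as in the following sketch. StubLearner and run_update are hypothetical stand-ins so that the example runs without Ray installed; they only mirror the documented keyword arguments and do no training. The string value assigned to NUM_ENV_STEPS_SAMPLED_LIFETIME is an assumption; real code should import the constant from RLlib rather than hard-code it.

```python
from typing import Any, Dict, Optional

# Assumption: stand-in for RLlib's NUM_ENV_STEPS_SAMPLED_LIFETIME constant.
# Real code should import the constant from RLlib instead.
NUM_ENV_STEPS_SAMPLED_LIFETIME = "num_env_steps_sampled_lifetime"


class StubLearner:
    """Hypothetical stand-in mirroring the documented keyword arguments."""

    def update_from_batch(
        self,
        batch: Any,
        *,
        timesteps: Optional[Dict[str, Any]] = None,
        minibatch_size: Optional[int] = None,
        num_iters: int = 1,
    ) -> Dict[str, Any]:
        # The stub enforces the documented requirement on the timesteps dict.
        if timesteps is None or NUM_ENV_STEPS_SAMPLED_LIFETIME not in timesteps:
            raise KeyError(NUM_ENV_STEPS_SAMPLED_LIFETIME)
        # A real Learner would return reduced metrics; the stub echoes its inputs.
        return {"minibatch_size": minibatch_size, "num_iters": num_iters}


def run_update(learner: Any, train_batch: Any, lifetime_env_steps: int) -> Dict[str, Any]:
    """Call update_from_batch with keyword-only arguments after `batch`."""
    return learner.update_from_batch(
        batch=train_batch,
        timesteps={NUM_ENV_STEPS_SAMPLED_LIFETIME: lifetime_env_steps},
        minibatch_size=128,
        num_iters=4,
    )


call_result = run_update(StubLearner(), train_batch={}, lifetime_env_steps=10_000)
```

With a real Learner, train_batch would be a MultiAgentBatch rather than the empty dict used here.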

Returns:

A ResultDict object produced by a call to self.metrics.reduce(). The returned dict may be arbitrarily nested and must have Stats objects at all its leaves, allowing components further downstream (for example, a user of this Learner) to further reduce these results (for example, over n parallel Learners).
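As a rough illustration of reducing nested results over several parallel Learners, the sketch below averages matching leaves across result dicts. It is an assumption-laden simplification: plain floats stand in for Stats objects, the mean stands in for whatever reduction each Stats object defines, and reduce_nested plus the keys shown are hypothetical names, not RLlib API.

```python
from statistics import mean
from typing import Any, Dict, List


def reduce_nested(results: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Average leaf values across result dicts that share one nested layout.

    Hypothetical helper; plain floats stand in for RLlib Stats objects.
    """
    reduced: Dict[str, Any] = {}
    for key, first_value in results[0].items():
        values = [r[key] for r in results]
        if isinstance(first_value, dict):
            # Recurse into sub-dicts, e.g. one sub-dict per module ID.
            reduced[key] = reduce_nested(values)
        else:
            reduced[key] = mean(values)
    return reduced


# One result dict per (hypothetical) parallel Learner.
learner_results = [
    {"default_policy": {"total_loss": 0.5, "entropy": 1.0}},
    {"default_policy": {"total_loss": 1.5, "entropy": 3.0}},
]
merged = reduce_nested(learner_results)
```

The nested layout is preserved, so the merged dict can be reduced again further downstream in the same way.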