ray.rllib.core.learner.learner.Learner.after_gradient_based_update

Learner.after_gradient_based_update(*, timesteps: Dict[str, Any]) → None

Called after gradient-based updates are completed.

Override this method to implement custom cleanup, logging, or non-gradient-based Learner/RLModule update logic that must run after gradient-based updates have been completed.

Parameters:

timesteps – Timesteps dict, which must have the key NUM_ENV_STEPS_SAMPLED_LIFETIME. This is currently a plain dict; a more formal structure with its own type is planned.
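Example:

The following is a minimal sketch of an override, not RLlib's own implementation. So that it runs without Ray installed, it subclasses a hypothetical stand-in base class (StandInLearner) instead of the real Learner. The string value assigned to NUM_ENV_STEPS_SAMPLED_LIFETIME and the epsilon-decay logic are likewise assumptions made for illustration. In real code, you would subclass a concrete RLlib Learner and import the key constant from RLlib rather than redefining it.

```python
from typing import Any, Dict

# Assumption: stands in for the RLlib constant of the same name; the exact
# string value used by RLlib may differ.
NUM_ENV_STEPS_SAMPLED_LIFETIME = "num_env_steps_sampled_lifetime"


class StandInLearner:
    """Hypothetical stand-in for RLlib's Learner (keeps the sketch Ray-free)."""

    def after_gradient_based_update(self, *, timesteps: Dict[str, Any]) -> None:
        # The base hook does nothing in this stand-in.
        pass


class EpsilonDecayLearner(StandInLearner):
    """Linearly decays a (hypothetical) exploration epsilon after each update."""

    def __init__(self, start: float = 1.0, end: float = 0.05,
                 decay_steps: int = 10_000) -> None:
        self.start = start
        self.end = end
        self.decay_steps = decay_steps
        self.epsilon = start

    def after_gradient_based_update(self, *, timesteps: Dict[str, Any]) -> None:
        # Let the parent class run its own post-update logic first.
        super().after_gradient_based_update(timesteps=timesteps)

        # The timesteps dict must contain the lifetime env-step count.
        ts = timesteps[NUM_ENV_STEPS_SAMPLED_LIFETIME]

        # Non-gradient-based update: interpolate epsilon from start to end.
        frac = min(ts / self.decay_steps, 1.0)
        self.epsilon = self.start + frac * (self.end - self.start)


learner = EpsilonDecayLearner()
learner.after_gradient_based_update(
    timesteps={NUM_ENV_STEPS_SAMPLED_LIFETIME: 5_000}
)
print(learner.epsilon)
```

Note that timesteps is keyword-only, so the hook must be called as after_gradient_based_update(timesteps=...), and an override should keep the same signature.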