ray.rllib.core.learner.learner.Learner.after_gradient_based_update
- Learner.after_gradient_based_update(*, timesteps: Dict[str, Any]) → None
Called after gradient-based updates are completed.
Override this method to implement custom cleanup, logging, or non-gradient-based Learner/RLModule update logic that must run after the gradient-based updates have completed.
- Parameters:
timesteps – Timesteps dict, which must have the key NUM_ENV_STEPS_SAMPLED_LIFETIME. # TODO (sven): Make this a more formal structure with its own type.
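The override pattern can be sketched as follows. This is a minimal illustration, not RLlib's own implementation. So that it runs without Ray installed, it defines a hypothetical stand-in `Learner` base class and redefines the `NUM_ENV_STEPS_SAMPLED_LIFETIME` key locally. In real code you would subclass `ray.rllib.core.learner.learner.Learner` and import the constant from RLlib; the string value used here is an assumption. The `EpsilonDecayLearner` class and its `start`, `end`, and `decay_steps` parameters are invented for illustration.

```python
from typing import Any, Dict

# Assumption: in RLlib this key is imported rather than redefined; the string
# value below is assumed and only exists so the sketch runs without Ray.
NUM_ENV_STEPS_SAMPLED_LIFETIME = "num_env_steps_sampled_lifetime"


class Learner:
    """Hypothetical stand-in for ray.rllib.core.learner.learner.Learner."""

    def after_gradient_based_update(self, *, timesteps: Dict[str, Any]) -> None:
        # Placeholder for whatever the real base class does at this point.
        pass


class EpsilonDecayLearner(Learner):
    """Hypothetical subclass that decays a value without using gradients."""

    def __init__(self, start: float = 1.0, end: float = 0.1, decay_steps: int = 1000):
        self.start, self.end, self.decay_steps = start, end, decay_steps
        self.epsilon = start

    def after_gradient_based_update(self, *, timesteps: Dict[str, Any]) -> None:
        # Preserve any base-class behavior before adding custom logic.
        super().after_gradient_based_update(timesteps=timesteps)
        # Required key; a KeyError here signals a malformed timesteps dict.
        ts = timesteps[NUM_ENV_STEPS_SAMPLED_LIFETIME]
        # Non-gradient-based update: linear interpolation, clipped at `end`.
        frac = min(ts / self.decay_steps, 1.0)
        self.epsilon = self.start + frac * (self.end - self.start)


learner = EpsilonDecayLearner()
# `timesteps` is keyword-only, so it must be passed by name.
learner.after_gradient_based_update(timesteps={NUM_ENV_STEPS_SAMPLED_LIFETIME: 500})
print(learner.epsilon)
```

Calling `super()` first is a design choice in this sketch: it keeps any logic the base class runs at this hook and layers the custom schedule on top.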