ray.rllib.algorithms.algorithm.Algorithm.training_step

Algorithm.training_step() → Dict

Default logic for a single training iteration of an algorithm:

  • Collect on-policy samples (SampleBatches) in parallel using the Algorithm’s EnvRunners (@ray.remote).

  • Concatenate collected SampleBatches into one train batch.

  • Call each policy’s learn_on_batch (simple optimizer) OR load_batch_into_buffer + learn_on_loaded_batch (multi-GPU optimizer) method to calculate the loss and update the model(s). Note that in the multi-agent case there may be more than one policy.

  • Return all collected metrics for the iteration.

Returns:

The results dict from executing the training iteration.
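The default loop above can be sketched without Ray itself. The snippet below is a minimal, framework-free illustration of the control flow only; MockEnvRunner and MockPolicy are hypothetical stand-ins (not RLlib classes), and the parallel @ray.remote sampling is shown sequentially for clarity.

```python
from typing import Dict, List


class MockEnvRunner:
    """Hypothetical stand-in for an RLlib EnvRunner actor."""

    def __init__(self, runner_id: int):
        self.runner_id = runner_id

    def sample(self) -> List[dict]:
        # Pretend each runner collects two on-policy transitions.
        return [
            {"obs": self.runner_id, "reward": 1.0},
            {"obs": self.runner_id, "reward": 0.5},
        ]


class MockPolicy:
    """Hypothetical stand-in for a policy (simple-optimizer path)."""

    def learn_on_batch(self, batch: List[dict]) -> Dict[str, float]:
        # Pretend the "loss" is just the mean reward of the batch.
        mean_reward = sum(t["reward"] for t in batch) / len(batch)
        return {"loss": mean_reward}


def training_step(env_runners, policies) -> Dict:
    # 1) Collect on-policy samples from all EnvRunners (in RLlib this
    #    happens in parallel via @ray.remote; sequential here).
    sample_batches = [runner.sample() for runner in env_runners]
    # 2) Concatenate the collected batches into one train batch.
    train_batch = [t for batch in sample_batches for t in batch]
    # 3) Let each policy (one per agent group in the multi-agent case)
    #    compute its loss and update its model.
    learner_results = {
        pid: policy.learn_on_batch(train_batch)
        for pid, policy in policies.items()
    }
    # 4) Return all collected metrics for the iteration.
    return {
        "num_env_steps_sampled": len(train_batch),
        "learner": learner_results,
    }


results = training_step(
    [MockEnvRunner(0), MockEnvRunner(1)],
    {"default_policy": MockPolicy()},
)
print(results["num_env_steps_sampled"])  # 4
```

Subclasses of Algorithm override training_step() to implement algorithm-specific variants of this collect–concatenate–update–report cycle.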