ray.rllib.algorithms.algorithm.Algorithm.training_step
- Algorithm.training_step() -> Dict
Default single iteration logic of an algorithm.
1. Collect on-policy samples (SampleBatches) in parallel using the Algorithm’s EnvRunners (@ray.remote).
2. Concatenate the collected SampleBatches into one train batch.
3. Note that there may be more than one policy in the multi-agent case: call each policy’s learn_on_batch (simple optimizer) OR load_batch_into_buffer + learn_on_loaded_batch (multi-GPU optimizer) methods to calculate the loss and update the model(s).
4. Return all collected metrics for the iteration.
- Returns:
The results dict from executing the training iteration.
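
The control flow described above (sample, concatenate, learn per policy, return metrics) can be illustrated with a dependency-free sketch. This is not RLlib's implementation: ToyEnvRunner, ToyPolicy, concat_batches, and training_step_sketch are hypothetical stand-ins, batches are plain dicts of lists rather than SampleBatches, sampling runs sequentially instead of on remote workers, and only the learn_on_batch (simple optimizer) path is modeled.

```python
from typing import Dict, List

# Hypothetical stand-in for a SampleBatch: column name -> list of values.
Batch = Dict[str, List[float]]


class ToyEnvRunner:
    """Hypothetical stand-in for an EnvRunner; returns one batch per policy ID."""

    def __init__(self, rewards: Dict[str, List[float]]) -> None:
        self.rewards = rewards

    def sample(self) -> Dict[str, Batch]:
        return {pid: {"rewards": list(r)} for pid, r in self.rewards.items()}


class ToyPolicy:
    """Hypothetical stand-in for a policy with a learn_on_batch-style method."""

    def __init__(self) -> None:
        self.num_updates = 0

    def learn_on_batch(self, batch: Batch) -> Dict[str, float]:
        # A real policy would compute a loss and update its model here.
        self.num_updates += 1
        rewards = batch["rewards"]
        return {
            "mean_reward": sum(rewards) / len(rewards),
            "batch_size": float(len(rewards)),
        }


def concat_batches(batches: List[Batch]) -> Batch:
    """Concatenate batches column by column."""
    out: Batch = {}
    for batch in batches:
        for key, values in batch.items():
            out.setdefault(key, []).extend(values)
    return out


def training_step_sketch(
    env_runners: List[ToyEnvRunner], policies: Dict[str, ToyPolicy]
) -> Dict[str, Dict[str, float]]:
    # Step 1: collect samples from every runner (sequential in this sketch).
    samples = [runner.sample() for runner in env_runners]
    results: Dict[str, Dict[str, float]] = {}
    for pid, policy in policies.items():
        # Step 2: concatenate this policy's batches into one train batch.
        train_batch = concat_batches([s[pid] for s in samples if pid in s])
        if not train_batch:
            continue  # no data was collected for this policy
        # Step 3: let the policy learn on its train batch.
        results[pid] = policy.learn_on_batch(train_batch)
    # Step 4: return the collected metrics, keyed by policy ID.
    return results


runners = [
    ToyEnvRunner({"p0": [1.0, 2.0], "p1": [0.0]}),
    ToyEnvRunner({"p0": [3.0], "p1": [4.0]}),
]
policies = {"p0": ToyPolicy(), "p1": ToyPolicy()}
results = training_step_sketch(runners, policies)
```

Keying the returned metrics by policy ID is a choice made for this sketch only; the layout of the results dict returned by the real method is not specified here.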