ray.rllib.algorithms.algorithm.Algorithm.training_step#
- Algorithm.training_step() → None [source]#
Default single iteration logic of an algorithm.
- Collect on-policy samples (SampleBatches) in parallel using the Algorithm’s EnvRunners (@ray.remote).
- Concatenate the collected SampleBatches into one train batch.
- Note that we may have more than one policy in the multi-agent case: Call the different policies’ learn_on_batch (simple optimizer) OR load_batch_into_buffer + learn_on_loaded_batch (multi-GPU optimizer) methods to calculate loss and update the model(s).
- Return all collected metrics for the iteration (see the sketch after this list).
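As a rough illustration of the flow above, here is a minimal sketch of overriding training_step() on the old API stack, using rollout/training helpers that older RLlib versions ship (synchronous_parallel_sample, train_one_step). The class name and the batch size (max_env_steps=4000) are arbitrary, and depending on your Ray version the worker set may be exposed as self.workers or self.env_runner_group; this is not the exact body of the default implementation.

```python
from ray.rllib.algorithms.ppo import PPO
from ray.rllib.execution.rollout_ops import synchronous_parallel_sample
from ray.rllib.execution.train_ops import train_one_step
from ray.rllib.utils.metrics import (
    NUM_AGENT_STEPS_SAMPLED,
    NUM_ENV_STEPS_SAMPLED,
)


class MyPPOVariant(PPO):
    def training_step(self):
        # 1) Collect on-policy SampleBatches in parallel from the EnvRunners
        #    (rollout workers) and concatenate them into one train batch.
        train_batch = synchronous_parallel_sample(
            worker_set=self.workers, max_env_steps=4000
        )
        # 2) Keep the sampling counters up to date.
        self._counters[NUM_ENV_STEPS_SAMPLED] += train_batch.env_steps()
        self._counters[NUM_AGENT_STEPS_SAMPLED] += train_batch.agent_steps()
        # 3) Run one learning update on the batch (simple-optimizer path,
        #    i.e. learn_on_batch under the hood).
        train_results = train_one_step(self, train_batch)
        # 4) Push the updated weights back to the EnvRunners.
        self.workers.sync_weights()
        # 5) Return the collected metrics for this iteration (old API stack).
        return train_results
```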
- Returns:
For the new API stack, returns None. Results are compiled and extracted automatically through a single self.metrics.reduce() call at the very end of an iteration (which might contain more than one call to training_step()). This way, we make sure that we account for all results generated by each individual training_step() call. For the old API stack, returns the results dict from executing the training step.
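A minimal usage sketch, assuming the new API stack: the per-iteration results come from Algorithm.train(), which internally calls training_step() and reduces the collected metrics, even though training_step() itself returns None. The environment, the number of env runners, and the result keys shown in the comments are illustrative and may differ across Ray versions.

```python
from ray.rllib.algorithms.ppo import PPOConfig

# Build a small PPO Algorithm; training_step() is invoked internally by train().
config = (
    PPOConfig()
    .environment("CartPole-v1")       # illustrative environment
    .env_runners(num_env_runners=2)   # illustrative EnvRunner count
)
algo = config.build()

# One call to train() runs a full training iteration (one or more
# training_step() calls) and returns the compiled/reduced results dict,
# even though training_step() itself returns None on the new API stack.
results = algo.train()
# Result keys (e.g. "env_runners/episode_return_mean") depend on the
# Ray version and the API stack in use.
print(results)
algo.stop()
```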