ray.rllib.algorithms.algorithm.Algorithm.step

Algorithm.step() → Dict

Implements the main Algorithm.train() logic.

Makes up to n attempts to perform a single training step, catching any RayErrors that result from worker failures. If all n attempts fail, fails gracefully.

Override this method in your Algorithm sub-classes if you would like to handle worker failures yourself. Otherwise, override only training_step() to implement the core algorithm logic.
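
The split between step() (retry and failure handling) and training_step() (core algorithm logic) can be illustrated with a minimal, self-contained sketch. This is not RLlib's actual implementation: SketchAlgorithm, FlakyAlgorithm, WorkerFailure, max_attempts, and failed_attempts are hypothetical names introduced only for illustration, with WorkerFailure standing in for ray.exceptions.RayError so the example runs without Ray. How the real method "fails gracefully" is not specified here; the sketch simply assumes a RuntimeError is raised once all attempts are used up.

```python
from typing import Dict


class WorkerFailure(Exception):
    """Hypothetical stand-in for ray.exceptions.RayError."""


class SketchAlgorithm:
    """Hypothetical stand-in for Algorithm; illustrates the retry pattern only."""

    def __init__(self, max_attempts: int = 3):
        self.max_attempts = max_attempts  # the "n" in the description above
        self.failed_attempts = 0

    def training_step(self) -> Dict:
        # Subclasses put the core algorithm logic here.
        raise NotImplementedError

    def step(self) -> Dict:
        last_error = None
        for _ in range(self.max_attempts):
            try:
                return self.training_step()
            except WorkerFailure as err:
                # Record the failure and try again.
                self.failed_attempts += 1
                last_error = err
        # Assumption: after n failed attempts, surface one clear error.
        raise RuntimeError(
            f"training step failed after {self.max_attempts} attempts"
        ) from last_error


class FlakyAlgorithm(SketchAlgorithm):
    """Overrides only training_step(); inherits the retry handling from step()."""

    def __init__(self, failures_before_success: int, max_attempts: int = 3):
        super().__init__(max_attempts)
        self._remaining_failures = failures_before_success

    def training_step(self) -> Dict:
        if self._remaining_failures > 0:
            self._remaining_failures -= 1
            raise WorkerFailure("simulated worker failure")
        return {"episode_reward_mean": 1.0}
```

In this sketch, FlakyAlgorithm(failures_before_success=2).step() succeeds on its third attempt, while a subclass that needs custom failure handling would override step() itself rather than training_step().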

Returns: The results dict with stats/infos on sampling, training, and, if required, evaluation.