Algorithms#
The Algorithm class is the highest-level API in RLlib. It is responsible for the WHEN and WHAT of RL algorithms: for example, WHEN to collect samples and WHEN to perform a neural network update.
The HOW is delegated to components such as RolloutWorker.
It is the main entry point for RLlib users to interact with RLlib’s algorithms.
It allows you to train and evaluate policies, save an experiment’s progress, and restore from
a previously saved experiment when continuing an RL run.
Algorithm is a sub-class of Trainable and thus fully supports distributed hyperparameter tuning for RL.
A typical RLlib Algorithm object: Algorithms normally comprise N RolloutWorkers that are orchestrated via an EnvRunnerGroup object.
Each worker owns its own set of Policy objects and their NN models, plus a BaseEnv instance.
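The structure described in the caption can be sketched in plain Python. This is an illustrative toy, not RLlib code: ToyEnv, ToyPolicy, ToyWorker, and ToyWorkerGroup are hypothetical stand-ins for the environment, Policy, worker, and worker-group roles, and the single p_one weight is an assumption made to keep the example small.

```python
import random


class ToyEnv:
    """Hypothetical one-step environment: reward is 1.0 if action == target."""

    def __init__(self, seed):
        self.rng = random.Random(seed)

    def step(self, action):
        target = self.rng.randint(0, 1)
        return 1.0 if action == target else 0.0


class ToyPolicy:
    """Hypothetical policy with a single weight: the probability of action 1."""

    def __init__(self):
        self.p_one = 0.5

    def act(self, rng):
        return 1 if rng.random() < self.p_one else 0


class ToyWorker:
    """Owns its own environment and its own policy copy (the HOW of sampling)."""

    def __init__(self, index):
        self.env = ToyEnv(seed=index)
        self.policy = ToyPolicy()
        self.rng = random.Random(1000 + index)

    def sample(self, num_steps):
        return [self.env.step(self.policy.act(self.rng)) for _ in range(num_steps)]


class ToyWorkerGroup:
    """Orchestrates N workers: fans out sampling and broadcasts weights."""

    def __init__(self, num_workers):
        self.workers = [ToyWorker(i) for i in range(num_workers)]

    def sample(self, num_steps):
        rewards = []
        for worker in self.workers:
            rewards.extend(worker.sample(num_steps))
        return rewards

    def sync_weights(self, p_one):
        for worker in self.workers:
            worker.policy.p_one = p_one


group = ToyWorkerGroup(num_workers=3)
batch = group.sample(num_steps=4)   # 3 workers x 4 steps each
group.sync_weights(0.9)             # every worker's policy copy gets the new weight
print(len(batch), [w.policy.p_one for w in group.workers])
```

Running the sketch prints `12 [0.9, 0.9, 0.9]`: three workers each contribute four steps, and each per-worker policy copy receives the broadcast weight.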
Building Custom Algorithm Classes#
Warning
As of Ray 1.9, using the build_trainer() utility function to create custom Algorithm sub-classes is no longer recommended.
Instead, follow the simple guidelines here for directly sub-classing from Algorithm.
To create a custom Algorithm, sub-class the Algorithm class and override one or more of its methods.
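Two such override points are a hook for custom initialization and a hook for the logic of a single training iteration. The sketch below shows that override pattern in plain Python under stated assumptions: ToyAlgorithmBase is a stand-in and not RLlib's Algorithm, the hook names setup and training_step are chosen to match the descriptions in the API summary, and RunningMeanAlgorithm with its config keys is purely hypothetical.

```python
class ToyAlgorithmBase:
    """Stand-in base class (NOT RLlib's Algorithm) showing the override pattern."""

    def __init__(self, config):
        self.config = dict(config)
        self.iteration = 0
        self.setup(self.config)

    def setup(self, config):
        """Hook: subclasses override this for custom initialization."""

    def training_step(self):
        """Hook: subclasses override this with the logic of one iteration."""
        raise NotImplementedError

    def train(self):
        """Runs one logical iteration: calls the hook, then adds bookkeeping."""
        results = self.training_step()
        self.iteration += 1
        results["training_iteration"] = self.iteration
        return results


class RunningMeanAlgorithm(ToyAlgorithmBase):
    """Hypothetical subclass that 'learns' a running mean of a data stream."""

    def setup(self, config):
        self.data = list(config["data"])
        self.lr = config.get("lr", 0.5)
        self.estimate = 0.0

    def training_step(self):
        # WHAT happens in one iteration: take one sample, then update.
        sample = self.data[self.iteration % len(self.data)]
        self.estimate += self.lr * (sample - self.estimate)
        return {"estimate": self.estimate}


algo = RunningMeanAlgorithm({"data": [4.0, 4.0, 4.0], "lr": 0.5})
for _ in range(3):
    result = algo.train()
print(result)  # {'estimate': 3.5, 'training_iteration': 3}
```

In this sketch train() fixes the outer loop and the bookkeeping, while the subclass supplies only initialization and per-iteration logic.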
Algorithm API#
Construction and setup#
An RLlib algorithm responsible for training one or more neural network models.
Subclasses should override this for custom initialization.
The local EnvRunner instance within the algo's EnvRunnerGroup.
The local EnvRunner instance within the algo's evaluation EnvRunnerGroup.
Training#
Runs one logical iteration of training.
Default single iteration logic of an algorithm.
Saving and restoring#
Saves the state of the implementing class (or
Restores the state of the implementing class from the given path.
Creates a new algorithm instance from a given checkpoint.
Returns the implementing class's current state as a dict.
Sets the implementing class's state to the given state dict.
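The relationship between these operations (state as a dict, saving it to a path, restoring it, and building a fresh instance from a checkpoint) can be illustrated with a small round trip. This is a minimal sketch, not RLlib's checkpointing code: ToyCheckpointable, the helper functions save, restore, and from_checkpoint, and the state.json file name are all hypothetical, and JSON is an assumption made to keep the example dependency-free.

```python
import json
import tempfile
from pathlib import Path


class ToyCheckpointable:
    """Stand-in class (NOT RLlib's) whose state is a plain dict."""

    def __init__(self):
        self.weights = [0.0, 0.0]
        self.iteration = 0

    def get_state(self):
        """Returns the current state as a dict."""
        return {"weights": list(self.weights), "iteration": self.iteration}

    def set_state(self, state):
        """Sets the state from a dict produced by get_state()."""
        self.weights = list(state["weights"])
        self.iteration = state["iteration"]


def save(obj, path):
    """Hypothetical helper: writes obj's state dict as JSON into directory `path`."""
    path = Path(path)
    path.mkdir(parents=True, exist_ok=True)
    (path / "state.json").write_text(json.dumps(obj.get_state()))


def restore(obj, path):
    """Hypothetical helper: loads the saved state dict into an existing object."""
    state = json.loads((Path(path) / "state.json").read_text())
    obj.set_state(state)


def from_checkpoint(path):
    """Hypothetical helper: builds a new instance, then restores its state."""
    obj = ToyCheckpointable()
    restore(obj, path)
    return obj


trained = ToyCheckpointable()
trained.weights = [0.25, -1.5]
trained.iteration = 7

with tempfile.TemporaryDirectory() as tmp:
    save(trained, tmp)
    resumed = from_checkpoint(tmp)

print(resumed.get_state() == trained.get_state())  # True
```

Expressing both save and restore in terms of get_state() and set_state() keeps the on-disk format a detail of the helpers, which is the design this sketch is meant to show.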
Evaluation#
Evaluates current policy under
Multi Agent#
Returns the (single-agent) RLModule with
Adds a new policy to this Algorithm.
Removes a policy from this Algorithm.
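Adding or removing a policy at runtime generally also means updating how agents are mapped to policies. The toy below illustrates that idea only: ToyMultiAgentAlgo is not RLlib's Algorithm, and its method names, the policy_mapping_fn argument, and the use of plain callables as policies are assumptions chosen for this example.

```python
class ToyMultiAgentAlgo:
    """Stand-in (NOT RLlib's Algorithm) that keeps a registry of named policies."""

    def __init__(self, policies, policy_mapping_fn):
        self.policies = dict(policies)
        self.policy_mapping_fn = policy_mapping_fn

    def add_policy(self, policy_id, policy, policy_mapping_fn=None):
        """Registers a new policy and optionally swaps the agent-to-policy mapping."""
        if policy_id in self.policies:
            raise ValueError(f"Policy {policy_id!r} already exists.")
        self.policies[policy_id] = policy
        if policy_mapping_fn is not None:
            self.policy_mapping_fn = policy_mapping_fn

    def remove_policy(self, policy_id, policy_mapping_fn=None):
        """Drops a policy; the mapping should no longer point at the removed ID."""
        del self.policies[policy_id]
        if policy_mapping_fn is not None:
            self.policy_mapping_fn = policy_mapping_fn

    def compute_actions(self, observations):
        """Routes each agent's observation to the policy it is mapped to."""
        return {
            agent_id: self.policies[self.policy_mapping_fn(agent_id)](obs)
            for agent_id, obs in observations.items()
        }


algo = ToyMultiAgentAlgo(
    policies={"shared": lambda obs: obs + 1},
    policy_mapping_fn=lambda agent_id: "shared",
)
obs = {"agent_0": 1, "agent_1": 5}

# Add a second policy and route agent_1 to it.
algo.add_policy(
    "doubler",
    lambda o: o * 2,
    policy_mapping_fn=lambda agent_id: "doubler" if agent_id == "agent_1" else "shared",
)
after_add = algo.compute_actions(obs)
print(after_add)  # {'agent_0': 2, 'agent_1': 10}

# Remove it again and map every agent back to the shared policy.
algo.remove_policy("doubler", policy_mapping_fn=lambda agent_id: "shared")
after_remove = algo.compute_actions(obs)
print(after_remove)  # {'agent_0': 2, 'agent_1': 6}
```

Passing the new mapping function together with the add or remove call keeps the registry and the mapping consistent in a single step.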