Note
Ray 2.40 uses RLlib’s new API stack by default. The Ray team has mostly completed transitioning algorithms, example scripts, and documentation to the new code base.
If you're still using the old API stack, see the New API stack migration guide for details on how to migrate.
Algorithms#
The Algorithm class is the highest-level API in RLlib and is responsible for the WHEN and WHAT of RL algorithms: when to sample from the environment, when to perform a neural network update, and so on. The HOW is delegated to components such as RolloutWorker.
Algorithm is the main entry point for interacting with RLlib's algorithms. It lets you train and evaluate policies, save an experiment's progress, and restore from a previously saved experiment when continuing an RL run.
Algorithm is a sub-class of Trainable and thus fully supports distributed hyperparameter tuning for RL.
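A typical interaction with an Algorithm looks like the following. This is a minimal sketch: PPO and CartPole-v1 stand in for any algorithm and registered environment, and the result-dict layout assumes the new API stack.

```python
from ray.rllib.algorithms.ppo import PPOConfig

# Build an Algorithm instance from a config (PPO and CartPole-v1 are
# examples; any RLlib algorithm and registered env work the same way).
config = (
    PPOConfig()
    .environment("CartPole-v1")
    .env_runners(num_env_runners=2)
)
algo = config.build()

# Each call to train() runs one logical training iteration:
# sampling from the env plus one or more neural network updates.
for _ in range(3):
    results = algo.train()
    # New-API-stack result layout: sampling metrics under "env_runners".
    print(results["env_runners"]["episode_return_mean"])

algo.stop()
```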
Building Custom Algorithm Classes#
Warning
As of Ray >= 1.9, it is no longer recommended to use the build_trainer() utility function to create custom Algorithm sub-classes. Instead, follow the simple guidelines here and sub-class Algorithm directly.
To create a custom Algorithm, sub-class the Algorithm class and override one or more of its methods, in particular those listed in the Algorithm API section below.
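A minimal sketch of such a sub-class (MyAlgorithm and its method bodies are hypothetical; training_step() is the method most custom algorithms override):

```python
from ray.rllib.algorithms.algorithm import Algorithm
from ray.rllib.algorithms.algorithm_config import AlgorithmConfig


class MyAlgorithm(Algorithm):
    """A hypothetical custom Algorithm sub-class."""

    @classmethod
    def get_default_config(cls) -> AlgorithmConfig:
        # The config this algorithm starts out with.
        return AlgorithmConfig()

    def setup(self, config: AlgorithmConfig):
        super().setup(config)
        # Custom initialization (e.g. extra state) goes here.

    def training_step(self):
        # The algorithm-specific, per-iteration logic: e.g. sample
        # episodes from the EnvRunners, then update the model(s).
        # The base class calls this from within train().
        ...
```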
Algorithm API#
Constructor#
Algorithm | An RLlib algorithm responsible for optimizing one or more Policies.
Algorithm.setup | Subclasses should override this for custom initialization.
Inference and Evaluation#
Algorithm.compute_actions | Computes actions for a batch of observations for the specified policy on the local worker.
Algorithm.compute_single_action | Computes an action for a single observation for the specified policy on the local worker.
Algorithm.evaluate | Evaluates the current policy under the evaluation config settings.
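For instance, single-observation inference and an evaluation round might look like this. This sketch reuses the algo instance built above; compute_single_action() is primarily an old-API-stack convenience, and evaluate() assumes evaluation was configured.

```python
import gymnasium as gym

env = gym.make("CartPole-v1")
obs, info = env.reset()

# Single-observation inference on the local worker
# (explore=False yields the greedy/deterministic action).
action = algo.compute_single_action(observation=obs, explore=False)
obs, reward, terminated, truncated, info = env.step(action)

# Run an evaluation round under the algorithm's evaluation config
# (assumes evaluation was set up, e.g. via
# config.evaluation(evaluation_num_env_runners=1)).
eval_results = algo.evaluate()
```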
Saving and Restoring#
Algorithm.from_checkpoint | Creates a new algorithm instance from a given checkpoint.
Algorithm.from_state | Recovers an Algorithm from a state object.
Algorithm.get_weights | Returns a dict mapping Module/Policy IDs to weights.
Algorithm.set_weights | Sets RLModule/Policy weights by Module/Policy ID.
Algorithm.export_model | Exports the model based on export_formats.
Algorithm.export_policy_checkpoint | Exports a Policy checkpoint to a local directory and returns an AIR Checkpoint.
Algorithm.export_policy_model | Exports the policy model with the given policy_id to a local directory.
Algorithm.restore | Restores training state from a given model checkpoint.
Algorithm.restore_env_runners | Tries bringing back unhealthy EnvRunners and, if successful, syncs them with the local one.
Algorithm.save | Saves the current model state to a checkpoint.
Algorithm.save_checkpoint | Exports a checkpoint to a local directory.
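A save-and-restore round trip might look like the following sketch. The checkpoint path is hypothetical, and save_to_path() is assumed to be available as in recent Ray releases (the new API stack's Checkpointable interface).

```python
from ray.rllib.algorithms.algorithm import Algorithm

# Save the complete training state to a (hypothetical) local directory.
checkpoint_dir = "/tmp/my_ppo_checkpoint"
algo.save_to_path(checkpoint_dir)

# Later, or from another process: restore an equivalent Algorithm
# (config, model weights, and training state) from the checkpoint.
restored_algo = Algorithm.from_checkpoint(checkpoint_dir)
```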
Training#
Algorithm.train | Runs one logical iteration of training.
Algorithm.training_step | Default single iteration logic of an algorithm.
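Because Algorithm is a Trainable, you can also hand it to Ray Tune, which calls train() repeatedly for each trial. A sketch, with an example search space and stopping criterion; note that the home of RunConfig varies across Ray versions (here assumed to be ray.train):

```python
from ray import train, tune
from ray.rllib.algorithms.ppo import PPOConfig

# Tune over two learning rates; Tune calls Algorithm.train() once per
# iteration for every trial until the stop criterion is met.
config = (
    PPOConfig()
    .environment("CartPole-v1")
    .training(lr=tune.grid_search([1e-4, 1e-3]))
)
tuner = tune.Tuner(
    "PPO",
    param_space=config,
    run_config=train.RunConfig(stop={"training_iteration": 5}),
)
results = tuner.fit()
```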
Multi Agent#
Algorithm.add_policy | Adds a new policy to this Algorithm.
Algorithm.remove_policy | Removes a policy from this Algorithm.
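For example, policies can be added to and removed from a running Algorithm on the fly. A sketch following the old API stack's Policy-based workflow; "new_policy" is a hypothetical policy ID.

```python
# Add a second policy on the fly, cloning the default policy's class.
algo.add_policy(
    policy_id="new_policy",
    policy_cls=type(algo.get_policy()),
)

# ... continue training, then remove it again.
algo.remove_policy("new_policy")
```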