Note
Ray 2.10.0 introduces the alpha stage of RLlib's "new API stack". The Ray Team plans to transition algorithms, example scripts, and documentation to the new code base, thereby incrementally replacing the "old API stack" (e.g., ModelV2, Policy, RolloutWorker) throughout the subsequent minor releases leading up to Ray 3.0.
Note, however, that so far only PPO (single- and multi-agent) and SAC (single-agent only) support the "new API stack"; even these continue to run with the old APIs by default. You can continue to use the existing custom (old stack) classes.
See here for more details on how to use the new API stack.
LearnerGroup API

Configuring a LearnerGroup and Learner Workers

| Method | Description |
| --- | --- |
| `AlgorithmConfig.resources` | Specifies resources allocated for an Algorithm and its ray actors/workers. |
| `AlgorithmConfig.rl_module` | Sets the config's RLModule settings. |
| `AlgorithmConfig.training` | Sets the training-related configuration. |
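For example, these settings are typically chained on one `AlgorithmConfig` (a minimal sketch; the PPO settings and the `CartPole-v1` environment are illustrative choices, not requirements):

```python
from ray.rllib.algorithms.ppo import PPOConfig

# Sketch: chain the config methods from the table above.
config = (
    PPOConfig()
    .environment("CartPole-v1")  # illustrative environment choice
    .training(lr=3e-4, train_batch_size=4000)  # training-related settings
    .resources(num_gpus=0)  # resources for the Algorithm and its actors/workers
    # .rl_module(...) would further customize the RLModule; its arguments vary
    # between Ray versions, so they are omitted here.
)
```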
Constructing a LearnerGroup

| Method | Description |
| --- | --- |
| `AlgorithmConfig.build_learner_group` | Builds and returns a new LearnerGroup object based on settings in `self`. |
| `LearnerGroup` | Coordinator of n (possibly remote) Learner workers. |
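A LearnerGroup is then built straight from the config (a sketch; it assumes `build_learner_group` accepts an `env` argument from which to infer the observation and action spaces):

```python
import gymnasium as gym

# An env instance lets the LearnerGroup infer observation/action spaces.
env = gym.make("CartPole-v1")
learner_group = config.build_learner_group(env=env)
```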
Learner API

Constructing a Learner

| Method | Description |
| --- | --- |
| `AlgorithmConfig.build_learner` | Builds and returns a new Learner object based on settings in `self`. |
| `Learner` | Base class for Learners. |
| `Learner.build` | Builds the Learner. |
| `Learner._make_module` | Constructs the multi-agent RL module for the Learner. |
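For debugging or custom training loops, a single local Learner can be built the same way, reusing the `config` and `env` from above (a sketch):

```python
# Build a single, local Learner instead of a whole LearnerGroup.
learner = config.build_learner(env=env)
# Depending on the Ray version, build_learner() may already have called
# build(); a second call is a no-op in that case.
learner.build()
```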
Performing Updates

| Method | Description |
| --- | --- |
| `Learner.update_from_batch` | Performs `num_iters` minibatch updates given a train batch. |
| `Learner.update_from_episodes` | Performs `num_iters` minibatch updates given a list of episodes. |
| `Learner.before_gradient_based_update` | Called before gradient-based updates are performed. |
| `Learner._update` | Contains all logic for an in-graph/traceable update step. |
| `Learner.after_gradient_based_update` | Called after gradient-based updates are completed. |
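In a custom loop, updates are driven through `update_from_batch()` or `update_from_episodes()` (a sketch; `train_batch` is a hypothetical, previously collected batch, and the minibatch keyword arguments are assumptions that vary between Ray versions):

```python
# `train_batch` is a hypothetical, previously collected MultiAgentBatch.
results = learner.update_from_batch(
    batch=train_batch,
    minibatch_size=128,  # assumed kwarg: size of each minibatch
    num_iters=4,         # assumed kwarg: number of passes over the batch
)
```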
Computing Losses

| Method | Description |
| --- | --- |
| `Learner.compute_losses` | Computes the loss(es) for the module being optimized. |
| `Learner.compute_loss_for_module` | Computes the loss for a single module. |
| `Learner._is_module_compatible_with_learner` | Checks whether the module is compatible with the Learner. |
| `Learner._get_tensor_variable` | Returns a framework-specific tensor variable with the initial given value. |
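Custom losses usually go into an overridden `compute_loss_for_module()`. Below is a minimal behavior-cloning-style sketch, not RLlib's PPO/SAC loss; the `Columns` keys and distribution helpers are assumed from recent Ray versions:

```python
import torch

from ray.rllib.core.columns import Columns
from ray.rllib.core.learner.torch.torch_learner import TorchLearner


class NLLTorchLearner(TorchLearner):
    """Sketch: negative log-likelihood of the batch's actions as the loss."""

    def compute_loss_for_module(self, *, module_id, config=None, batch, fwd_out):
        # Build the train-time action distribution from the module's forward output.
        dist_cls = self.module[module_id].get_train_action_dist_cls()
        action_dist = dist_cls.from_logits(fwd_out[Columns.ACTION_DIST_INPUTS])
        # Loss: maximize the log-likelihood of the actions taken in the batch.
        return -torch.mean(action_dist.logp(batch[Columns.ACTIONS]))
```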
Configuring Optimizers

| Method | Description |
| --- | --- |
| `Learner.configure_optimizers_for_module` | Configures an optimizer for the given module_id. |
| `Learner.configure_optimizers` | Configures, creates, and registers the optimizers for this Learner. |
| `Learner.register_optimizer` | Registers an optimizer with a ModuleID, name, param list, and lr-scheduler. |
| `Learner.get_optimizers_for_module` | Returns a list of (optimizer_name, optimizer instance)-tuples for module_id. |
| `Learner.get_optimizer` | Returns the optimizer object, configured under the given module_id and name. |
| `Learner.get_parameters` | Returns the list of parameters of a module. |
| `Learner.get_param_ref` | Returns a hashable reference to a trainable parameter. |
| `Learner.filter_param_dict_for_optimizer` | Reduces the given ParamDict to contain only parameters for the given optimizer. |
| `Learner._check_registered_optimizer` | Checks that the given optimizer and parameters are valid for the framework. |
| `Learner._set_optimizer_lr` | Updates the learning rate of the given local optimizer. |
| `Learner._get_clip_function` | Returns the gradient clipping function to use, given the framework. |
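To use a non-default optimizer, override `configure_optimizers_for_module()` and register the optimizer there (a sketch; the `register_optimizer` keyword arguments are assumed from recent Ray versions):

```python
import torch

from ray.rllib.core.learner.torch.torch_learner import TorchLearner


class AdamWTorchLearner(TorchLearner):
    """Sketch: register one AdamW optimizer per module."""

    def configure_optimizers_for_module(self, module_id, config=None):
        # Collect the module's trainable parameters.
        params = self.get_parameters(self.module[module_id])
        optimizer = torch.optim.AdamW(params, weight_decay=1e-4)
        # Register the optimizer so the Learner manages its lr and state.
        self.register_optimizer(
            module_id=module_id,
            optimizer=optimizer,
            params=params,
            lr_or_lr_schedule=config.lr,
        )
```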
Gradient Computation

| Method | Description |
| --- | --- |
| `Learner.compute_gradients` | Computes the gradients based on the given losses. |
| `Learner.postprocess_gradients` | Applies potential postprocessing operations on the gradients. |
| `Learner.postprocess_gradients_for_module` | Applies postprocessing operations on the gradients of the given module. |
| `Learner.apply_gradients` | Applies the gradients to the MultiRLModule parameters. |
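Gradient postprocessing can likewise be customized by overriding `postprocess_gradients()` (a sketch; the elementwise clamping is purely illustrative and replaces the config-driven clipping of the default implementation):

```python
import torch

from ray.rllib.core.learner.torch.torch_learner import TorchLearner


class ClampGradsTorchLearner(TorchLearner):
    """Sketch: clamp every gradient tensor elementwise to [-1.0, 1.0]."""

    def postprocess_gradients(self, gradients_dict):
        return {
            param_ref: None if grad is None else torch.clamp(grad, -1.0, 1.0)
            for param_ref, grad in gradients_dict.items()
        }
```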
Saving, Loading, Checkpointing, and Restoring States

| Method | Description |
| --- | --- |
| `Learner.get_state` | Returns the implementing class's current state as a dict. |
| `Learner.set_state` | Sets the implementing class's state to the given state dict. |
| `Learner.save_to_path` | Saves the state of the implementing class (or `state`) to `path`. |
| `Learner.restore_from_path` | Restores the state of the implementing class from the given path. |
| `Learner.from_checkpoint` | Creates a new Checkpointable instance from the given location and returns it. |
| `Learner._get_optimizer_state` | Returns the state of all optimizers currently registered in this Learner. |
| `Learner._set_optimizer_state` | Sets the state of all optimizers currently registered in this Learner. |
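State can be round-tripped either in memory or through disk checkpoints (a sketch; the checkpoint path is arbitrary, and `save_to_path` is assumed to return the directory it wrote to):

```python
# In-memory round trip of the full Learner state (weights, optimizers, ...).
state = learner.get_state()
learner.set_state(state)

# Disk-based checkpointing; the directory used here is arbitrary.
ckpt_dir = learner.save_to_path("/tmp/my_learner_checkpoint")
learner.restore_from_path(ckpt_dir)
```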
Adding and Removing Modules

| Method | Description |
| --- | --- |
| `Learner.add_module` | Adds a module to the underlying MultiRLModule. |
| `Learner.remove_module` | Removes a module from the Learner. |
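In multi-agent setups, modules can be added to or dropped from a running Learner (a sketch; the module IDs are illustrative, and `new_module_spec` is a hypothetical RLModuleSpec whose exact construction depends on the Ray version):

```python
# `new_module_spec` is a hypothetical RLModuleSpec describing the new policy;
# its exact construction depends on the Ray version.
learner.add_module(module_id="opponent_v2", module_spec=new_module_spec)

# Drop a module that should no longer be trained.
learner.remove_module(module_id="opponent_v1")
```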