RLModule APIs#
Note
Ray 2.40 uses RLlib’s new API stack by default. The Ray team has mostly completed transitioning algorithms, example scripts, and documentation to the new code base.
If you’re still using the old API stack, see New API stack migration guide for details on how to migrate.
RLModule specifications and configurations#
Single RLModuleSpec#
Utility spec class to make constructing RLModules (in single-agent case) easier. |
|
Builds the RLModule from this spec. |
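The spec-then-build pattern described above can be sketched in plain Python. This is a minimal illustration, not RLlib code: `SketchSpec`, `TinyModule`, and the fields `observation_dim`, `num_actions`, and `model_config` are hypothetical names chosen for the example. It only shows why a spec is convenient: it describes a module without instantiating it, and each `build()` call produces a fresh instance.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Type


class TinyModule:
    """Hypothetical module that just stores what the spec passes in."""

    def __init__(self, observation_dim: int, num_actions: int,
                 model_config: Dict[str, Any]):
        self.observation_dim = observation_dim
        self.num_actions = num_actions
        self.model_config = model_config


@dataclass
class SketchSpec:
    """Hypothetical stand-in for a single-module spec (not RLlib's class)."""

    module_class: Type[TinyModule]
    observation_dim: int
    num_actions: int
    model_config: Dict[str, Any] = field(default_factory=dict)

    def build(self) -> TinyModule:
        # Instantiate the module from the stored description. The config is
        # copied so that built modules don't share one mutable dict.
        return self.module_class(
            observation_dim=self.observation_dim,
            num_actions=self.num_actions,
            model_config=dict(self.model_config),
        )


spec = SketchSpec(TinyModule, observation_dim=4, num_actions=2,
                  model_config={"hidden": [64, 64]})
module_a = spec.build()
module_b = spec.build()  # Each call yields an independent instance.
```

Because the spec is plain data, it can be passed around or copied cheaply and only turned into a module where one is actually needed.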
MultiRLModuleSpec#
A utility spec class to make constructing MultiRLModules easier. |
|
Builds either the MultiRLModule or a (single) sub-RLModule under |
DefaultModelConfig#
Dataclass to configure all default RLlib RLModules. |
RLModule API#
Construction and setup#
Base class for RLlib modules. |
|
Sets up the components of the module. |
|
Returns a multi-agent wrapper around this module. |
Forward methods#
Use the following three forward methods when you call an RLModule from inside other classes
and components. Do NOT override them; leave them as-is in your custom subclasses.
To define your own forward behavior, override the private method _forward (generic forward
behavior for all phases) or, for more granularity, override _forward_exploration,
_forward_inference, and _forward_train.
| forward_exploration | DO NOT OVERRIDE! Forward-pass during exploration, called from the sampler. |
| forward_inference | DO NOT OVERRIDE! Forward-pass during evaluation, called from the sampler. |
| forward_train | DO NOT OVERRIDE! Forward-pass during training, called from the learner. |
Override these private methods to define your custom model’s forward behavior:
- _forward: generic forward behavior for all phases
- _forward_exploration: for training sample collection
- _forward_inference: for production deployments, greedy acting
- _forward_train: for computing loss function inputs
| _forward | Generic forward pass method, used in all phases of training and evaluation. |
| _forward_exploration | Forward-pass used for action computation with exploration behavior. |
| _forward_inference | Forward-pass used for action computation without exploration behavior. |
| _forward_train | Forward-pass used before the loss computation (training). |
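The split between public forward methods and private hooks can be sketched in plain Python. This is a simplified stand-in, not RLlib's implementation: `SketchModule` and `GreedyModule` are hypothetical classes, batches are plain dicts, and the fallback from each phase-specific hook to `_forward` is an assumption of the sketch. It shows a subclass that overrides only the private hooks while callers keep using the public methods.

```python
from typing import Any, Dict, List

Batch = Dict[str, Any]


class SketchModule:
    """Simplified stand-in for a module base class (not RLlib's RLModule)."""

    # Public entry points: other components call these. Leave them as-is.
    def forward_exploration(self, batch: Batch) -> Batch:
        return self._forward_exploration(batch)

    def forward_inference(self, batch: Batch) -> Batch:
        return self._forward_inference(batch)

    def forward_train(self, batch: Batch) -> Batch:
        return self._forward_train(batch)

    # Private hooks: subclasses override these.
    def _forward(self, batch: Batch) -> Batch:
        raise NotImplementedError("Override _forward or a phase-specific hook.")

    # Assumption of this sketch: each phase falls back to the generic _forward.
    def _forward_exploration(self, batch: Batch) -> Batch:
        return self._forward(batch)

    def _forward_inference(self, batch: Batch) -> Batch:
        return self._forward(batch)

    def _forward_train(self, batch: Batch) -> Batch:
        return self._forward(batch)


class GreedyModule(SketchModule):
    """Hypothetical subclass: shared scoring plus a greedy inference hook."""

    def _forward(self, batch: Batch) -> Batch:
        # Generic behavior for all phases: pass observations through as scores.
        scores: List[float] = list(batch["obs"])
        return {"scores": scores}

    def _forward_inference(self, batch: Batch) -> Batch:
        # Inference only: additionally pick the index of the highest score.
        out = self._forward(batch)
        scores = out["scores"]
        out["action"] = max(range(len(scores)), key=scores.__getitem__)
        return out


module = GreedyModule()
inference_out = module.forward_inference({"obs": [0.1, 0.9, 0.3]})
train_out = module.forward_train({"obs": [0.1, 0.9, 0.3]})
```

Keeping the public methods untouched means the base class stays free to wrap every call with shared logic, while subclasses only supply the model-specific computation.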
Saving and restoring#
Saves the state of the implementing class (or |
|
Restores the state of the implementing class from the given path. |
|
Creates a new Checkpointable instance from the given location and returns it. |
|
Returns the state dict of the module. |
|
Sets the implementing class' state to the given state dict. |
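The relationship between the in-memory state methods and the path-based ones can be sketched as follows. This is a hypothetical stand-in, not RLlib's checkpointing code: the class `SketchCheckpointable`, the `state.json` file name, and the JSON format are all assumptions made for the example. The method names mirror the descriptions above only to show how the pieces could fit together.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict, List, Optional


class SketchCheckpointable:
    """Hypothetical stand-in; file name and JSON format are assumptions."""

    STATE_FILE = "state.json"

    def __init__(self, weights: Optional[List[float]] = None):
        self.weights = list(weights) if weights is not None else [0.0, 0.0]

    def get_state(self) -> Dict[str, Any]:
        # Return a copy so callers cannot mutate the live weights.
        return {"weights": list(self.weights)}

    def set_state(self, state: Dict[str, Any]) -> None:
        self.weights = list(state["weights"])

    def save_to_path(self, path) -> str:
        # Write the current state dict into a checkpoint directory.
        path = Path(path)
        path.mkdir(parents=True, exist_ok=True)
        (path / self.STATE_FILE).write_text(json.dumps(self.get_state()))
        return str(path)

    def restore_from_path(self, path) -> None:
        # Load the saved state dict into this already existing instance.
        state = json.loads((Path(path) / self.STATE_FILE).read_text())
        self.set_state(state)

    @classmethod
    def from_checkpoint(cls, path) -> "SketchCheckpointable":
        # Create a new instance, then fill it from the checkpoint.
        obj = cls()
        obj.restore_from_path(path)
        return obj


with tempfile.TemporaryDirectory() as tmp:
    trained = SketchCheckpointable([0.5, -1.25])
    checkpoint_dir = trained.save_to_path(Path(tmp) / "ckpt")

    existing = SketchCheckpointable()
    existing.restore_from_path(checkpoint_dir)  # Updates in place.

    fresh = SketchCheckpointable.from_checkpoint(checkpoint_dir)  # New object.
```

In this sketch, restoring from a path mutates an object you already have, whereas creating from a checkpoint returns a new one; the state dict is the common currency between both.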
MultiRLModule API#
Constructor#
Base class for an RLModule that contains n sub-RLModules. |
|
Sets up the underlying, individual RLModules. |
|
Returns self in order to match |
Modifying the underlying RLModules#
Adds a module at runtime to the multi-agent module. |
|
Removes a module at runtime from the multi-agent module. |
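Adding and removing sub-modules at runtime can be pictured as operations on a container keyed by module ID. The sketch below is a hypothetical stand-in, not RLlib's MultiRLModule: the class name, the `override` flag, and the error types are assumptions made for illustration.

```python
from typing import Any, Dict, List


class SketchMultiModule:
    """Hypothetical container keyed by module ID (not RLlib's class)."""

    def __init__(self) -> None:
        self._modules: Dict[str, Any] = {}

    def add_module(self, module_id: str, module: Any,
                   override: bool = False) -> None:
        # Refuse to silently replace an existing module unless asked to.
        if module_id in self._modules and not override:
            raise ValueError(f"Module ID {module_id!r} already exists.")
        self._modules[module_id] = module

    def remove_module(self, module_id: str) -> None:
        if module_id not in self._modules:
            raise KeyError(f"Module ID {module_id!r} not found.")
        del self._modules[module_id]

    def module_ids(self) -> List[str]:
        return list(self._modules)


multi = SketchMultiModule()
multi.add_module("policy_a", object())
multi.add_module("policy_b", object())
multi.remove_module("policy_a")
remaining = multi.module_ids()
```

Guarding against accidental overwrites is a design choice of this sketch; it keeps a typo in a module ID from silently discarding a trained sub-module.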
Saving and restoring#
Saves the state of the implementing class (or |
|
Restores the state of the implementing class from the given path. |
|
Creates a new Checkpointable instance from the given location and returns it. |
|
Returns the state dict of the module. |
|
Sets the state of the multi-agent module. |