RLModule APIs#

Note

Ray 2.40 uses RLlib’s new API stack by default. The Ray team has mostly completed transitioning algorithms, example scripts, and documentation to the new code base.

If you’re still using the old API stack, see New API stack migration guide for details on how to migrate.

RLModule specifications and configurations#

Single RLModuleSpec#

RLModuleSpec

Utility spec class to make constructing RLModules (in the single-agent case) easier.

RLModuleSpec.build

Builds the RLModule from this spec.

MultiRLModuleSpec#

MultiRLModuleSpec

A utility spec class to make constructing MultiRLModules easier.

MultiRLModuleSpec.build

Builds either the MultiRLModule or a (single) sub-RLModule under module_id.
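
The two `build` behaviors described above can be illustrated with a small, RLlib-free sketch. Everything here (`ToyModule`, `ToyModuleSpec`, `ToyMultiModule`, `ToyMultiModuleSpec`, and their fields) is a hypothetical stand-in written for illustration, not RLlib's actual classes or signatures. It only shows the idea that a spec holds the construction arguments and that the multi-module spec builds either the whole container or one sub-module when given a `module_id`.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional, Type


class ToyModule:
    """Hypothetical stand-in for an RLModule; it only stores its arguments."""

    def __init__(self, observation_dim: int, action_dim: int,
                 model_config: Dict[str, Any]):
        self.observation_dim = observation_dim
        self.action_dim = action_dim
        self.model_config = model_config


@dataclass
class ToyModuleSpec:
    """Hypothetical single-module spec: holds everything needed to build."""

    module_class: Type[ToyModule]
    observation_dim: int
    action_dim: int
    model_config: Dict[str, Any] = field(default_factory=dict)

    def build(self) -> ToyModule:
        # Construct the module from the stored arguments.
        return self.module_class(
            self.observation_dim, self.action_dim, self.model_config
        )


class ToyMultiModule:
    """Hypothetical container of sub-modules keyed by module ID."""

    def __init__(self, modules: Dict[str, ToyModule]):
        self.modules = modules


@dataclass
class ToyMultiModuleSpec:
    """Hypothetical multi-module spec: one single-module spec per module ID."""

    module_specs: Dict[str, ToyModuleSpec]

    def build(self, module_id: Optional[str] = None):
        # With a module_id, build only that sub-module.
        if module_id is not None:
            return self.module_specs[module_id].build()
        # Without one, build every sub-module and wrap them in the container.
        return ToyMultiModule(
            {mid: spec.build() for mid, spec in self.module_specs.items()}
        )


multi_spec = ToyMultiModuleSpec({
    "agent_a": ToyModuleSpec(ToyModule, observation_dim=4, action_dim=2),
    "agent_b": ToyModuleSpec(ToyModule, observation_dim=8, action_dim=3),
})
everything = multi_spec.build()
only_b = multi_spec.build(module_id="agent_b")
```

Keeping the construction arguments in a spec lets other components decide when, where, and which part of a multi-module to instantiate.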

DefaultModelConfig#

DefaultModelConfig

Dataclass to configure all default RLlib RLModules.

RLModule API#

Construction and setup#

RLModule

Base class for RLlib modules.

RLModule.observation_space

RLModule.action_space

RLModule.inference_only

RLModule.model_config

RLModule.setup

Sets up the components of the module.

RLModule.as_multi_rl_module

Returns a multi-agent wrapper around this module.

Forward methods#

Use the following three forward methods when calling an RLModule from other classes and components. Do NOT override them in your custom subclasses. To define your own forward behavior, override the private method _forward (generic forward behavior for all phases) or, for more granularity, _forward_exploration, _forward_inference, and _forward_train.

forward_exploration

DO NOT OVERRIDE! Forward-pass during exploration, called from the sampler.

forward_inference

DO NOT OVERRIDE! Forward-pass during evaluation, called from the sampler.

forward_train

DO NOT OVERRIDE! Forward-pass during training, called from the learner.

Override these private methods to define your custom model’s forward behavior:

- _forward: generic forward behavior for all phases
- _forward_exploration: for training sample collection
- _forward_inference: for production deployments, greedy acting
- _forward_train: for computing loss function inputs

_forward

Generic forward pass method, used in all phases of training and evaluation.

_forward_exploration

Forward-pass used for action computation with exploration behavior.

_forward_inference

Forward-pass used for action computation without exploration behavior.

_forward_train

Forward-pass used before the loss computation (training).
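
The split between public and private forward methods can be sketched in plain Python. This is a simplified illustration of the pattern, not RLlib's implementation: `ToyBaseModule` and `GreedyModule` are hypothetical classes, batches are plain dicts, and the assumption that each phase-specific hook falls back to `_forward` when not overridden is inferred from the description of `_forward` as the generic method for all phases.

```python
from typing import Any, Dict

Batch = Dict[str, Any]


class ToyBaseModule:
    """Hypothetical base class mirroring the public/private forward split."""

    # Public entry points: other components call these; subclasses leave them alone.
    def forward_exploration(self, batch: Batch) -> Batch:
        return self._forward_exploration(batch)

    def forward_inference(self, batch: Batch) -> Batch:
        return self._forward_inference(batch)

    def forward_train(self, batch: Batch) -> Batch:
        return self._forward_train(batch)

    # Private hooks: subclasses override these.
    def _forward(self, batch: Batch) -> Batch:
        raise NotImplementedError

    # Assumed default: each phase-specific hook falls back to the generic one.
    def _forward_exploration(self, batch: Batch) -> Batch:
        return self._forward(batch)

    def _forward_inference(self, batch: Batch) -> Batch:
        return self._forward(batch)

    def _forward_train(self, batch: Batch) -> Batch:
        return self._forward(batch)


class GreedyModule(ToyBaseModule):
    """Overrides the generic hook and adds greedy action selection for inference."""

    def _forward(self, batch: Batch) -> Batch:
        # Generic behavior: compute one score per action from the observation.
        obs = batch["obs"]
        return {"scores": [obs * 1.0, obs * 2.0]}

    def _forward_inference(self, batch: Batch) -> Batch:
        # Inference-only behavior: additionally pick the highest-scoring action.
        out = self._forward(batch)
        scores = out["scores"]
        out["action"] = scores.index(max(scores))
        return out


module = GreedyModule()
train_out = module.forward_train({"obs": 3.0})
inference_out = module.forward_inference({"obs": 3.0})
```

Here `forward_train` returns only the scores, while `forward_inference` also returns a greedy action, even though the subclass never touches the public methods.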

Saving and Loading#

save_to_path

Saves the state of the implementing class (or state) to path.

restore_from_path

Restores the state of the implementing class from the given path.

from_checkpoint

Creates a new Checkpointable instance from the given location and returns it.

get_state

Returns the state dict of the module.

set_state

Sets the implementing class' state to the given state dict.
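
How the five methods above relate to each other can be sketched without RLlib. `ToyCheckpointable` is a hypothetical class, and the single `state.json` file it writes is an assumption made for illustration; real RLlib checkpoints use their own directory layout and serialization. The sketch only shows that saving persists what `get_state` returns, that restoring feeds it back through `set_state`, and that `from_checkpoint` creates a new instance rather than updating an existing one.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, Optional


class ToyCheckpointable:
    """Hypothetical stand-in for a module with state and checkpoint methods."""

    def __init__(self, weights: Optional[Dict[str, float]] = None):
        self.weights = dict(weights or {"w": 0.0, "b": 0.0})

    def get_state(self) -> Dict[str, float]:
        # Return a copy so callers can't mutate the module's weights by accident.
        return dict(self.weights)

    def set_state(self, state: Dict[str, float]) -> None:
        self.weights = dict(state)

    def save_to_path(self, path: str) -> str:
        # Assumed layout: one JSON file inside the checkpoint directory.
        directory = Path(path)
        directory.mkdir(parents=True, exist_ok=True)
        (directory / "state.json").write_text(json.dumps(self.get_state()))
        return str(directory)

    def restore_from_path(self, path: str) -> None:
        # Update this existing instance in place from the saved state.
        state = json.loads((Path(path) / "state.json").read_text())
        self.set_state(state)

    @classmethod
    def from_checkpoint(cls, path: str) -> "ToyCheckpointable":
        # Create a brand-new instance, then load the saved state into it.
        instance = cls()
        instance.restore_from_path(path)
        return instance


trained = ToyCheckpointable({"w": 1.5, "b": -0.25})
with tempfile.TemporaryDirectory() as tmp:
    trained.save_to_path(tmp)
    restored_in_place = ToyCheckpointable()
    restored_in_place.restore_from_path(tmp)
    fresh = ToyCheckpointable.from_checkpoint(tmp)
```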

MultiRLModule API#

Constructor#

MultiRLModule

Base class for an RLModule that contains n sub-RLModules.

MultiRLModule.setup

Sets up the underlying, individual RLModules.

MultiRLModule.as_multi_rl_module

Returns self in order to match RLModule.as_multi_rl_module() behavior.

Modifying the underlying RLModules#

add_module

Adds a module at runtime to the multi-agent module.

remove_module

Removes a module at runtime from the multi-agent module.
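
The container behavior described in this section can be sketched with a dict-backed stand-in. `ToySingle`, `ToyMulti`, the `DEFAULT_ID` key, and the `override` and `raise_if_missing` parameters are all hypothetical names chosen for illustration, not RLlib's actual signatures. The sketch shows a single module wrapping itself into a container, the container returning itself from `as_multi_rl_module`, and sub-modules being added and removed at runtime.

```python
from typing import Dict, List, Optional

# Hypothetical key under which a single module is wrapped.
DEFAULT_ID = "default"


class ToySingle:
    """Hypothetical stand-in for a single RLModule."""

    def __init__(self, name: str):
        self.name = name

    def as_multi_rl_module(self) -> "ToyMulti":
        # Wrap this module in a container under the default key.
        return ToyMulti({DEFAULT_ID: self})


class ToyMulti:
    """Hypothetical stand-in for a container of sub-modules."""

    def __init__(self, modules: Optional[Dict[str, ToySingle]] = None):
        self._modules: Dict[str, ToySingle] = dict(modules or {})

    def as_multi_rl_module(self) -> "ToyMulti":
        # Already a container, so return self to match the single-module API.
        return self

    def add_module(self, module_id: str, module: ToySingle,
                   override: bool = False) -> None:
        # Refuse to silently replace an existing sub-module.
        if module_id in self._modules and not override:
            raise ValueError(f"Module ID {module_id!r} already exists.")
        self._modules[module_id] = module

    def remove_module(self, module_id: str,
                      raise_if_missing: bool = True) -> None:
        if module_id not in self._modules:
            if raise_if_missing:
                raise KeyError(module_id)
            return
        del self._modules[module_id]

    def module_ids(self) -> List[str]:
        return sorted(self._modules)


multi = ToySingle("learner").as_multi_rl_module()
multi.add_module("opponent", ToySingle("opponent"))
ids_after_add = multi.module_ids()
multi.remove_module(DEFAULT_ID)
ids_after_remove = multi.module_ids()
```

Because both classes expose `as_multi_rl_module`, calling code can treat single and multi modules uniformly.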

Saving and restoring#

save_to_path

Saves the state of the implementing class (or state) to path.

restore_from_path

Restores the state of the implementing class from the given path.

from_checkpoint

Creates a new Checkpointable instance from the given location and returns it.

get_state

Returns the state dict of the module.

set_state

Sets the state of the multi-agent module.