Note

Ray 2.10.0 introduces the alpha stage of RLlib’s “new API stack”. The Ray Team plans to transition algorithms, example scripts, and documentation to the new code base, thereby incrementally replacing the “old API stack” (e.g., ModelV2, Policy, RolloutWorker) throughout the subsequent minor releases leading up to Ray 3.0.

Note, however, that so far only PPO (single- and multi-agent) and SAC (single-agent only) support the “new API stack”; both still run with the old APIs by default. You can continue to use your existing custom (old-stack) classes.

See here for more details on how to use the new API stack.

Note

This doc is related to RLlib’s new API stack and is therefore experimental.

LearnerGroup API#

Configuring a LearnerGroup and Learner Workers#

AlgorithmConfig.resources

Specifies resources allocated for an Algorithm and its ray actors/workers.

AlgorithmConfig.rl_module

Sets the config's RLModule settings.

AlgorithmConfig.training

Sets the training related configuration.
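
Taken together, these three settings determine how many Learner workers run, which resources they get, and which training hyperparameters they use. Below is a minimal sketch for PPO on the new API stack; the _enable_new_api_stack flag and the num_learner_workers / num_gpus_per_learner_worker arguments reflect Ray 2.10-era names and may differ in other versions.

    from ray.rllib.algorithms.ppo import PPOConfig

    config = (
        PPOConfig()
        .environment("CartPole-v1")
        # Opt into the new API stack (RLModule + Learner).
        .experimental(_enable_new_api_stack=True)
        # Two remote Learner workers, one GPU each.
        .resources(
            num_learner_workers=2,
            num_gpus_per_learner_worker=1,
        )
        # Regular training hyperparameters.
        .training(lr=0.0003, train_batch_size=4000)
    )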

Constructing a LearnerGroup#

AlgorithmConfig.build_learner_group

Builds and returns a new LearnerGroup object based on settings in self.

LearnerGroup

Coordinator of n (possibly remote) Learner workers.
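
A hedged sketch of constructing the group from the config above: build_learner_group() needs an env (or spaces) so that the RLModule inside each Learner can be built, and the exact keyword arguments may vary across Ray versions.

    import gymnasium as gym

    env = gym.make("CartPole-v1")
    # Spins up the (possibly remote) Learner workers configured above.
    learner_group = config.build_learner_group(env=env)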

Learner API#

Constructing a Learner#

AlgorithmConfig.build_learner

Builds and returns a new Learner object based on settings in self.

Learner

Base class for Learners.

Learner.build

Builds the Learner.

Learner._check_is_built

Learner._make_module

Construct the multi-agent RL module for the learner.
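
For debugging or custom training loops, a single local Learner can be constructed directly. A minimal sketch, assuming the config and env from the snippets above; note that build() has to be called before the Learner can be used.

    # Construct a single, local Learner (no LearnerGroup involved).
    learner = config.build_learner(env=env)
    # Allocates the MultiAgentRLModule and the optimizers.
    learner.build()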

Performing Updates#

Learner.update_from_batch

Do num_iters minibatch updates given a train batch.

Learner.update_from_episodes

Do num_iters minibatch updates given a list of episodes.

Learner._update

Contains all logic for an in-graph/traceable update step.

Learner.additional_update

Apply additional non-gradient based updates to this Algorithm.

Learner.additional_update_for_module

Apply additional non-gradient based updates for a single module.

Learner._convert_batch_type

Converts the elements of a MultiAgentBatch to Tensors on the correct device.
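
A hedged sketch of the main update entry point, continuing with the Learner from above. train_batch stands in for a MultiAgentBatch collected by your sampling code; the minibatch_size and num_iters keyword arguments are assumptions that may differ between Ray versions.

    # train_batch is a MultiAgentBatch produced elsewhere (e.g., by env runners).
    results = learner.update_from_batch(
        batch=train_batch,
        minibatch_size=128,
        num_iters=4,
    )
    # results holds per-module loss and gradient statistics.
    print(results)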

Computing Losses#

Learner.compute_loss

Computes the loss for the module being optimized.

Learner.compute_loss_for_module

Computes the loss for a single module.

Learner._is_module_compatible_with_learner

Check whether the module is compatible with the learner.

Learner._get_tensor_variable

Returns a framework-specific tensor variable with the initial given value.
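
Custom losses are implemented by overriding compute_loss_for_module() in a Learner subclass. The sketch below assumes a Torch PPO Learner as the base class; because the hook's exact signature has changed between Ray versions, **kwargs is used to stay version-agnostic, and the L2 penalty is purely illustrative.

    from ray.rllib.algorithms.ppo.torch.ppo_torch_learner import PPOTorchLearner

    class L2RegularizedPPOLearner(PPOTorchLearner):
        """Hypothetical Learner adding an L2 penalty to the built-in PPO loss."""

        def compute_loss_for_module(self, *, module_id, **kwargs):
            # Built-in PPO loss for this module.
            loss = super().compute_loss_for_module(module_id=module_id, **kwargs)
            # Small L2 penalty over this module's trainable parameters.
            l2 = sum(p.pow(2).sum() for p in self.get_parameters(self.module[module_id]))
            return loss + 0.01 * l2

Plugging such a class in is typically done through the training() setting, for example config.training(learner_class=L2RegularizedPPOLearner) (an assumption based on the API listed above).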

Configuring Optimizers#

Learner.configure_optimizers_for_module

Configures an optimizer for the given module_id.

Learner.configure_optimizers

Configures, creates, and registers the optimizers for this Learner.

Learner.register_optimizer

Registers an optimizer with a ModuleID, name, param list and lr-scheduler.

Learner.get_optimizers_for_module

Returns a list of (optimizer_name, optimizer instance)-tuples for module_id.

Learner.get_optimizer

Returns the optimizer object, configured under the given module_id and name.

Learner.get_parameters

Returns the list of parameters of a module.

Learner.get_param_ref

Returns a hashable reference to a trainable parameter.

Learner.filter_param_dict_for_optimizer

Reduces the given ParamDict to contain only parameters for the given optimizer.

Learner._check_registered_optimizer

Checks that the given optimizer and parameters are valid for the framework.

Learner._set_optimizer_lr

Updates the learning rate of the given local optimizer.

Learner._get_clip_function

Returns the gradient clipping function to use, given the framework.
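
Optimizer setups are customized by overriding configure_optimizers_for_module() and calling register_optimizer() for every optimizer the Learner should manage. A hedged Torch sketch; the hook's exact signature and the lr_or_lr_schedule handling are assumptions that may differ by Ray version.

    import torch
    from ray.rllib.algorithms.ppo.torch.ppo_torch_learner import PPOTorchLearner

    class AdamWPPOLearner(PPOTorchLearner):
        """Hypothetical Learner using AdamW instead of the default optimizer."""

        def configure_optimizers_for_module(self, module_id, config=None, **kwargs):
            module = self.module[module_id]
            params = self.get_parameters(module)
            self.register_optimizer(
                module_id=module_id,
                optimizer_name="adamw",
                optimizer=torch.optim.AdamW(params, weight_decay=1e-4),
                params=params,
                # Fixed learning rate for illustration; a schedule could go here.
                lr_or_lr_schedule=3e-4,
            )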

Gradient Computation#

Learner.compute_gradients

Computes the gradients based on the given losses.

Learner.postprocess_gradients

Applies potential postprocessing operations on the gradients.

Learner.postprocess_gradients_for_module

Applies postprocessing operations on the gradients of the given module.

Learner.apply_gradients

Applies the gradients to the MultiAgentRLModule parameters.
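
Gradient postprocessing (for example, clipping) can be customized through the hooks above. A hedged Torch sketch that keeps the built-in postprocessing and then zeroes out non-finite gradients, purely for illustration.

    import torch
    from ray.rllib.algorithms.ppo.torch.ppo_torch_learner import PPOTorchLearner

    class SafeGradPPOLearner(PPOTorchLearner):
        """Hypothetical Learner sanitizing gradients before they are applied."""

        def postprocess_gradients(self, gradients_dict):
            # Apply the configured clipping etc. first.
            gradients_dict = super().postprocess_gradients(gradients_dict)
            # Then replace any NaN/Inf entries with zeros (illustration only).
            return {
                ref: torch.nan_to_num(grad, nan=0.0, posinf=0.0, neginf=0.0)
                if grad is not None
                else grad
                for ref, grad in gradients_dict.items()
            }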

Saving, Loading, Checkpointing, and Restoring States#

Learner.save_state

Save the state of the learner to path.

Learner.load_state

Load the state of the learner from path.

Learner._save_optimizers

Save the state of the optimizer to path.

Learner._load_optimizers

Load the state of the optimizer from path.

Learner.get_state

Get the state of the learner.

Learner.set_state

Set the state of the learner.

Learner.get_optimizer_state

Returns the state of all optimizers currently registered in this Learner.

Learner.set_optimizer_state

Sets the state of all optimizers currently registered in this Learner.

Learner._get_metadata
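
A minimal sketch of round-tripping Learner state, assuming the in-memory get_state() / set_state() pair and the path-based save_state() / load_state() pair listed above; the on-disk layout and exact semantics may differ between Ray versions.

    # In-memory snapshot (module weights, optimizer states, etc.) and restore.
    state = learner.get_state()
    learner.set_state(state)

    # Persist the same information to disk and read it back.
    learner.save_state("/tmp/my_learner_checkpoint")
    learner.load_state("/tmp/my_learner_checkpoint")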

Adding and Removing Modules#

Learner.add_module

Add a module to the underlying MultiAgentRLModule and the Learner.

Learner.remove_module

Remove a module from the Learner.
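
A hedged sketch of adding a second policy to a running Learner and removing it again; the import paths and SingleAgentRLModuleSpec arguments reflect Ray 2.10-era names and are assumptions.

    from ray.rllib.algorithms.ppo.ppo_catalog import PPOCatalog
    from ray.rllib.algorithms.ppo.torch.ppo_torch_rl_module import PPOTorchRLModule
    from ray.rllib.core.rl_module.rl_module import SingleAgentRLModuleSpec

    # Add a new module (policy) "extra_agent" to the Learner's MultiAgentRLModule.
    learner.add_module(
        module_id="extra_agent",
        module_spec=SingleAgentRLModuleSpec(
            module_class=PPOTorchRLModule,
            catalog_class=PPOCatalog,
            observation_space=env.observation_space,
            action_space=env.action_space,
            model_config_dict={"fcnet_hiddens": [64, 64]},
        ),
    )

    # ... and remove it again.
    learner.remove_module(module_id="extra_agent")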

Managing Results#

Learner.compile_results

Compile results from the update in a numpy-friendly format.

Learner.register_metric

Registers a single key/value metric pair for loss- and gradient stats.

Learner.register_metrics

Registers several key/value metric pairs for loss- and gradient stats.

Learner._check_result

Checks whether the result has the correct format.
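
Custom metrics can be recorded from inside the loss (or other hooks) via register_metric() and show up in the compiled update results. A hedged Torch sketch, reusing the version-agnostic **kwargs pattern from the loss example above:

    from ray.rllib.algorithms.ppo.torch.ppo_torch_learner import PPOTorchLearner

    class MetricLoggingPPOLearner(PPOTorchLearner):
        """Hypothetical Learner recording an extra per-module stat."""

        def compute_loss_for_module(self, *, module_id, **kwargs):
            loss = super().compute_loss_for_module(module_id=module_id, **kwargs)
            # Compiled into the update results next to the built-in
            # loss and gradient statistics.
            self.register_metric(module_id, "my_total_loss", loss.item())
            return loss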