Note

Ray 2.10.0 introduces the alpha stage of RLlib’s “new API stack”. The Ray Team plans to transition algorithms, example scripts, and documentation to the new code base thereby incrementally replacing the “old API stack” (e.g., ModelV2, Policy, RolloutWorker) throughout the subsequent minor releases leading up to Ray 3.0.

Note, however, that so far only PPO (single- and multi-agent) and SAC (single-agent only) support the “new API stack”, and both continue to run with the old APIs by default. You can continue to use the existing custom (old stack) classes.

See here for more details on how to use the new API stack.

Algorithms#

The Algorithm class is the highest-level API in RLlib and is responsible for the WHEN and WHAT of RL algorithms: for example, WHEN to sample and WHEN to perform a neural network update. The HOW is delegated to components such as RolloutWorker. Algorithm is the main entry point for RLlib users to interact with RLlib’s algorithms. It allows you to train and evaluate policies, save an experiment’s progress, and restore from a previously saved experiment when continuing an RL run. Algorithm is a sub-class of Trainable and thus fully supports distributed hyperparameter tuning for RL.

../../_images/trainer_class_overview.svg

A typical RLlib Algorithm object: Algorithms are normally composed of N RolloutWorkers that are orchestrated via an EnvRunnerGroup object. Each worker owns its own set of Policy objects and their NN models, plus a BaseEnv instance.
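The split described above (the caller drives iterations, while the Algorithm decides internally when to sample and when to update) can be sketched as a small driver loop. The helper name `run_iterations` and its parameters are hypothetical. Only `train()` and `save()` come from the Algorithm API listed on this page. What `train()` returns depends on the Ray version, so the sketch collects the results without inspecting them.

```python
from typing import Any, List


def run_iterations(algo: Any, num_iterations: int, checkpoint_every: int = 0) -> List[Any]:
    """Drive an Algorithm-like object for a fixed number of training iterations.

    `algo` is assumed to expose `train()` and `save()`, as RLlib's Algorithm does.
    `run_iterations` itself is a hypothetical helper, not part of RLlib.
    """
    results = []
    for iteration in range(1, num_iterations + 1):
        # One logical training iteration; the Algorithm decides internally
        # when to sample and when to update.
        results.append(algo.train())
        # Optionally checkpoint every `checkpoint_every` iterations (0 disables this).
        if checkpoint_every and iteration % checkpoint_every == 0:
            algo.save()
    return results
```

With Ray installed, an Algorithm built from a config such as PPOConfig could be passed in as `algo`, for example `run_iterations(algo, 10, checkpoint_every=5)`.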

Algorithm Configuration API#

The AlgorithmConfig class represents the primary way of configuring and building an Algorithm. You don’t use AlgorithmConfig directly in practice, but rather use its algorithm-specific implementations such as PPOConfig, which each come with their own set of arguments to their respective .training() method.
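As a minimal sketch of this builder style, the following applies environment and training settings through chained calls and then builds a PPO Algorithm. The helper names `configure` and `build_ppo`, the environment name `"CartPole-v1"`, and the learning-rate value are illustrative assumptions. The sketch also assumes that your Ray version accepts `env` in `.environment()` and `lr` in `.training()`.

```python
from typing import Any


def configure(config: Any, env_name: str = "CartPole-v1", lr: float = 3e-4) -> Any:
    """Apply environment and training settings through chained builder methods.

    `config` is assumed to be an instance of an AlgorithmConfig subclass,
    for example PPOConfig(). `configure` is a hypothetical helper.
    """
    return (
        config
        .environment(env=env_name)  # RL-environment settings
        .training(lr=lr)            # training settings (here: the learning rate)
    )


def build_ppo(env_name: str = "CartPole-v1", lr: float = 3e-4) -> Any:
    """Build a PPO Algorithm; requires Ray RLlib and its dependencies."""
    # Imported lazily so that defining these helpers does not itself need Ray.
    from ray.rllib.algorithms.ppo import PPOConfig

    return configure(PPOConfig(), env_name, lr).build()
```

Each configuration method returns the config object itself, which is what makes the chained form possible.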

Constructor#

AlgorithmConfig

An RLlib AlgorithmConfig builds an RLlib Algorithm from a given configuration.

Public methods#

copy

Creates a deep copy of this config and (un)freezes if necessary.

validate

Validates all values in this config.

freeze

Freezes this config object, such that no attributes can be set anymore.
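One way these three methods might be combined is to derive a modified variant from a base config without touching the original. This is a sketch only: `derive_variant` is a hypothetical helper, and it assumes that `copy` accepts a `copy_frozen` keyword controlling whether the copy is frozen.

```python
from typing import Any


def derive_variant(base_config: Any, **training_overrides: Any) -> Any:
    """Return a modified copy of `base_config`, leaving the original untouched.

    `base_config` is assumed to be an AlgorithmConfig (or subclass) instance.
    """
    # Assumption: copy_frozen=False yields an unfrozen copy even if
    # `base_config` itself has already been frozen.
    variant = base_config.copy(copy_frozen=False)
    # Apply the overrides through the regular configuration method.
    variant.training(**training_overrides)
    # Check the new values, then lock the variant against further changes.
    variant.validate()
    variant.freeze()
    return variant
```

For instance, `derive_variant(base, lr=1e-4)` would return a frozen copy with a different learning rate, while `base` keeps its own settings.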

Builder methods#

build

Builds an Algorithm from this AlgorithmConfig (or a copy thereof).

build_learner_group

Builds and returns a new LearnerGroup object based on settings in self.

build_learner

Builds and returns a new Learner object based on settings in self.

Configuration methods#

callbacks

Sets the callbacks configuration.

debugging

Sets the config's debugging settings.

environment

Sets the config's RL-environment settings.

evaluation

Sets the config's evaluation settings.

experimental

Sets the config's experimental settings.

fault_tolerance

Sets the config's fault tolerance settings.

framework

Sets the config's DL framework settings.

multi_agent

Sets the config's multi-agent settings.

offline_data

Sets the config's offline data settings.

python_environment

Sets the config's python environment settings.

reporting

Sets the config's reporting settings.

resources

Specifies resources allocated for an Algorithm and its ray actors/workers.

rl_module

Sets the config's RLModule settings.

rollouts

training

Sets the training related configuration.

Getter methods#

get_default_learner_class

Returns the Learner class to use for this algorithm.

get_default_rl_module_spec

Returns the RLModule spec to use for this algorithm.

get_evaluation_config_object

Creates a full AlgorithmConfig object from self.evaluation_config.

get_multi_rl_module_spec

Returns the MultiRLModuleSpec based on the given env/spaces.

get_multi_agent_setup

Compiles complete multi-agent config (dict) from the information in self.

get_rollout_fragment_length

Automatically infers a proper rollout_fragment_length setting if "auto".

Miscellaneous methods#

validate_train_batch_size_vs_rollout_fragment_length

Detects mismatches for train_batch_size vs rollout_fragment_length.

Building Custom Algorithm Classes#

Warning

As of Ray >= 1.9, it is no longer recommended to use the build_trainer() utility function for creating custom Algorithm sub-classes. Instead, follow the simple guidelines here for directly sub-classing from Algorithm.

To create a custom Algorithm, sub-class the Algorithm class and override one or more of its methods, in particular setup and training_step (both listed under Algorithm API).

See here for an example on how to override Algorithm.
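As one possible illustration of overriding Algorithm methods, the mixin below overrides `setup` and `training_step` to count training steps while delegating the actual work to the base class. The names `StepCounterMixin` and `custom_step_count` are hypothetical. The sketch assumes that `setup` receives the config as its only argument and that `training_step` takes no arguments.

```python
from typing import Any


class StepCounterMixin:
    """Counts how often `training_step()` runs.

    Meant to be combined with a concrete Algorithm class; hypothetical example.
    """

    def setup(self, config: Any) -> None:
        # Initialize custom state, then let the base class do its own setup.
        self.custom_step_count = 0
        super().setup(config)

    def training_step(self) -> Any:
        # Delegate the sampling/update logic to the base class unchanged.
        result = super().training_step()
        self.custom_step_count += 1
        return result
```

With Ray installed, something like `class CountingPPO(StepCounterMixin, PPO): pass` (with `PPO` imported from `ray.rllib.algorithms.ppo`) would combine the mixin with a concrete Algorithm. The mixin has to come first in the list of base classes so that its overrides take precedence.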

Algorithm API#

Constructor#

Algorithm

An RLlib algorithm responsible for optimizing one or more Policies.

setup

Subclasses should override this for custom initialization.

get_default_config

Inference and Evaluation#

compute_actions

Computes an action for the specified policy on the local Worker.

compute_single_action

Computes an action for the specified policy on the local worker.

evaluate

Evaluates current policy under evaluation_config settings.

Saving and Restoring#

from_checkpoint

Creates a new algorithm instance from a given checkpoint.

from_state

Recovers an Algorithm from a state object.

get_weights

Returns a dict mapping Module/Policy IDs to weights.

set_weights

Sets RLModule/Policy weights by Module/Policy ID.

export_model

Exports model based on export_formats.

export_policy_checkpoint

Exports Policy checkpoint to a local directory and returns an AIR Checkpoint.

export_policy_model

Exports policy model with given policy_id to a local directory.

restore

Restores training state from a given model checkpoint.

restore_workers

Tries to bring back unhealthy EnvRunners and, if successful, syncs them with the local one.

save

Saves the current model state to a checkpoint.

save_checkpoint

Exports checkpoint to a local directory.
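A save-and-restore round trip using the methods above might look like the following sketch. `checkpoint_roundtrip` is a hypothetical helper. It assumes that `save` accepts a target directory as its first argument and that `from_checkpoint` accepts the same path. Return types and accepted arguments vary between Ray versions, so the sketch ignores the return value of `save`.

```python
from typing import Any


def checkpoint_roundtrip(algo: Any, checkpoint_dir: str) -> Any:
    """Save `algo` to `checkpoint_dir` and return a new instance restored from it.

    `algo` is assumed to be an RLlib Algorithm (or an object with the same
    `save` / `from_checkpoint` interface).
    """
    # Write the current state into the given directory.
    algo.save(checkpoint_dir)
    # from_checkpoint is a classmethod, so call it on the algorithm's own class.
    return type(algo).from_checkpoint(checkpoint_dir)
```

The restored object is a separate instance, so the original `algo` can keep training while the restored one is used, for example, for evaluation.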

Training#

train

Runs one logical iteration of training.

training_step

Default single iteration logic of an algorithm.

Multi Agent#

add_policy

Adds a new policy to this Algorithm.

remove_policy

Removes a policy from this Algorithm.