From Ray 2.6.0 onwards, RLlib is adopting a new stack for training and model customization, gradually replacing the ModelV2 API and some convoluted parts of the Policy API with the RLModule API. See the RLModule documentation for details.


The Algorithm class is the highest-level API in RLlib, responsible for the WHEN and WHAT of RL algorithms: WHEN should we sample, WHEN should we perform a neural network update, and so on. The HOW is delegated to components such as RolloutWorker. Algorithm is the main entry point for interacting with RLlib's algorithms. It allows you to train and evaluate policies, save an experiment's progress, and restore from a previously saved experiment when continuing an RL run. Algorithm is a sub-class of Trainable and thus fully supports distributed hyperparameter tuning for RL.


A typical RLlib Algorithm object: Algorithms normally comprise N RolloutWorkers orchestrated via a WorkerSet object. Each worker owns its own set of Policy objects (and their neural network models), plus a BaseEnv instance.

Algorithm Configuration API#

The AlgorithmConfig class represents the primary way of configuring and building an Algorithm. In practice, you don't use AlgorithmConfig directly, but rather one of its algorithm-specific implementations, such as PPOConfig, each of which comes with its own set of arguments to its respective .training() method.



A RLlib AlgorithmConfig builds an RLlib Algorithm from a given configuration.

Public methods#

build([env, logger_creator, use_copy])

Builds an Algorithm from this AlgorithmConfig (or a copy thereof).


freeze()

Freezes this config object, such that no attributes can be set anymore.


copy([copy_frozen])

Creates a deep copy of this config and (un)freezes if necessary.


validate()

Validates all values in this config.

Configuration methods#


callbacks(callbacks_class)

Sets the callbacks configuration.

debugging(*[, logger_creator, ...])

Sets the config's debugging settings.

environment([env, env_config, ...])

Sets the config's RL-environment settings.

evaluation(*[, evaluation_interval, ...])

Sets the config's evaluation settings.

experimental(*[, ...])

Sets the config's experimental settings.

fault_tolerance([recreate_failed_workers, ...])

Sets the config's fault tolerance settings.

framework([framework, eager_tracing, ...])

Sets the config's DL framework settings.

multi_agent(*[, policies, ...])

Sets the config's multi-agent settings.

offline_data(*[, input_, input_config, ...])

Sets the config's offline data settings.

python_environment(*[, ...])

Sets the config's python environment settings.

reporting(*[, ...])

Sets the config's reporting settings.

resources(*[, num_gpus, _fake_gpus, ...])

Specifies resources allocated for an Algorithm and its ray actors/workers.

rl_module(*[, rl_module_spec, ...])

Sets the config's RLModule settings.

rollouts(*[, env_runner_cls, ...])

Sets the rollout worker configuration.

training(*[, gamma, lr, grad_clip, ...])

Sets the training related configuration.

Getter methods#


get_default_learner_class()

Returns the Learner class to use for this algorithm.


get_default_rl_module_spec()

Returns the RLModule spec to use for this algorithm.


get_evaluation_config_object()

Creates a full AlgorithmConfig object from self.evaluation_config.

get_marl_module_spec(*, policy_dict[, ...])

Returns the MultiAgentRLModule spec based on the given policy spec dict.

get_multi_agent_setup(*[, policies, env, ...])

Compiles complete multi-agent config (dict) from the information in self.


get_rollout_fragment_length(*[, worker_index])

Automatically infers a proper rollout_fragment_length setting if "auto".

Miscellaneous methods#


validate_train_batch_size_vs_rollout_fragment_length()

Detects mismatches for train_batch_size vs rollout_fragment_length.

Building Custom Algorithm Classes#


As of Ray >= 1.9, it is no longer recommended to use the build_trainer() utility function for creating custom Algorithm sub-classes. Instead, follow the simple guidelines here for directly sub-classing from Algorithm.

To create a custom Algorithm, sub-class the Algorithm class and override one or more of its methods, most importantly training_step().

See here for an example on how to override Algorithm.

Algorithm API#


Algorithm([config, env, logger_creator])

An RLlib algorithm responsible for optimizing one or more Policies.

Inference and Evaluation#

compute_actions(observations[, state, ...])

Computes actions for the specified policy on the local worker.

compute_single_action([observation, state, ...])

Computes an action for the specified policy on the local worker.


evaluate()

Evaluates the current policy under the evaluation_config settings.

Saving and Restoring#

from_checkpoint(checkpoint[, policy_ids, ...])

Creates a new algorithm instance from a given checkpoint.


from_state(state)

Recovers an Algorithm from a state object.


get_weights([policies])

Returns a dictionary of policy ids to weights.


set_weights(weights)

Sets policy weights by policy id.

export_model(export_formats[, export_dir])

Exports model based on export_formats.

export_policy_checkpoint(export_dir[, policy_id])

Exports Policy checkpoint to a local directory and returns an AIR Checkpoint.

export_policy_model(export_dir[, policy_id, ...])

Exports policy model with given policy_id to a local directory.

import_policy_model_from_h5(import_file[, ...])

Imports a policy's model with given policy_id from a local h5 file.

restore(checkpoint_path[, ...])

Restores training state from a given model checkpoint.



restore_workers(workers)

Tries to restore failed workers if necessary.

save([checkpoint_dir, prevent_upload])

Saves the current model state to a checkpoint.


save_checkpoint(checkpoint_dir)

Exports checkpoint to a local directory.




Training#

train()

Runs one logical iteration of training.


training_step()

Default single iteration logic of an algorithm.

Multi Agent#

add_policy(policy_id[, policy_cls, policy, ...])

Adds a new policy to this Algorithm.

remove_policy([policy_id, ...])

Removes a policy from this Algorithm.