ray.rllib.algorithms.algorithm.Algorithm

class ray.rllib.algorithms.algorithm.Algorithm(config: AlgorithmConfig | None = None, env=None, logger_creator: Callable[[], Logger] | None = None, **kwargs)[source]

Bases: Checkpointable, Trainable, AlgorithmBase

An RLlib algorithm responsible for optimizing one or more Policies.

Algorithms contain an EnvRunnerGroup under self.env_runner_group. An EnvRunnerGroup is composed of a single local EnvRunner (self.env_runner_group.local_env_runner), which serves as the reference copy of the neural network(s) to be trained, and optionally one or more remote EnvRunners used to generate environment samples in parallel. The EnvRunnerGroup is fault-tolerant and elastic: it tracks the health state of all managed remote EnvRunner actors. As a result, an Algorithm should never access the underlying actor handles directly. Instead, always access them through the group's foreach APIs, which address the underlying EnvRunners by their assigned IDs.
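
For example, to apply a function to the local and all healthy remote EnvRunners and collect the results, go through the group's foreach API rather than any actor handle. This is a minimal sketch: the method is named foreach_env_runner in recent RLlib versions (older versions expose it as foreach_worker), and the PPO setup is only a stand-in:

    from ray.rllib.algorithms.ppo import PPOConfig

    algo = (
        PPOConfig()
        .environment("CartPole-v1")
        .env_runners(num_env_runners=2)
        .build()
    )

    # Runs the given function on the local EnvRunner and every healthy remote
    # EnvRunner and returns the collected results (one entry per EnvRunner).
    class_names = algo.env_runner_group.foreach_env_runner(
        lambda env_runner: type(env_runner).__name__
    )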

Each EnvRunner (remote or local) contains a PolicyMap, which itself may hold either one policy for single-agent training or one or more policies for multi-agent training. Policies are synchronized automatically from time to time using ray.remote calls. The exact synchronization logic depends on the specific algorithm used, but it usually runs from the local worker to all remote workers after each training update.

You can write your own Algorithm classes by sub-classing from Algorithm or any of its built-in sub-classes. This allows you to override the training_step method to implement your own algorithm logic. You can find the different built-in algorithms’ training_step() methods in their respective main .py files, e.g. rllib.algorithms.dqn.dqn.py or rllib.algorithms.impala.impala.py.
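The sketch below shows the shape of such a subclass. It is illustrative only: the training_step body is a placeholder, and a real implementation would sample from self.env_runner_group, update its RLModule(s) or Policies, and sync the new weights back to the EnvRunners:

    from ray.rllib.algorithms.algorithm import Algorithm
    from ray.rllib.algorithms.algorithm_config import AlgorithmConfig


    class MyAlgorithm(Algorithm):
        @classmethod
        def get_default_config(cls) -> AlgorithmConfig:
            # A real algorithm usually returns its own AlgorithmConfig subclass.
            return AlgorithmConfig()

        def training_step(self):
            # Placeholder: collect samples from the EnvRunners, run one update
            # on the trained module(s), then report/return result metrics.
            return {}
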

The most important API methods an Algorithm exposes are train(), evaluate(), save_to_path(), and restore_from_path().
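
A minimal end-to-end sketch of these entry points, using PPO purely as a stand-in (the evaluation argument names follow recent RLlib versions):

    from ray.rllib.algorithms.ppo import PPOConfig

    config = (
        PPOConfig()
        .environment("CartPole-v1")
        .evaluation(evaluation_interval=1, evaluation_num_env_runners=1)
    )
    algo = config.build()

    for _ in range(3):
        results = algo.train()            # one logical training iteration

    eval_results = algo.evaluate()        # evaluation under the evaluation settings

    checkpoint_dir = algo.save_to_path()  # write a complete Algorithm checkpoint
    algo.restore_from_path(checkpoint_dir)  # load that state back into this instance
    algo.stop()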

Methods

__init__

Initializes an Algorithm instance.

add_module

Adds a new (single-agent) RLModule to this Algorithm's MARLModule.

add_policy

Adds a new policy to this Algorithm.

compute_actions

Computes actions for the specified policy on the local worker.

compute_single_action

Computes an action for the specified policy on the local worker.

evaluate

Evaluates current policy under evaluation_config settings.

export_model

Exports model based on export_formats.

export_policy_checkpoint

Exports Policy checkpoint to a local directory and returns an AIR Checkpoint.

export_policy_model

Exports policy model with given policy_id to a local directory.

from_checkpoint

Creates a new algorithm instance from a given checkpoint (see the sketch below the Methods list).

from_state

Recovers an Algorithm from a state object.

get_config

Returns configuration passed in by Tune.

get_default_policy_class

Returns a default Policy class to use, given a config.

get_metadata

Returns JSON-writable metadata further describing the implementing class.

get_module

Returns the (single-agent) RLModule with the given module_id (None if the ID isn't found).

get_policy

Returns the policy for the specified ID, or None.

get_weights

Returns a dict mapping Module/Policy IDs to weights.

merge_algorithm_configs

Merges a complete Algorithm config dict with a partial override dict.

remove_module

Removes a (single-agent) RLModule from this Algorithm's MARLModule.

remove_policy

Removes a policy from this Algorithm.

reset

Resets trial for use with new config.

reset_config

Resets configuration without restarting the trial.

restore

Restores training state from a given model checkpoint.

restore_workers

Tries to bring back unhealthy EnvRunners and, if successful, syncs them with the local one.

save

Saves the current model state to a checkpoint.

save_checkpoint

Exports checkpoint to a local directory.

save_to_path

Saves the state of the implementing class (or a given state dict) to path.

set_weights

Sets RLModule/Policy weights by Module/Policy ID.

step

Implements the main Algorithm.train() logic.

stop

Releases all resources used by this trainable.

train

Runs one logical iteration of training.

train_buffered

Runs multiple iterations of training.

training_step

Default single iteration logic of an algorithm.

validate_env

Env validator function for this Algorithm class.
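
As referenced above, from_checkpoint rebuilds a complete Algorithm directly from a checkpoint directory, whereas restore_from_path loads state into an already constructed instance. A minimal sketch, with PPO again only as a stand-in:

    from ray.rllib.algorithms.algorithm import Algorithm
    from ray.rllib.algorithms.ppo import PPOConfig

    # Build, train once, and write a checkpoint.
    algo = PPOConfig().environment("CartPole-v1").build()
    algo.train()
    checkpoint_dir = algo.save_to_path()
    algo.stop()

    # Construct a brand-new Algorithm instance from that checkpoint.
    restored = Algorithm.from_checkpoint(checkpoint_dir)
    print(restored.iteration)  # training iteration carried over from the checkpoint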

Attributes

CLASS_AND_CTOR_ARGS_FILE_NAME

METADATA_FILE_NAME

STATE_FILE_NAME

env_runner

eval_env_runner

iteration

Current training iteration.

logdir

Directory of the results and checkpoints for this Trainable.

training_iteration

Current training iteration (same as self.iteration).

trial_id

Trial ID for the corresponding trial of this Trainable.

trial_name

Trial name for the corresponding trial of this Trainable.

trial_resources

Resources currently assigned to the trial of this Trainable.