Note

Ray 2.40 uses RLlib’s new API stack by default. The Ray team has mostly completed transitioning algorithms, example scripts, and documentation to the new code base.

If you’re still using the old API stack, see New API stack migration guide for details on how to migrate.

AlgorithmConfig API#

RLlib’s AlgorithmConfig API is the auto-validated and type-safe gateway into configuring and building an RLlib Algorithm.

In essence, you first create an instance of AlgorithmConfig and then call some of its methods to set various configuration options. RLlib uses the following black-compliant format in all parts of its code.

Note that you can chain together more than one method call, including the constructor:

from ray.rllib.algorithms.algorithm_config import AlgorithmConfig

config = (
    # Create an `AlgorithmConfig` instance.
    AlgorithmConfig()
    # Change the learning rate.
    .training(lr=0.0005)
    # Change the number of Learner actors.
    .learners(num_learners=2)
)

Hint

For value checking and type-safety reasons, you should never set attributes in your AlgorithmConfig directly, but always go through the proper methods:

# WRONG!
config.env = "CartPole-v1"  # <- don't set attributes directly

# CORRECT!
config.environment(env="CartPole-v1")  # call the proper method
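
Reading config attributes directly, on the other hand, is fine. A quick sketch, continuing the chained example from above:

# Reading attributes directly is fine; only setting them should go through the methods.
print(config.lr)            # -> 0.0005, set through `.training(lr=0.0005)` above
print(config.num_learners)  # -> 2, set through `.learners(num_learners=2)` above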

Algorithm specific config classes#

In practice, you don’t use the base AlgorithmConfig class directly, but always one of its algorithm-specific subclasses, such as PPOConfig. Each subclass comes with its own set of additional arguments to the training() method.

Normally, you pick the specific AlgorithmConfig subclass that matches the Algorithm you would like to run your learning experiments with. For example, to use IMPALA as your algorithm, import its specific config class:

from ray.rllib.algorithms.impala import IMPALAConfig

config = (
    # Create an `IMPALAConfig` instance.
    IMPALAConfig()
    # Specify the RL environment.
    .environment("CartPole-v1")
    # Change the learning rate.
    .training(lr=0.0004)
)

To change algorithm-specific settings, here for IMPALA, also use the training() method:

# Change an IMPALA-specific setting (the entropy coefficient).
config.training(entropy_coeff=0.01)

You can build the IMPALA instance directly from the config object by calling its build_algo() method:

# Build the algorithm instance.
impala = config.build_algo()
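
Once built, the Algorithm instance exposes the usual training API. A minimal usage sketch, in case you want to run a few training iterations locally and then release all resources:

# Run a few training iterations on the built algorithm.
for _ in range(3):
    results = impala.train()  # returns a dict of metrics for this iteration

# Shut down the algorithm and release its actors and resources.
impala.stop()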

The config object stored inside any built Algorithm instance is a copy of your original config. This allows you to further alter your original config object and build another algorithm instance without affecting the previously built one:

# Further alter the config without affecting the previously built IMPALA object ...
config.training(lr=0.00123)
# ... and build a new IMPALA from it.
another_impala = config.build_algo()
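
A quick sketch to verify the copy semantics, comparing the learning rates stored on both built instances:

# The first instance keeps the values it was built with ...
print(impala.config.lr)          # -> 0.0004
# ... while the new instance reflects the updated config.
print(another_impala.config.lr)  # -> 0.00123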

If you are working with Ray Tune, pass your AlgorithmConfig instance into the constructor of the Tuner:

from ray import tune

tuner = tune.Tuner(
    "IMPALA",
    param_space=config,  # <- your RLlib AlgorithmConfig object
    # ... further Tuner arguments, for example a run_config with stopping criteria.
)
# Run the experiment with Ray Tune.
results = tuner.fit()
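
Passing the config through Tune is also how you typically run hyperparameter sweeps. A minimal sketch, assuming you want to grid-search the learning rate; in practice you would also add a stopping criterion through the Tuner’s run_config argument:

from ray import tune
from ray.rllib.algorithms.impala import IMPALAConfig

config = (
    IMPALAConfig()
    .environment("CartPole-v1")
    # Place a Tune search space directly into the config.
    .training(lr=tune.grid_search([0.0001, 0.0005]))
)

tuner = tune.Tuner(
    "IMPALA",
    param_space=config,  # <- Tune expands the grid into one trial per value
)
results = tuner.fit()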

Generic config settings#

Most config settings are generic and apply to all of RLlib’s Algorithm classes. The following sections walk you through the most important settings to pay close attention to before diving into the remaining config options and before starting hyperparameter fine-tuning.

RL Environment#

To configure which RL environment your algorithm trains against, use the env argument to the environment() method:

config.environment("Humanoid-v5")

See this RL environment guide for more details.

Tip

Install both Atari and MuJoCo to be able to run all of RLlib’s tuned examples:

pip install "gymnasium[atari,accept-rom-license,mujoco]"

Learning rate lr#

Set the learning rate for updating your models through the lr argument to the training() method:

config.training(lr=0.0001)
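
A sketch of a non-constant learning rate, assuming your RLlib version supports the piecewise schedule format of [[timestep, value], ...] pairs for lr; the timesteps and values below are placeholders:

# Anneal the learning rate from 0.0005 to 0.0001 over the first 1M timesteps.
config.training(lr=[[0, 0.0005], [1_000_000, 0.0001]])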

Train batch size#

Set the train batch size, per Learner actor, through the train_batch_size_per_learner argument to the training() method:

config.training(train_batch_size_per_learner=256)

Note

You can compute the total, effective train batch size by multiplying train_batch_size_per_learner by (num_learners or 1). Alternatively, check the value of your config’s total_train_batch_size property:

config.training(train_batch_size_per_learner=256)
config.learners(num_learners=2)
print(config.total_train_batch_size)  # expect: 512 = 256 * 2

Discount factor gamma#

Set the RL discount factor through the gamma argument to the training() method:

config.training(gamma=0.995)

Scaling with num_env_runners and num_learners#

Set the number of EnvRunner actors used to collect training samples through the num_env_runners argument to the env_runners() method:

config.env_runners(num_env_runners=4)

# Also use `num_envs_per_env_runner` to vectorize your environment on each EnvRunner actor.
# Note that this option is only available in single-agent setups.
# The Ray team is working on a solution for this restriction.
config.env_runners(num_envs_per_env_runner=10)

Set the number of Learner actors used to update your models through the num_learners argument to the learners() method. This should correspond to the number of GPUs you have available for training.

config.learners(num_learners=2)
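
Putting both scaling axes together, a sketch of a setup with four sampling actors and two GPU-backed Learner actors; the num_gpus_per_learner value assumes you actually have two GPUs available:

# Scale sample collection and training together.
config.env_runners(num_env_runners=4, num_envs_per_env_runner=10)
config.learners(num_learners=2, num_gpus_per_learner=1)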

Disable explore behavior#

Switch exploratory behavior on or off through the explore argument to the env_runners() method. To compute actions, the EnvRunner calls forward_exploration() on the RLModule when explore=True and forward_inference() when explore=False. The default value is explore=True.

# Disable exploration behavior.
# When False, the EnvRunner calls `forward_inference()` on the RLModule to compute
# actions instead of `forward_exploration()`.
config.env_runners(explore=False)

Rollout length#

Set the number of timesteps that each EnvRunner steps through with each of its RL environment copies by passing the rollout_fragment_length argument to the env_runners() method. Note that some algorithms, like PPO, set this value automatically based on the train batch size, the number of EnvRunner actors, and the number of envs per EnvRunner.

config.env_runners(rollout_fragment_length=50)
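
Combined with the scaling settings above, the rollout length determines roughly how many timesteps RLlib collects per sampling cycle; the arithmetic below is only an illustration:

config.env_runners(
    num_env_runners=4,
    num_envs_per_env_runner=10,
    rollout_fragment_length=50,
)
# Approximate timesteps collected per sampling cycle:
# num_env_runners * num_envs_per_env_runner * rollout_fragment_length
# = 4 * 10 * 50 = 2000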

All available methods and their settings#

Besides the previously described most common settings, the AlgorithmConfig class and its algo-specific subclasses come with many more configuration options.

To structure things more semantically, AlgorithmConfig groups its various config settings into categories, each represented by its own method, for example environment(), training(), env_runners(), and learners(), as the sketch below illustrates.
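
For example, a sketch that touches several of these categories in one chained expression, using only methods already shown in this guide:

from ray.rllib.algorithms.ppo import PPOConfig

config = (
    PPOConfig()
    # RL environment settings.
    .environment("CartPole-v1")
    # Sampling settings.
    .env_runners(num_env_runners=2, num_envs_per_env_runner=5)
    # Training settings.
    .training(lr=0.0003, gamma=0.99, train_batch_size_per_learner=2000)
    # Learner scaling settings.
    .learners(num_learners=1)
)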

To familiarize yourself with RLlib’s many config options, browse through RLlib’s examples folder or take a look at this examples folder overview page.

Each example script usually introduces a new config setting or shows you how to implement specific customizations through a combination of setting certain config options and adding custom code to your experiment.