Note
Ray 2.10.0 introduces the alpha stage of RLlib’s “new API stack”. The team is currently transitioning algorithms, example scripts, and documentation to the new code base throughout the subsequent minor releases leading up to Ray 3.0.
See here for more details on how to activate and use the new API stack.
Environments#
Overview#
In online reinforcement learning (RL), an algorithm trains a policy neural network by collecting data on-the-fly using an RL environment or simulator. The agent navigates within the environment choosing actions governed by this policy and collecting the environment’s observations and rewards. The goal of the algorithm is to train the policy on the collected data such that the policy’s action choices eventually maximize the cumulative reward over the agent’s lifetime.
Farama Gymnasium#
RLlib relies on Farama’s Gymnasium API as its main RL environment interface for single-agent training (see here for multi-agent). To implement custom logic with gymnasium and integrate it into an RLlib config, see this SimpleCorridor example.
Tip
Not all action spaces are compatible with all RLlib algorithms. See the algorithm overview for details. In particular, pay attention to which algorithms support discrete and which support continuous action spaces or both.
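As a quick compatibility check, you can instantiate an env and inspect its spaces before wiring it into an algorithm config. A minimal sketch using the well-known CartPole-v1 env (any registered env ID works the same way):

import gymnasium as gym

# Instantiate a registered env and print its spaces to see whether the
# algorithm you picked supports them (discrete vs. continuous actions, etc.).
env = gym.make("CartPole-v1")
print(env.action_space)       # Discrete(2)
print(env.observation_space)  # Box(..., (4,), float32)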
For more details on building a custom Farama Gymnasium environment, see the gymnasium.Env class definition.
For multi-agent training, see RLlib’s multi-agent API and supported third-party APIs.
Configuring Environments#
To specify which RL environment to train against, you can provide either a string name or a Python class that subclasses gymnasium.Env.
Specifying by String#
RLlib interprets string values as registered gymnasium environment names by default.
For example:
from ray.rllib.algorithms.ppo import PPOConfig

config = (
    PPOConfig()
    # Configure the RL environment to use as a string (by name), which
    # is registered with Farama's gymnasium.
    .environment("Acrobot-v1")
)
algo = config.build()
print(algo.train())
Tip
For all supported environment names registered with Farama, refer to these resources (by env category):
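You can also list the environment IDs registered in your local gymnasium installation. A minimal sketch, assuming a gymnasium version that exposes the registry as gymnasium.envs.registry:

import gymnasium as gym

# Print every env ID registered with gymnasium in this Python environment.
# The exact set depends on which extras (e.g. Box2D, Atari) are installed.
for env_id in sorted(gym.envs.registry.keys()):
    print(env_id)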
Specifying by Subclass of gymnasium.Env#
If you’re using a custom subclass of gymnasium.Env, you can pass the class itself rather than a registered string. Your subclass must accept a single config argument in its constructor (which may default to None).
For example:
import gymnasium as gym
import numpy as np

from ray.rllib.algorithms.ppo import PPOConfig


class MyDummyEnv(gym.Env):
    # Write the constructor and provide a single `config` arg,
    # which may be set to None by default.
    def __init__(self, config=None):
        # As per gymnasium standard, provide observation and action spaces in your
        # constructor.
        self.observation_space = gym.spaces.Box(-1.0, 1.0, (1,), np.float32)
        self.action_space = gym.spaces.Discrete(2)

    def reset(self, seed=None, options=None):
        # Return (reset) observation and info dict.
        return np.array([1.0], np.float32), {}

    def step(self, action):
        # Return next observation, reward, terminated, truncated, and info dict.
        return np.array([1.0], np.float32), 1.0, False, False, {}
config = (
    PPOConfig()
    .environment(
        MyDummyEnv,
        env_config={},  # `config` to pass to your env class.
    )
)
algo = config.build()
print(algo.train())
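The dict you pass as env_config arrives as the config argument of your env’s constructor, so you can parameterize custom envs this way. A minimal sketch, where corridor_length is a hypothetical, user-defined key:

import gymnasium as gym
import numpy as np

from ray.rllib.algorithms.ppo import PPOConfig


class MyParamEnv(gym.Env):
    def __init__(self, config=None):
        config = config or {}
        # `corridor_length` is a user-defined key; RLlib simply forwards the
        # `env_config` dict to this constructor.
        self.corridor_length = config.get("corridor_length", 10)
        self.observation_space = gym.spaces.Box(-1.0, 1.0, (1,), np.float32)
        self.action_space = gym.spaces.Discrete(2)

    def reset(self, seed=None, options=None):
        return np.array([1.0], np.float32), {}

    def step(self, action):
        return np.array([1.0], np.float32), 1.0, False, False, {}


config = (
    PPOConfig()
    .environment(
        MyParamEnv,
        env_config={"corridor_length": 20},  # Becomes `config` inside `__init__`.
    )
)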
Specifying by Tune-Registered Lambda#
A third option for providing environment information to your config is to register an environment creator function (or lambda) with Ray Tune. The creator function must take a single config parameter and return a single non-vectorized gymnasium.Env instance.
For example:
from ray.tune.registry import register_env


def env_creator(config):
    return MyDummyEnv(config)  # Return a gymnasium.Env instance.


register_env("my_env", env_creator)

config = (
    PPOConfig()
    .environment("my_env")  # <- Tune-registered string pointing to your custom env creator.
)
algo = config.build()
print(algo.train())
For a complete example using a custom environment, see the custom_gym_env.py example script.
Warning
Due to Ray’s distributed nature, gymnasium’s own registry is incompatible with Ray. Always use the registration method documented here to ensure remote Ray actors can access your custom environments.
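In practice this means: if you register a custom env only with gymnasium.register on the driver, remote EnvRunner actors can’t build it. Instead, wrap the construction (including any gymnasium wrappers you need) in a Tune-registered creator. A minimal sketch; the env ID and wrapper here are just examples:

import gymnasium as gym

from ray.tune.registry import register_env

# Register a creator with Ray Tune so every remote EnvRunner actor can build
# the env, with the desired wrappers applied, under the "my_pendulum" name.
register_env(
    "my_pendulum",
    lambda config: gym.wrappers.TimeLimit(
        gym.make("Pendulum-v1"), max_episode_steps=200
    ),
)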
In the preceding example, the env_creator function takes a config argument. This config is primarily a dictionary containing required settings. However, you can also access additional properties on the config object. For example, use config.worker_index to get the index of the remote EnvRunner, or config.num_workers for the total number of EnvRunners used. This approach can help customize environments within an ensemble and make environments running on some EnvRunners behave differently from those running on other EnvRunners.
For example:
class EnvDependingOnWorkerAndVectorIndex(gym.Env):
    def __init__(self, config):
        # Pick the actual env based on worker and vector indexes.
        # `choose_env_for` is a user-defined helper returning a registered env name.
        self.env = gym.make(
            choose_env_for(config.worker_index, config.vector_index)
        )
        self.action_space = self.env.action_space
        self.observation_space = self.env.observation_space

    def reset(self, seed=None, options=None):
        return self.env.reset(seed=seed, options=options)

    def step(self, action):
        return self.env.step(action)


register_env(
    "multi_env",
    lambda config: EnvDependingOnWorkerAndVectorIndex(config),
)
Tip
When using logging within an environment, the configuration must be done inside the environment, which runs within Ray workers. Logging configurations set up before Ray initializes are ignored. Use the following code to connect to Ray’s logging instance:
import logging
logger = logging.getLogger("ray.rllib")
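A brief sketch of using that logger from inside an env’s methods (the env class here is a hypothetical stand-in):

import logging

import gymnasium as gym
import numpy as np

logger = logging.getLogger("ray.rllib")


class LoggingDummyEnv(gym.Env):
    def __init__(self, config=None):
        self.observation_space = gym.spaces.Box(-1.0, 1.0, (1,), np.float32)
        self.action_space = gym.spaces.Discrete(2)

    def reset(self, seed=None, options=None):
        # Shows up in the logs of the EnvRunner actor running this env copy.
        logger.info("Resetting env copy ...")
        return np.array([1.0], np.float32), {}

    def step(self, action):
        return np.array([1.0], np.float32), 1.0, False, False, {}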
Performance and Scaling#
There are two methods to scale sample collection with RLlib and gymnasium environments. You can use both in combination.
Distribute across multiple processes: RLlib creates multiple EnvRunner instances, each a Ray actor, for experience collection, controlled through your AlgorithmConfig: config.env_runners(num_env_runners=..).
Vectorization within a single process: Many environments achieve high frame rates per core but are limited by policy inference latency. To address this limitation, create multiple environments per process to batch the policy forward pass across these vectorized environments. Set config.env_runners(num_envs_per_env_runner=..) to create more than one environment copy per EnvRunner actor. Additionally, you can make the individual sub-environments within a vector independent processes through the Python multiprocessing used by gymnasium. Set config.env_runners(remote_worker_envs=True) to create individual sub-environments as separate processes and step them in parallel.
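For example, a config sketch combining both methods (the numbers are placeholders, not tuned recommendations):

from ray.rllib.algorithms.ppo import PPOConfig

config = (
    PPOConfig()
    .environment("Acrobot-v1")
    .env_runners(
        # Four remote EnvRunner actors ...
        num_env_runners=4,
        # ... each stepping eight env copies, so the policy forward pass is
        # batched across those eight copies.
        num_envs_per_env_runner=8,
    )
)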
Note
Multi-agent setups aren’t vectorizable yet. The Ray team is working on a solution for this restriction by using the gymnasium >= 1.x custom vectorization feature.
Tip
See the scaling guide for more on RLlib training at scale.
Expensive Environments#
Some environments may require substantial resources to initialize and run. If your environments require more than one CPU per EnvRunner, you can provide more resources for each actor by setting the following config options:
config.env_runners(num_cpus_per_env_runner=.., num_gpus_per_env_runner=..)
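For example, a sketch reserving two CPUs and a fractional GPU for each EnvRunner actor (the values are placeholders and depend on your env and cluster):

from ray.rllib.algorithms.ppo import PPOConfig

config = (
    PPOConfig()
    .environment("Acrobot-v1")
    .env_runners(
        num_env_runners=2,
        # Reserve extra resources per EnvRunner actor for envs that are
        # expensive to initialize or step.
        num_cpus_per_env_runner=2,
        num_gpus_per_env_runner=0.25,
    )
)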