.. include:: /_includes/rllib/we_are_hiring.rst

.. _rllib-environments-doc:

Environments
============

.. toctree::
    :hidden:

    multi-agent-envs
    hierarchical-envs
    external-envs

.. include:: /_includes/rllib/new_api_stack.rst

.. grid:: 1 2 3 4
    :gutter: 1
    :class-container: container pb-3

    .. grid-item-card::
        :img-top: /rllib/images/envs/single_agent_env_logo.svg
        :class-img-top: pt-2 w-75 d-block mx-auto fixed-height-img

        .. button-ref:: rllib-single-agent-env-doc

            Single-Agent Environments (this page)

    .. grid-item-card::
        :img-top: /rllib/images/envs/multi_agent_env_logo.svg
        :class-img-top: pt-2 w-75 d-block mx-auto fixed-height-img

        .. button-ref:: rllib-multi-agent-environments-doc

            Multi-Agent Environments

    .. grid-item-card::
        :img-top: /rllib/images/envs/external_env_logo.svg
        :class-img-top: pt-2 w-75 d-block mx-auto fixed-height-img

        .. button-ref:: rllib-external-env-setups-doc

            External Environments and Applications

    .. grid-item-card::
        :img-top: /rllib/images/envs/hierarchical_env_logo.svg
        :class-img-top: pt-2 w-75 d-block mx-auto fixed-height-img

        .. button-ref:: rllib-hierarchical-environments-doc

            Hierarchical Environments

.. _rllib-single-agent-env-doc:

In online reinforcement learning (RL), an algorithm trains a policy neural network by collecting data on-the-fly using an RL environment or simulator. The agent navigates the environment, choosing actions governed by this policy and collecting the environment's observations and rewards. The goal of the algorithm is to train the policy on the collected data such that the policy's action choices eventually maximize the cumulative reward over the agent's lifetime.

.. figure:: images/envs/single_agent_setup.svg
    :width: 600
    :align: left

    **Single-agent setup:** One agent lives in the environment and takes actions computed by a single policy. The mapping from agent to policy is fixed ("default_agent" maps to "default_policy"). See :ref:`Multi-Agent Environments <rllib-multi-agent-environments-doc>` for how this setup generalizes in the multi-agent case.

.. _gymnasium:

Farama Gymnasium
----------------

RLlib relies on `Farama's Gymnasium API <https://gymnasium.farama.org>`__ as its main RL environment interface for **single-agent** training (:ref:`see here for multi-agent <rllib-multi-agent-environments-doc>`). To implement custom logic with `gymnasium` and integrate it into an RLlib config, see this `SimpleCorridor example `__.

.. tip::

    Not all action spaces are compatible with all RLlib algorithms. See the `algorithm overview `__ for details. In particular, pay attention to which algorithms support discrete action spaces, which support continuous action spaces, and which support both.

For more details on building a custom `Farama Gymnasium <https://gymnasium.farama.org>`__ environment, see the `gymnasium.Env class definition <https://gymnasium.farama.org/api/env/>`__.

For **multi-agent** training, see :ref:`RLlib's multi-agent API and supported third-party APIs <rllib-multi-agent-environments-doc>`.

.. _configuring-environments:

Configuring Environments
------------------------

To specify which RL environment to train against, you can provide either a string name or a Python class that subclasses `gymnasium.Env <https://gymnasium.farama.org/api/env/>`__.

Specifying by String
~~~~~~~~~~~~~~~~~~~~

RLlib interprets string values as `registered gymnasium environment names `__ by default. For example:

.. testcode::

    from ray.rllib.algorithms.ppo import PPOConfig

    config = (
        PPOConfig()
        # Configure the RL environment to use as a string (by name), which
        # is registered with Farama's gymnasium.
        .environment("Acrobot-v1")
    )
    algo = config.build()
    print(algo.train())

.. testcode::
    :hide:

    algo.stop()
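Before handing a registered environment name to RLlib, you can sanity-check it with a plain `gymnasium` rollout loop. The following sketch only assumes the standard `gymnasium` API and the ``Acrobot-v1`` name from the preceding example; RLlib itself doesn't need this loop:

.. code-block:: python

    import gymnasium as gym

    # Create the same env that RLlib would build from the registered name.
    env = gym.make("Acrobot-v1")

    obs, info = env.reset(seed=0)
    total_reward = 0.0
    terminated = truncated = False
    while not (terminated or truncated):
        # Sample a random action; during training, RLlib replaces this with
        # the action computed by the policy.
        action = env.action_space.sample()
        obs, reward, terminated, truncated, info = env.step(action)
        total_reward += reward
    print(f"Random-policy episode return: {total_reward}")
    env.close()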
.. tip::

    For all supported environment names registered with Farama, refer to these resources (by env category):

    * `Toy Text <https://gymnasium.farama.org/environments/toy_text/>`__
    * `Classic Control <https://gymnasium.farama.org/environments/classic_control/>`__
    * `Atari <https://gymnasium.farama.org/environments/atari/>`__
    * `MuJoCo <https://gymnasium.farama.org/environments/mujoco/>`__
    * `Box2D <https://gymnasium.farama.org/environments/box2d/>`__

Specifying by Subclass of gymnasium.Env
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

If you're using a custom subclass of `gymnasium.Env <https://gymnasium.farama.org/api/env/>`__, you can pass the class itself rather than a registered string. Your subclass must accept a single ``config`` argument in its constructor, which may default to ``None``. For example:

.. testcode::

    import gymnasium as gym
    import numpy as np

    from ray.rllib.algorithms.ppo import PPOConfig


    class MyDummyEnv(gym.Env):
        # Write the constructor and provide a single `config` arg,
        # which may be set to None by default.
        def __init__(self, config=None):
            # As per gymnasium standard, provide observation and action spaces
            # in your constructor.
            self.observation_space = gym.spaces.Box(-1.0, 1.0, (1,), np.float32)
            self.action_space = gym.spaces.Discrete(2)

        def reset(self, seed=None, options=None):
            # Return (reset) observation and info dict.
            return np.array([1.0], np.float32), {}

        def step(self, action):
            # Return next observation, reward, terminated, truncated, and info dict.
            return np.array([1.0], np.float32), 1.0, False, False, {}


    config = (
        PPOConfig()
        .environment(
            MyDummyEnv,
            env_config={},  # `config` to pass to your env class
        )
    )
    algo = config.build()
    print(algo.train())

.. testcode::
    :hide:

    algo.stop()

Specifying by Tune-Registered Lambda
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

A third option for providing environment information to your config is to register an environment creator function (or lambda) with Ray Tune. The creator function must take a single ``config`` parameter and return a single non-vectorized `gymnasium.Env <https://gymnasium.farama.org/api/env/>`__ instance. For example:

.. testcode::

    from ray.tune.registry import register_env

    def env_creator(config):
        return MyDummyEnv(config)  # Return a gymnasium.Env instance.

    register_env("my_env", env_creator)

    config = (
        PPOConfig()
        .environment("my_env")  # <- Tune-registered string pointing to your custom env creator.
    )
    algo = config.build()
    print(algo.train())

.. testcode::
    :hide:

    algo.stop()

For a complete example using a custom environment, see the `custom_gym_env.py example script `__.

.. warning::

    Due to Ray's distributed nature, gymnasium's own environment registry is incompatible with Ray. Always use the registration method documented here to ensure remote Ray actors can access your custom environments.

In the preceding example, the ``env_creator`` function takes a ``config`` argument. This config is primarily a dictionary containing the settings you pass in through ``env_config``. However, you can also access additional properties on the ``config`` object. For example, use ``config.worker_index`` to get the index of the remote EnvRunner or ``config.num_workers`` for the total number of EnvRunners used. This approach helps customize environments within an ensemble, so that environments running on some EnvRunners behave differently from those running on other EnvRunners. For example:

.. code-block:: python

    class EnvDependingOnWorkerAndVectorIndex(gym.Env):
        def __init__(self, config):
            # Pick the actual env based on worker and env indexes.
            self.env = gym.make(
                choose_env_for(config.worker_index, config.vector_index)
            )
            self.action_space = self.env.action_space
            self.observation_space = self.env.observation_space

        def reset(self, seed=None, options=None):
            return self.env.reset(seed=seed, options=options)

        def step(self, action):
            return self.env.step(action)


    register_env(
        "multi_env", lambda config: EnvDependingOnWorkerAndVectorIndex(config)
    )
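As another sketch of how the dictionary part of ``config`` reaches the creator function, the following example reuses the ``MyDummyEnv`` class from above and assumes a hypothetical ``corridor_length`` setting; the creator and setting names are illustrative, while ``env_config``, ``register_env``, and ``config.worker_index`` come from the APIs shown in this section:

.. code-block:: python

    from ray.rllib.algorithms.ppo import PPOConfig
    from ray.tune.registry import register_env


    def parametrized_env_creator(config):
        # `config` behaves like a dict carrying whatever you pass through
        # `env_config`, plus extra attributes such as `config.worker_index`.
        corridor_length = config.get("corridor_length", 10)  # Hypothetical setting.
        print(
            f"Building env on EnvRunner index={config.worker_index} "
            f"with corridor_length={corridor_length}"
        )
        return MyDummyEnv(config)  # Reuses the MyDummyEnv class from above.


    register_env("parametrized_env", parametrized_env_creator)

    config = (
        PPOConfig()
        # RLlib forwards `env_config` to the registered creator on every EnvRunner.
        .environment("parametrized_env", env_config={"corridor_length": 20})
    )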
.. tip::

    When using logging within an environment, configure the logging inside the environment itself, because it runs within Ray workers. Any logging configuration done before Ray starts the workers is ignored. Use the following code to connect to Ray's logging instance:

    .. testcode::

        import logging
        logger = logging.getLogger("ray.rllib")

Performance and Scaling
-----------------------

.. figure:: images/envs/env_runners.svg
    :width: 600
    :align: left

    **EnvRunner with gym.Env setup:** Environments in RLlib are located within the :py:class:`~ray.rllib.env.env_runner.EnvRunner` actors, whose number (``n``) you can scale through the ``config.env_runners(num_env_runners=..)`` setting. Each :py:class:`~ray.rllib.env.env_runner.EnvRunner` actor can hold more than one `gymnasium <https://gymnasium.farama.org>`__ environment (vectorized). You can set the number of individual environment copies per EnvRunner through ``config.env_runners(num_envs_per_env_runner=..)``.

There are two methods to scale sample collection with RLlib and `gymnasium <https://gymnasium.farama.org>`__ environments. You can use both in combination.

1. **Distribute across multiple processes:** RLlib creates multiple :py:class:`~ray.rllib.env.env_runner.EnvRunner` instances, each a Ray actor, for experience collection, controlled through your :py:class:`~ray.rllib.algorithms.algorithm_config.AlgorithmConfig`: ``config.env_runners(num_env_runners=..)``.

2. **Vectorization within a single process:** Many environments achieve high frame rates per core but are limited by policy inference latency. To address this limitation, create multiple environments per process to batch the policy forward pass across these vectorized environments. Set ``config.env_runners(num_envs_per_env_runner=..)`` to create more than one environment copy per :py:class:`~ray.rllib.env.env_runner.EnvRunner` actor.

Additionally, you can turn the individual sub-environments within a vector into independent processes through Python's multiprocessing, as used by gymnasium. Set ``config.env_runners(remote_worker_envs=True)`` to create individual sub-environments as separate processes and step them in parallel.

.. note::

    Multi-agent setups aren't vectorizable yet. The Ray team is working on a solution for this restriction by using the ``gymnasium >= 1.x`` custom vectorization feature.

.. tip::

    See the :ref:`scaling guide ` for more on RLlib training at scale.

Expensive Environments
~~~~~~~~~~~~~~~~~~~~~~

Some environments may require substantial resources to initialize and run. If your environments require more than 1 CPU per :py:class:`~ray.rllib.env.env_runner.EnvRunner`, you can provide more resources for each actor by setting the following config options:
``config.env_runners(num_cpus_per_env_runner=.., num_gpus_per_env_runner=..)``.
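As a rough sketch, a config that combines the scaling settings from this section might look like the following; the concrete numbers are placeholders you'd tune to your environment's actual cost:

.. code-block:: python

    from ray.rllib.algorithms.ppo import PPOConfig

    config = (
        PPOConfig()
        .environment("Acrobot-v1")
        .env_runners(
            # Scale out: four EnvRunner actors collect samples in parallel.
            num_env_runners=4,
            # Vectorize: each EnvRunner steps eight env copies to batch inference.
            num_envs_per_env_runner=8,
            # Reserve extra resources per EnvRunner for an expensive environment.
            num_cpus_per_env_runner=2,
            num_gpus_per_env_runner=0,
        )
    )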