Ray 2.10.0 introduces the alpha stage of RLlib’s “new API stack”. The Ray Team plans to transition algorithms, example scripts, and documentation to the new code base, thereby incrementally replacing the “old API stack” (e.g., ModelV2, Policy, RolloutWorker) over the subsequent minor releases leading up to Ray 3.0.

Note, however, that so far only PPO (single- and multi-agent) and SAC (single-agent only) support the “new API stack”, and both continue to run by default with the old APIs. You can continue to use your existing custom (old stack) classes.

See here for more details on how to use the new API stack.

MultiAgentEnv API


class ray.rllib.env.multi_agent_env.MultiAgentEnv(*args: Any, **kwargs: Any)

An environment that hosts multiple independent agents.

Agents are identified by (string) agent ids. Note that these “agents” here are not to be confused with RLlib Algorithms, which are also sometimes referred to as “agents” or “RL agents”.

The preferred format for the action and observation spaces is a mapping from agent ids to their individual spaces. If that mapping is not provided, the methods observation_space_contains(), action_space_contains(), action_space_sample(), and observation_space_sample() have to be overridden.
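
To make the per-agent space mapping concrete, here is a minimal plain-Python sketch. `IntRange` is a hypothetical stand-in for a gymnasium space (a real space's `sample()` takes no `rng` argument), and the two module-level functions only mirror the names of the methods mentioned above; this is not RLlib's implementation.

```python
import random


class IntRange:
    """Hypothetical stand-in for a gymnasium space: integers in [low, high]."""

    def __init__(self, low: int, high: int):
        self.low = low
        self.high = high

    def contains(self, x) -> bool:
        return isinstance(x, int) and self.low <= x <= self.high

    def sample(self, rng: random.Random) -> int:
        return rng.randint(self.low, self.high)


# Mapping from agent ids to their individual observation spaces.
observation_space = {
    "car_0": IntRange(0, 10),
    "traffic_light_1": IntRange(0, 2),
}


def observation_space_contains(obs_dict) -> bool:
    """True if every agent's observation lies inside that agent's own space."""
    return all(
        agent_id in observation_space
        and observation_space[agent_id].contains(obs)
        for agent_id, obs in obs_dict.items()
    )


def observation_space_sample(rng: random.Random) -> dict:
    """Draws one observation per agent from that agent's own space."""
    return {aid: space.sample(rng) for aid, space in observation_space.items()}


rng = random.Random(0)
sampled = observation_space_sample(rng)
```

With a mapping like this, the containment and sampling checks reduce to a loop over agent ids, which is presumably why the mapping format is preferred.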

reset(*, seed: int | None = None, options: dict | None = None) → Tuple[Dict[Any, Any], Dict[Any, Any]]

Resets the env and returns observations from ready agents.


Parameters:
  • seed – An optional seed to use for the new episode.

Returns:
A tuple of 1) new observations for each ready agent and 2) info values for each agent id (may be empty dicts).

from ray.rllib.env.multi_agent_env import MultiAgentEnv
class MyMultiAgentEnv(MultiAgentEnv):
    # Define your env here.
    ...
env = MyMultiAgentEnv()
obs, infos = env.reset(seed=42, options={})
print(obs)

{
    "car_0": [2.4, 1.6],
    "car_1": [3.4, -3.2],
    "traffic_light_1": [0, 3, 5, 1],
}

step(action_dict: Dict[Any, Any]) → Tuple[Dict[Any, Any], Dict[Any, Any], Dict[Any, Any], Dict[Any, Any], Dict[Any, Any]]

Returns observations from ready agents.

The returns are dicts mapping from agent_id strings to values. The number of agents in the env can vary over time.


Returns:
Tuple containing:
  1) New observations for each ready agent.
  2) Reward values for each ready agent. If the episode has just started, the value is None.
  3) Terminated values for each ready agent. The special key “__all__” (required) indicates env termination.
  4) Truncated values for each ready agent.
  5) Info values for each agent id (may be empty dicts).
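
The reset()/step() return contract can be sketched with a small toy env. Everything here is hypothetical: `ToyTrafficEnv`, its two agents, and the rule that action 1 ends an agent's episode are made up for illustration. In real use the class would subclass MultiAgentEnv; the base class is left out so the sketch runs without Ray installed. Adding an “__all__” key to the truncated dict as well is an assumption of this sketch.

```python
import random


class ToyTrafficEnv:
    """Hypothetical env that follows the dict-based reset()/step() contract."""

    def __init__(self, max_steps: int = 3):
        self.agent_ids = ["car_0", "car_1"]
        self.max_steps = max_steps
        self._rng = random.Random()
        self._t = 0
        self._done = set()

    def reset(self, *, seed=None, options=None):
        if seed is not None:
            self._rng.seed(seed)
        self._t = 0
        self._done = set()
        obs = {aid: [self._rng.random(), self._rng.random()]
               for aid in self.agent_ids}
        infos = {aid: {} for aid in self.agent_ids}
        return obs, infos

    def step(self, action_dict):
        self._t += 1
        time_up = self._t >= self.max_steps
        obs, rewards, terminateds, truncateds, infos = {}, {}, {}, {}, {}
        for agent_id, action in action_dict.items():
            if agent_id in self._done:
                continue  # Ignore actions from agents that already finished.
            obs[agent_id] = [self._rng.random(), self._rng.random()]
            # Toy rule (assumption): action 0 keeps driving, action 1 parks.
            rewards[agent_id] = 1.0 if action == 0 else 0.0
            terminateds[agent_id] = action == 1
            truncateds[agent_id] = time_up and action != 1
            infos[agent_id] = {}
            if action == 1:
                self._done.add(agent_id)
        all_terminated = len(self._done) == len(self.agent_ids)
        # "__all__" signals whether the whole env (not one agent) is finished.
        terminateds["__all__"] = all_terminated
        truncateds["__all__"] = time_up and not all_terminated
        return obs, rewards, terminateds, truncateds, infos


env = ToyTrafficEnv(max_steps=3)
first_obs, first_infos = env.reset(seed=42)
obs, rewards, terminateds, truncateds, infos = env.step(
    {"car_0": 0, "car_1": 1}
)
```

After this single step, car_1 is terminated while the env as a whole is not, so `terminateds["__all__"]` stays False until every agent has finished.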

env = ...
obs, rewards, terminateds, truncateds, infos = env.step(action_dict={
    "car_0": 1, "car_1": 0, "traffic_light_1": 2,
})
print(rewards)
print(terminateds)
print(infos)

{
    "car_0": 3,
    "car_1": -1,
    "traffic_light_1": 0,
}
{
    "car_0": False,    # car_0 is still running
    "car_1": True,     # car_1 is terminated
    "__all__": False,  # the env is not terminated
}
{
    "car_0": {},  # info for car_0
    "car_1": {},  # info for car_1
}

render() → None

Tries to render the environment.

with_agent_groups(groups: Dict[str, List[Any]], obs_space: gymnasium.Space = None, act_space: gymnasium.Space = None) → MultiAgentEnv

Convenience method for grouping together agents in this env.

An agent group is a list of agent IDs that are mapped to a single logical agent. All agents of the group must act at the same time in the environment. The grouped agent exposes Tuple action and observation spaces that are the concatenated action and obs spaces of the individual agents.

The rewards of all the agents in a group are summed. The individual agent rewards are available under the “individual_rewards” key of the group info return.

Agent grouping is required to leverage algorithms such as Q-Mix.

Parameters:
  • groups – Mapping from group id to a list of the agent ids of group members. If an agent id is not present in any group value, it will be left ungrouped. The group id becomes a new agent ID in the final environment.

  • obs_space – Optional observation space for the grouped env. Must be a Tuple space. If not provided, it is inferred to be a Tuple of the n individual agents’ observation spaces (n = number of agents in the group).

  • act_space – Optional action space for the grouped env. Must be a Tuple space. If not provided, it is inferred to be a Tuple of the n individual agents’ action spaces (n = number of agents in the group).

from ray.rllib.env.multi_agent_env import MultiAgentEnv
class MyMultiAgentEnv(MultiAgentEnv):
    # Define your env here.
    ...
env = MyMultiAgentEnv(...)
# with_agent_groups() is called on the env instance, so only the
# groups mapping is passed (see the signature above).
grouped_env = env.with_agent_groups({
  "group1": ["agent1", "agent2", "agent3"],
  "group2": ["agent4", "agent5"],
})
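
The grouping semantics described above (tuple observations, summed rewards, per-agent rewards kept in the group info) can be sketched in plain Python. `group_step_outputs` is a hypothetical helper written for illustration, not part of RLlib; the “individual_rewards” key follows the text above, and storing it as a list in group-member order is an assumption.

```python
from typing import Any, Dict, List, Tuple


def group_step_outputs(
    groups: Dict[str, List[Any]],
    obs: Dict[Any, Any],
    rewards: Dict[Any, float],
) -> Tuple[Dict[Any, Any], Dict[Any, float], Dict[Any, dict]]:
    """Hypothetical helper: folds per-agent outputs into per-group outputs."""
    grouped_obs, grouped_rewards, grouped_infos = {}, {}, {}
    members_of_any_group = {
        agent_id for members in groups.values() for agent_id in members
    }
    for group_id, members in groups.items():
        # The group observation is a tuple of the members' observations.
        grouped_obs[group_id] = tuple(obs[aid] for aid in members)
        individual = [rewards[aid] for aid in members]
        # The group reward is the sum of the members' rewards.
        grouped_rewards[group_id] = sum(individual)
        grouped_infos[group_id] = {"individual_rewards": individual}
    # Agents that belong to no group pass through unchanged.
    for agent_id in obs:
        if agent_id not in members_of_any_group:
            grouped_obs[agent_id] = obs[agent_id]
            grouped_rewards[agent_id] = rewards[agent_id]
            grouped_infos[agent_id] = {}
    return grouped_obs, grouped_rewards, grouped_infos


groups = {"group1": ["agent1", "agent2"]}
obs = {"agent1": [0.1], "agent2": [0.2], "agent3": [0.3]}
rewards = {"agent1": 1.0, "agent2": 2.0, "agent3": 5.0}
g_obs, g_rewards, g_infos = group_step_outputs(groups, obs, rewards)
```

Here agent1 and agent2 disappear as separate agents and reappear as the single logical agent "group1", while the ungrouped agent3 is left as-is.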

Convert gym.Env into MultiAgentEnv

ray.rllib.env.multi_agent_env.make_multi_agent(env_name_or_creator: str | Callable[[EnvContext], Any | gymnasium.Env | None]) → Type[MultiAgentEnv]

Convenience wrapper that converts any single-agent env into a multi-agent env.

Allows you to convert a simple (single-agent) gym.Env class into a MultiAgentEnv class. This function simply stacks n instances of the given `gym.Env` class into one unified MultiAgentEnv class and returns this class, thus pretending the agents act together in the same environment, whereas - under the hood - they live separately from each other in n parallel single-agent envs.

Agent IDs in the resulting env are integers starting from 0 (first agent).


Parameters:
  • env_name_or_creator – String specifier or env_maker function taking an EnvContext object as only arg and returning a gym.Env.

Returns:
New MultiAgentEnv class to be used as env. The constructor takes a config dict with num_agents key (default=1). The rest of the config dict will be passed on to the underlying single-agent env’s constructor.

from ray.rllib.env.multi_agent_env import make_multi_agent
# By gym string:
ma_cartpole_cls = make_multi_agent("CartPole-v1")
# Create a 2 agent multi-agent cartpole.
ma_cartpole = ma_cartpole_cls({"num_agents": 2})
# reset() returns a tuple of (observations, infos).
obs, infos = ma_cartpole.reset()
print(obs)

# By env-maker callable:
from ray.rllib.examples.envs.classes.stateless_cartpole import StatelessCartPole
ma_stateless_cartpole_cls = make_multi_agent(
   lambda config: StatelessCartPole(config))
# Create a 3 agent multi-agent stateless cartpole.
ma_stateless_cartpole = ma_stateless_cartpole_cls(
   {"num_agents": 3})
obs, infos = ma_stateless_cartpole.reset()
print(obs)

{0: [...], 1: [...]}
{0: [...], 1: [...], 2: [...]}
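
The stacking idea behind make_multi_agent can be sketched without Ray. `CoinFlipEnv` and `make_stacked_env` are hypothetical names for illustration; the sketch only mirrors the behavior described above (n independent sub-envs, integer agent ids from 0, a `num_agents` config key, remaining config passed to each sub-env) and is not RLlib's implementation.

```python
import random


class CoinFlipEnv:
    """Hypothetical single-agent env with gymnasium-style reset()/step()."""

    def __init__(self, config=None):
        self.config = config or {}
        self._rng = random.Random()

    def reset(self, *, seed=None, options=None):
        if seed is not None:
            self._rng.seed(seed)
        return [self._rng.random()], {}

    def step(self, action):
        reward = 1.0 if action == 1 else 0.0
        # obs, reward, terminated, truncated, info
        return [self._rng.random()], reward, False, False, {}


def make_stacked_env(env_creator):
    """Returns a class that holds n independent copies of a single-agent env."""

    class StackedEnv:
        def __init__(self, config=None):
            config = dict(config or {})
            num_agents = config.pop("num_agents", 1)
            # The remaining config entries go to every sub-env.
            self.envs = [env_creator(dict(config)) for _ in range(num_agents)]
            self._terminated, self._truncated = set(), set()

        def reset(self, *, seed=None, options=None):
            self._terminated, self._truncated = set(), set()
            obs, infos = {}, {}
            for i, env in enumerate(self.envs):
                obs[i], infos[i] = env.reset(seed=seed, options=options)
            return obs, infos

        def step(self, action_dict):
            obs, rewards, terminateds, truncateds, infos = {}, {}, {}, {}, {}
            for i, action in action_dict.items():
                (obs[i], rewards[i], terminateds[i],
                 truncateds[i], infos[i]) = self.envs[i].step(action)
                if terminateds[i]:
                    self._terminated.add(i)
                if truncateds[i]:
                    self._truncated.add(i)
            terminateds["__all__"] = len(self._terminated) == len(self.envs)
            truncateds["__all__"] = len(self._truncated) == len(self.envs)
            return obs, rewards, terminateds, truncateds, infos

    return StackedEnv


stacked_cls = make_stacked_env(lambda config: CoinFlipEnv(config))
env = stacked_cls({"num_agents": 3, "label": "demo"})
reset_obs, reset_infos = env.reset(seed=0)
obs, rewards, terminateds, truncateds, infos = env.step({0: 1, 2: 0})
```

Because each sub-env steps on its own, an action for one agent never affects another agent's observation, which matches the "they live separately from each other" description above.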