Hierarchical Environments

Note

Ray 2.40 uses RLlib’s new API stack by default. The Ray team has mostly completed transitioning algorithms, example scripts, and documentation to the new code base.

If you’re still using the old API stack, see New API stack migration guide for details on how to migrate.

You can implement hierarchical training as a special case of multi-agent RL. For example, consider a two-level hierarchy of policies, where a top-level policy issues high-level tasks that one or more low-level policies execute at a finer timescale. The following timeline shows one step of the top-level policy, which corresponds to four low-level actions:

top-level: action_0 -------------------------------------> action_1 ->
low-level: action_0 -> action_1 -> action_2 -> action_3 -> action_4 ->
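
The timeline above can be read as a per-timestep schedule. The following is a minimal sketch, not part of RLlib, assuming a fixed ratio of four low-level actions per top-level action. The function name `acting_agents` and the parameter `low_per_top` are hypothetical and only serve the illustration:

```python
def acting_agents(t, low_per_top=4):
    """Return the agent types that act at environment timestep `t`.

    Hypothetical helper: the low-level agent acts at every timestep, the
    top-level agent only at every `low_per_top`-th timestep.
    """
    agents = ["low_level"]
    if t % low_per_top == 0:
        # The top-level agent issues a new high-level task at this timestep.
        agents.insert(0, "top_level")
    return agents


# Timesteps 0..4 cover one full top-level step plus the start of the next one.
schedule = [acting_agents(t) for t in range(5)]
for t, agents in enumerate(schedule):
    print(t, agents)
```

In a multi-agent environment, such a schedule would determine which agent IDs appear in the observation dict at each step.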

Alternatively, you could implement an environment in which the two agent types don’t act at the same time. Instead, the low-level agents wait for the high-level agent to issue an action, then act n times before handing control back to the high-level agent:

top-level: action_0 -----------------------------------> action_1 ->
low-level: ---------> action_0 -> action_1 -> action_2 ------------>

You can implement either of these hierarchical action patterns as a multi-agent environment with different types of agents, for example a high-level agent and a low-level agent. With the correct agent-to-module mapping function in place, the problem becomes, from RLlib’s perspective, a simple independent multi-agent problem with different types of policies.

Your configuration might look something like the following:

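from ray.rllib.algorithms.ppo import PPOConfig

config = (
    PPOConfig()
    .multi_agent(
        # One policy per hierarchy level.
        policies={"top_level", "low_level"},
        # Map each agent ID (`aid`) to the policy of its hierarchy level.
        policy_mapping_fn=(
            lambda aid, eps, **kw: "low_level" if aid.startswith("low_level") else "top_level"
        ),
        # Only the policies listed here are updated during training.
        policies_to_train=["top_level"],
    )
)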
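
To check how such a mapping function resolves agent IDs, you can call it directly outside of RLlib. The following sketch rewrites the lambda as a named function with the same logic. The agent IDs `top_level`, `low_level_0`, and `low_level_1` are assumptions for illustration, and your environment may use different IDs:

```python
def policy_mapping_fn(agent_id, episode=None, **kwargs):
    """Map an agent ID to the name of the policy of its hierarchy level."""
    return "low_level" if agent_id.startswith("low_level") else "top_level"


# Hypothetical agent IDs that a hierarchical env might emit.
agent_ids = ["top_level", "low_level_0", "low_level_1"]
mapping = {aid: policy_mapping_fn(aid) for aid in agent_ids}
print(mapping)
```

Note that every agent ID without the `low_level` prefix falls through to the `top_level` policy, so a consistent naming scheme for agent IDs matters.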

In this setup, the multi-agent env implementation should provide the appropriate rewards at each hierarchy level. The environment class is also responsible for routing between agents, for example by conveying goals from higher-level agents to lower-level agents as part of the lower-level agents’ observations.
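
The following is a minimal, dependency-free sketch of the turn-based pattern, showing how an environment might route goals and assign rewards per hierarchy level. It only mimics the dict-keyed `reset()` and `step()` convention of multi-agent environments and deliberately doesn't subclass any RLlib class. The class name, the corridor task, the reward values, and all parameters are assumptions made for illustration:

```python
import random


class TurnBasedHierarchicalEnv:
    """Hypothetical corridor task: reach `target` by moving left or right.

    The top-level agent picks a direction (-1 or +1) as the goal. The
    low-level agent then acts `steps_per_goal` times before control returns.
    """

    def __init__(self, size=10, steps_per_goal=3, seed=0):
        self.size = size
        self.steps_per_goal = steps_per_goal
        self.rng = random.Random(seed)

    def reset(self):
        self.pos = 0
        self.target = self.rng.randrange(1, self.size)
        self.goal = None
        self.low_steps_left = 0
        # Only the top-level agent receives an observation, so it acts first.
        return {"top_level": (self.pos, self.target)}

    def step(self, action_dict):
        obs, rewards = {}, {}
        if "top_level" in action_dict:
            self.goal = action_dict["top_level"]
            self.low_steps_left = self.steps_per_goal
            # Route the goal to the low-level agent through its observation.
            obs["low_level_0"] = (self.pos, self.goal)
        else:
            move = action_dict["low_level_0"]
            self.pos = max(0, min(self.size - 1, self.pos + move))
            self.low_steps_left -= 1
            # Low-level reward: did the agent follow the goal it was given?
            rewards["low_level_0"] = 1.0 if move == self.goal else -1.0
            reached = self.pos == self.target
            if reached or self.low_steps_left == 0:
                # Hand control back and reward the top level for task progress.
                rewards["top_level"] = 1.0 if reached else -0.1
                obs["top_level"] = (self.pos, self.target)
            else:
                obs["low_level_0"] = (self.pos, self.goal)
        terminateds = {"__all__": self.pos == self.target}
        truncateds = {"__all__": False}
        return obs, rewards, terminateds, truncateds, {}


# Scripted rollout: both agents act greedily, no learning involved.
env = TurnBasedHierarchicalEnv(size=6, steps_per_goal=3, seed=0)
obs = env.reset()
trace = []
done = False
while not done:
    agent_id = next(iter(obs))  # exactly one agent is expected to act
    if agent_id == "top_level":
        pos, target = obs[agent_id]
        action = 1 if target > pos else -1
    else:
        _, goal = obs[agent_id]
        action = goal  # follow the goal conveyed by the top level
    trace.append(agent_id)
    obs, rewards, terminateds, truncateds, infos = env.step({agent_id: action})
    done = terminateds["__all__"]
print(trace)
```

Because only one agent ID appears in each observation dict, the two hierarchy levels never act in the same step, which matches the second timeline above.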

See this runnable example of a hierarchical env.