Hierarchical Environments
You can implement hierarchical training as a special case of multi-agent RL. For example, consider a two-level hierarchy of policies, where a top-level policy issues high-level tasks that are executed at a finer timescale by one or more low-level policies. The following timeline shows one step of the top-level policy, which corresponds to four low-level actions:
```
top-level: action_0 -------------------------------------> action_1 ->
low-level: action_0 -> action_1 -> action_2 -> action_3 -> action_4 ->
```
Alternatively, you could implement an environment in which the two agent types don't act at the same time. Instead, the low-level agents wait for the top-level agent to issue an action, then act n times before handing control back to the top-level agent:
```
top-level: action_0 -----------------------------------> action_1 ->
low-level: ---------> action_0 -> action_1 -> action_2 ------------>
```
You can implement any of these hierarchical action patterns as a multi-agent environment with different agent types, for example a top-level agent and a low-level agent. When you set up the correct agent-to-module mapping function, the problem becomes, from RLlib's perspective, a simple independent multi-agent problem with different types of policies.
Your configuration might look something like the following:
```python
from ray.rllib.algorithms.ppo import PPOConfig

config = (
    PPOConfig()
    .multi_agent(
        # One policy per hierarchy level.
        policies={"top_level", "low_level"},
        # Map each agent ID to its policy based on the ID's prefix.
        policy_mapping_fn=(
            lambda aid, eps, **kw: (
                "low_level" if aid.startswith("low_level") else "top_level"
            )
        ),
        policies_to_train=["top_level"],
    )
)
```
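Note that with `policies_to_train=["top_level"]`, RLlib only updates the top-level policy; the low-level agent still acts through its policy, but that policy keeps its current (for example, pre-trained) weights. Add `"low_level"` to the list to train both levels. To actually run training, you also need to point the config at your hierarchical environment. The following is a minimal sketch, assuming a custom `HierarchicalEnv` class like the one outlined at the end of this section:

```python
# Sketch: `HierarchicalEnv` is a hypothetical MultiAgentEnv subclass
# (see the example implementation below).
config = config.environment(env=HierarchicalEnv)

algo = config.build()
for _ in range(5):
    print(algo.train())
```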
In this setup, the multi-agent env implementation is responsible for providing the appropriate rewards at every hierarchy level. The environment class also handles the routing between agents, for example conveying goals from higher-level agents to lower-level agents as part of the lower-level agents' observations.
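The following is a minimal sketch of what such an environment could look like, written against RLlib's newer `MultiAgentEnv` API. Everything in it beyond the base class is an assumption for illustration: the `HierarchicalEnv` name, the agent IDs (chosen to match the mapping function above), the observation and action spaces, the goal encoding, and the reward scheme.

```python
import gymnasium as gym
import numpy as np

from ray.rllib.env.multi_agent_env import MultiAgentEnv


class HierarchicalEnv(MultiAgentEnv):
    """Sketch: a top-level agent picks a goal; the low-level agent then acts
    for a fixed number of steps toward that goal before control returns."""

    def __init__(self, config=None):
        super().__init__()
        self.agents = self.possible_agents = ["top_level", "low_level"]
        self.observation_spaces = {
            # The top level sees only the (hypothetical) 4-dim world state.
            "top_level": gym.spaces.Box(-10.0, 10.0, (4,), np.float32),
            # The low level additionally sees the one-hot encoded goal.
            "low_level": gym.spaces.Box(-10.0, 10.0, (6,), np.float32),
        }
        self.action_spaces = {
            "top_level": gym.spaces.Discrete(2),  # which goal to pursue
            "low_level": gym.spaces.Discrete(4),  # primitive moves
        }
        self.steps_per_goal = 3  # low-level steps per top-level action

    def reset(self, *, seed=None, options=None):
        self.world = np.zeros(4, np.float32)
        self.goal = np.zeros(2, np.float32)
        self.low_steps = 0
        self.t = 0
        # Only the top-level agent acts first.
        return {"top_level": self.world.copy()}, {}

    def step(self, action_dict):
        self.t += 1
        rewards = {}
        if "top_level" in action_dict:
            # Route the top-level action to the low-level agent as its goal.
            self.goal = np.eye(2, dtype=np.float32)[action_dict["top_level"]]
            self.low_steps = 0
            obs = {"low_level": np.concatenate([self.world, self.goal])}
        else:
            # The low-level agent takes a primitive action.
            self.world[action_dict["low_level"]] += 0.1
            self.low_steps += 1
            # Hypothetical reward scheme: dense shaping for the low level.
            rewards["low_level"] = 0.1
            if self.low_steps < self.steps_per_goal:
                obs = {"low_level": np.concatenate([self.world, self.goal])}
            else:
                # Hand control back to the top level and credit it sparsely.
                obs = {"top_level": self.world.copy()}
                rewards["top_level"] = float(self.world.sum())
        terminateds = {"__all__": self.t >= 100}
        return obs, rewards, terminateds, {"__all__": False}, {}
```

The key pattern is that only the agent expected to act next appears in the returned observation dict, which is how the environment realizes the turn-taking timeline shown at the beginning of this section.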