ray.rllib.algorithms.algorithm_config.AlgorithmConfig.get_multi_agent_setup
- AlgorithmConfig.get_multi_agent_setup(*, policies: Optional[Dict[str, PolicySpec]] = None, env: Optional[Any] = None, spaces: Optional[Dict[str, Tuple[gymnasium.Space, gymnasium.Space]]] = None, default_policy_class: Optional[Type[ray.rllib.policy.policy.Policy]] = None) → Tuple[Dict[str, PolicySpec], Callable[[str, Union[SampleBatch, MultiAgentBatch]], bool]]
Compiles a complete multi-agent config (dict) from the information in self.

Infers the observation and action spaces, the policy classes, and the policies' configs. The returned MultiAgentPolicyConfigDict is fully unified and strictly maps PolicyIDs to complete PolicySpec objects (with all their fields not None).

Examples:
    import gymnasium as gym
    from ray.rllib.algorithms.ppo import PPOConfig

    # Two policies are defined, but only "pol1" is marked as trainable.
    config = (
        PPOConfig()
        .environment("CartPole-v1")
        .framework("torch")
        .multi_agent(policies={"pol1", "pol2"}, policies_to_train=["pol1"])
    )
    # Pass an env instance so that the spaces can be inferred from it.
    policy_dict, is_policy_to_train = config.get_multi_agent_setup(
        env=gym.make("CartPole-v1"))
    is_policy_to_train("pol1")
    is_policy_to_train("pol2")
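
The second return value in the example above is a callable that decides whether a given policy should be trained. The following is a minimal sketch of how such a callable could behave, assuming policies_to_train was given as a collection of policy IDs. The helper name make_is_policy_to_train is hypothetical and not part of RLlib, and the real callable may also take the batch into account.

```python
from typing import Any, Callable, Collection, Optional


def make_is_policy_to_train(
    policies_to_train: Collection[str],
) -> Callable[..., bool]:
    """Hypothetical helper (not an RLlib API): build a membership-test callable."""
    trainable = set(policies_to_train)

    def is_policy_to_train(policy_id: str, batch: Optional[Any] = None) -> bool:
        # The batch argument is accepted to match the documented signature,
        # but this simplified sketch ignores it.
        return policy_id in trainable

    return is_policy_to_train


is_policy_to_train = make_is_policy_to_train(["pol1"])
print(is_policy_to_train("pol1"))  # True
print(is_policy_to_train("pol2"))  # False
```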
- Parameters
policies – An optional multi-agent policies dict, mapping policy IDs to PolicySpec objects. If not provided, will use self.policies instead. Note that the policy_class, observation_space, and action_space properties in these PolicySpecs may be None and must therefore be inferred here.
env – An optional env instance from which to infer the spaces for the different policies. If not provided, will try to infer from spaces, otherwise from self.observation_space and self.action_space. If no information on spaces can be inferred, will raise an error.
spaces – Optional dict mapping policy IDs to tuples of 1) observation space and 2) action space that should be used for the respective policy. These spaces are usually provided by an already instantiated remote EnvRunner (usually a RolloutWorker). If not provided, will try to infer from env, otherwise from self.observation_space and self.action_space. If no information on spaces can be inferred, will raise an error.
default_policy_class – The Policy class to use should a PolicySpec have its policy_class property set to None.
- Returns
A tuple consisting of 1) a MultiAgentPolicyConfigDict and 2) an is_policy_to_train(PolicyID, SampleBatchType) -> bool callable.
- Raises
ValueError – If no spaces can be inferred for the policy or policies.
ValueError – If two agents in the env map to the same PolicyID (according to self.policy_mapping_fn), but have different action or observation spaces according to the inferred space information.
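
The space-inference fallback and the two error conditions described above can be illustrated with a small, library-free sketch. Everything here is hypothetical: the helpers resolve_spaces and check_consistent_spaces are not RLlib functions, spaces are stood in for by plain strings, the mapping function takes only an agent ID, and the lookup order (spaces, then env, then config-level spaces) is an assumption rather than something this page confirms.

```python
from typing import Any, Callable, Dict, Optional, Tuple

SpacePair = Tuple[Any, Any]  # (observation_space, action_space)


def resolve_spaces(
    policy_id: str,
    spaces: Optional[Dict[str, SpacePair]] = None,
    env: Optional[Any] = None,
    config_spaces: Optional[SpacePair] = None,
) -> SpacePair:
    """Hypothetical helper: pick a space pair for one policy.

    Assumed priority: `spaces`, then `env`, then config-level spaces.
    """
    if spaces is not None and policy_id in spaces:
        return spaces[policy_id]
    if env is not None:
        return env.observation_space, env.action_space
    if config_spaces is not None and all(s is not None for s in config_spaces):
        return config_spaces
    raise ValueError(f"No spaces could be inferred for policy '{policy_id}'.")


def check_consistent_spaces(
    agent_spaces: Dict[str, SpacePair],
    policy_mapping_fn: Callable[[str], str],
) -> Dict[str, SpacePair]:
    """Hypothetical check: agents mapped to one policy must share spaces."""
    per_policy: Dict[str, SpacePair] = {}
    for agent_id, pair in agent_spaces.items():
        policy_id = policy_mapping_fn(agent_id)
        if policy_id in per_policy and per_policy[policy_id] != pair:
            raise ValueError(
                f"Agents mapping to policy '{policy_id}' have different spaces."
            )
        per_policy.setdefault(policy_id, pair)
    return per_policy


class FakeEnv:
    """Stand-in env exposing only the two attributes used above."""

    observation_space = "Box(4,)"
    action_space = "Discrete(2)"


# Falls back to the env because no per-policy spaces were given.
print(resolve_spaces("pol1", env=FakeEnv()))
```

With this sketch, calling resolve_spaces("pol1") with no source of space information raises a ValueError, mirroring the first error case listed under Raises.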