ray.rllib.algorithms.algorithm_config.AlgorithmConfig.get_multi_agent_setup
- AlgorithmConfig.get_multi_agent_setup(*, policies: Optional[Dict[str, PolicySpec]] = None, env: Optional[Any] = None, spaces: Optional[Dict[str, Tuple[gymnasium.Space, gymnasium.Space]]] = None, default_policy_class: Optional[Type[ray.rllib.policy.policy.Policy]] = None) -> Tuple[Dict[str, PolicySpec], Callable[[str, Union[SampleBatch, MultiAgentBatch]], bool]]
Compiles a complete multi-agent config (dict) from the information in self.

Infers the observation and action spaces, the policy classes, and the policies' configs. The returned MultiAgentPolicyConfigDict is fully unified and strictly maps PolicyIDs to complete PolicySpec objects (with all their fields not None).

Examples
>>> import numpy as np
>>> from ray.rllib.algorithms.ppo import PPOConfig
>>> config = (
...     PPOConfig()
...     .environment("CartPole-v1")
...     .framework("torch")
...     .multi_agent(policies={"pol1", "pol2"}, policies_to_train=["pol1"])
... )
>>> policy_dict, is_policy_to_train = config.get_multi_agent_setup()
>>> is_policy_to_train("pol1")
True
>>> is_policy_to_train("pol2")
False
>>> print(policy_dict)
{
    "pol1": PolicySpec(
        PPOTorchPolicyV2,  # inferred from Algo's default policy class
        Box(-2.0, 2.0, (4,), np.float32),  # inferred from env
        Discrete(2),  # inferred from env
        {},  # not provided -> empty dict
    ),
    "pol2": PolicySpec(
        PPOTorchPolicyV2,  # inferred from Algo's default policy class
        Box(-2.0, 2.0, (4,), np.float32),  # inferred from env
        Discrete(2),  # inferred from env
        {},  # not provided -> empty dict
    ),
}
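A complementary sketch, not part of the official docstring: when no live env instance is available (for example on a remote worker), the spaces and the default policy class can be passed in explicitly. It assumes a Ray version that uses gymnasium spaces and exports PPOTorchPolicy from ray.rllib.algorithms.ppo; the space values merely mirror CartPole-v1 for illustration.

import gymnasium as gym
import numpy as np
from ray.rllib.algorithms.ppo import PPOConfig, PPOTorchPolicy

config = (
    PPOConfig()
    .framework("torch")
    .multi_agent(policies={"pol1", "pol2"}, policies_to_train=["pol1"])
)

# Spaces that would otherwise be inferred from a live env.
obs_space = gym.spaces.Box(-2.0, 2.0, (4,), np.float32)
act_space = gym.spaces.Discrete(2)

policy_dict, is_policy_to_train = config.get_multi_agent_setup(
    spaces={
        "pol1": (obs_space, act_space),
        "pol2": (obs_space, act_space),
    },
    # Used for any PolicySpec whose policy_class is still None.
    default_policy_class=PPOTorchPolicy,
)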
- Parameters
policies – An optional multi-agent policies dict, mapping policy IDs to PolicySpec objects. If not provided, uses self.policies instead. Note that the policy_class, observation_space, and action_space properties in these PolicySpecs may be None and must therefore be inferred here.
env – An optional env instance from which to infer the different spaces for the different policies. If not provided, tries to infer from spaces, otherwise from self.observation_space and self.action_space. If no space information can be inferred, raises an error.
spaces – Optional dict mapping policy IDs to tuples of 1) observation space and 2) action space to use for the respective policy. These spaces are usually provided by an already instantiated remote RolloutWorker. If not provided, tries to infer from env, otherwise from self.observation_space and self.action_space. If no space information can be inferred, raises an error.
default_policy_class – The Policy class to use should a PolicySpec have its policy_class property set to None.
- Returns
A tuple consisting of 1) a MultiAgentPolicyConfigDict and 2) an is_policy_to_train(PolicyID, SampleBatchType) -> bool callable.
- Raises
ValueError – In case no spaces can be inferred for the policy/ies.
ValueError – In case two agents in the env map to the same PolicyID (according to self.policy_mapping_fn), but have different action or observation spaces according to the inferred space information.
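A minimal usage sketch for the returned callable, continuing the explicit-spaces example above (illustrative only): it reports whether a given policy ID should receive gradient updates, according to self.policies_to_train.

# Only "pol1" was listed in policies_to_train, so only it reports as trainable.
trainable = [pid for pid in policy_dict if is_policy_to_train(pid)]
assert set(trainable) == {"pol1"}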