ray.rllib.algorithms.algorithm_config.AlgorithmConfig.get_marl_module_spec#

AlgorithmConfig.get_marl_module_spec(*, policy_dict: Dict[str, PolicySpec] | None = None, single_agent_rl_module_spec: SingleAgentRLModuleSpec | None = None, env: Any | gymnasium.Env | None = None, spaces: Dict[str, Tuple[gymnasium.Space, gymnasium.Space]] | None = None) MultiAgentRLModuleSpec[source]#

Returns the MultiAgentRLModule spec based on the given policy spec dict.

policy_dict may be a partial dict of the policies to be turned into an equivalent multi-agent RLModule spec.

Parameters:
  • policy_dict – The policy spec dict. Using this dict, the inferred values for observation_space, action_space, and config of each policy can be determined. If the module spec does not have these values specified, they are auto-filled with the values obtained from the policy spec dict. Here, we rely on the policy’s logic for inferring these values from other sources of information (e.g. the environment).

  • single_agent_rl_module_spec – The SingleAgentRLModuleSpec to use for constructing a MultiAgentRLModuleSpec. If None, the already configured spec (self._rl_module_spec) or the default RLModuleSpec for this algorithm (self.get_default_rl_module_spec()) will be used.

  • env – An optional env instance, from which to infer the different spaces for the different SingleAgentRLModules. If not provided, will try to infer from spaces, otherwise from self.observation_space and self.action_space. If no information on spaces can be inferred, an error is raised.

  • spaces – Optional dict mapping policy IDs to tuples of 1) observation space and 2) action space that should be used for the respective policy. These spaces are usually provided by an already instantiated remote EnvRunner. If not provided, will try to infer from env, otherwise from self.observation_space and self.action_space. If no information on spaces can be inferred, an error is raised.
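
A minimal sketch (not from the official docs) of how this method might be invoked. The PPOConfig setup, the policy IDs "policy_1"/"policy_2", and the Box/Discrete spaces are illustrative assumptions; in practice this method is typically called internally by RLlib when building the learner setup.

```python
import gymnasium as gym

from ray.rllib.algorithms.ppo import PPOConfig
from ray.rllib.policy.policy import PolicySpec

# Hypothetical per-policy spaces; in practice these usually come from the
# environment or from an already instantiated remote EnvRunner.
obs_space = gym.spaces.Box(-1.0, 1.0, (4,))
act_space = gym.spaces.Discrete(2)

config = (
    PPOConfig()
    .environment(observation_space=obs_space, action_space=act_space)
    .multi_agent(policies={"policy_1", "policy_2"})
)

# A (possibly partial) policy spec dict; values not specified in the module
# spec are auto-filled from this dict, from `env`/`spaces`, or from the
# config's own observation_space/action_space, as described above.
policy_dict = {
    "policy_1": PolicySpec(observation_space=obs_space, action_space=act_space),
    "policy_2": PolicySpec(observation_space=obs_space, action_space=act_space),
}

marl_spec = config.get_marl_module_spec(policy_dict=policy_dict)
# marl_spec is a MultiAgentRLModuleSpec holding one SingleAgentRLModuleSpec
# per policy ID (e.g. via its module_specs mapping).
```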