ray.rllib.algorithms.algorithm_config.AlgorithmConfig.multi_agent#

AlgorithmConfig.multi_agent(*, policies=<ray.rllib.utils.from_config._NotProvided object>, algorithm_config_overrides_per_module: ~typing.Dict[str, dict] | None = <ray.rllib.utils.from_config._NotProvided object>, policy_map_capacity: int | None = <ray.rllib.utils.from_config._NotProvided object>, policy_mapping_fn: ~typing.Callable[[~typing.Any, OldEpisode], str] | None = <ray.rllib.utils.from_config._NotProvided object>, policies_to_train: ~typing.Container[str] | ~typing.Callable[[str, SampleBatch | MultiAgentBatch], bool] | None = <ray.rllib.utils.from_config._NotProvided object>, policy_states_are_swappable: bool | None = <ray.rllib.utils.from_config._NotProvided object>, observation_fn: ~typing.Callable | None = <ray.rllib.utils.from_config._NotProvided object>, count_steps_by: str | None = <ray.rllib.utils.from_config._NotProvided object>, replay_mode=-1, policy_map_cache=-1) AlgorithmConfig[source]#

Sets the config’s multi-agent settings.

Validates the new multi-agent settings and translates everything into a unified multi-agent setup format. For example a policies list or set of IDs is properly converted into a dict mapping these IDs to PolicySpecs.

Parameters:
  • policies – Map of type MultiAgentPolicyConfigDict from policy ids to either 4-tuples of (policy_cls, obs_space, act_space, config) or PolicySpecs. These tuples or PolicySpecs define the class of the policy, the observation- and action spaces of the policies, and any extra config.

  • algorithm_config_overrides_per_module – Only used if _enable_new_api_stack=True. A mapping from ModuleIDs to per-module AlgorithmConfig override dicts, which apply certain settings, e.g. the learning rate, from the main AlgorithmConfig only to this particular module (within a MultiAgentRLModule). You can create override dicts by using the AlgorithmConfig.overrides utility. For example, to override your learning rate and (PPO) lambda setting just for a single RLModule with your MultiAgentRLModule, do: config.multi_agent(algorithm_config_overrides_per_module={ “module_1”: PPOConfig.overrides(lr=0.0002, lambda_=0.75), })

  • policy_map_capacity – Keep this many policies in the “policy_map” (before writing least-recently used ones to disk/S3).

  • policy_mapping_fn – Function mapping agent ids to policy ids. The signature is: (agent_id, episode, worker, **kwargs) -> PolicyID.

  • policies_to_train – Determines those policies that should be updated. Options are: - None, for training all policies. - An iterable of PolicyIDs that should be trained. - A callable, taking a PolicyID and a SampleBatch or MultiAgentBatch and returning a bool (indicating whether the given policy is trainable or not, given the particular batch). This allows you to have a policy trained only on certain data (e.g. when playing against a certain opponent).

  • policy_states_are_swappable – Whether all Policy objects in this map can be “swapped out” via a simple state = A.get_state(); B.set_state(state), where A and B are policy instances in this map. You should set this to True for significantly speeding up the PolicyMap’s cache lookup times, iff your policies all share the same neural network architecture and optimizer types. If True, the PolicyMap will not have to garbage collect old, least recently used policies, but instead keep them in memory and simply override their state with the state of the most recently accessed one. For example, in a league-based training setup, you might have 100s of the same policies in your map (playing against each other in various combinations), but all of them share the same state structure (are “swappable”).

  • observation_fn – Optional function that can be used to enhance the local agent observations to include more state. See rllib/evaluation/observation_function.py for more info.

  • count_steps_by – Which metric to use as the “batch size” when building a MultiAgentBatch. The two supported values are: “env_steps”: Count each time the env is “stepped” (no matter how many multi-agent actions are passed/how many multi-agent observations have been returned in the previous step). “agent_steps”: Count each individual agent step as one step.

Returns:

This updated AlgorithmConfig object.