Algorithm.add_policy(policy_id: str, policy_cls: Optional[Type[ray.rllib.policy.policy.Policy]] = None, policy: Optional[ray.rllib.policy.policy.Policy] = None, *, observation_space: Optional[gymnasium.spaces.Space] = None, action_space: Optional[gymnasium.spaces.Space] = None, config: Optional[Union[ray.rllib.algorithms.algorithm_config.AlgorithmConfig, dict]] = None, policy_state: Optional[Dict[str, Union[numpy.array, jnp.ndarray, tf.Tensor, torch.Tensor, dict, tuple]]] = None, policy_mapping_fn: Optional[Callable[[Any, int], str]] = None, policies_to_train: Optional[Union[Container[str], Callable[[str, Optional[Union[SampleBatch, MultiAgentBatch]]], bool]]] = None, evaluation_workers: bool = True, module_spec: Optional[ray.rllib.core.rl_module.rl_module.SingleAgentRLModuleSpec] = None) → Optional[ray.rllib.policy.policy.Policy]

Adds a new policy to this Algorithm.

  • policy_id – ID of the policy to add. IMPORTANT: Must not contain characters that are disallowed in Unix/Windows file names, such as <>:"/|?*, and must not end with a dot, space, or backslash.

  • policy_cls – The Policy class to use for constructing the new Policy. Note: Provide only one of policy_cls or policy, not both.

  • policy – The Policy instance to add to this algorithm. If not None, the given Policy object is inserted directly into the Algorithm’s local worker, and clones of that Policy are created on all remote workers as well as all evaluation workers. Note: Provide only one of policy_cls or policy, not both.

  • observation_space – The observation space of the policy to add. If None, try to infer this space from the environment.

  • action_space – The action space of the policy to add. If None, try to infer this space from the environment.

  • config – The config object or overrides for the policy to add.

  • policy_state – Optional state dict to apply to the new policy instance, right after its construction.

  • policy_mapping_fn – An optional (updated) policy mapping function to use from here on. Note that episodes already in progress do not change their mapping; they keep using the old mapping until the episode ends.

  • policies_to_train – An optional list of policy IDs to be trained, or a callable that takes a PolicyID and a SampleBatchType and returns a bool indicating whether the policy is trainable. If None, the existing setup is kept in place. Policies whose IDs are not in the list (or for which the callable returns False) will not be updated.

  • evaluation_workers – Whether to also add the new policy to the evaluation WorkerSet.

  • module_spec – With the new RLModule API, the module_spec for the new module to be added must be passed in. Knowing the policy spec alone is not sufficient.
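
The policy_id constraints listed above can be checked before calling add_policy. The following is a minimal sketch, not part of RLlib: the helper name is_valid_policy_id and the two constants are hypothetical, and the character set is taken literally from the policy_id description in this section.

```python
# Characters the policy_id description lists as disallowed anywhere in the ID.
# (Copied from the text above; this is an assumption, not RLlib's own check.)
FORBIDDEN_CHARS = set('<>:"/|?*')

# Characters the description says must not appear at the end of the ID.
FORBIDDEN_TRAILING = (".", " ", "\\")


def is_valid_policy_id(policy_id: str) -> bool:
    """Hypothetical helper: test an ID against the rules described above."""
    # Reject the ID if any listed character occurs anywhere in it.
    if any(ch in FORBIDDEN_CHARS for ch in policy_id):
        return False
    # Reject the ID if it ends with a dot, space, or backslash.
    return not policy_id.endswith(FORBIDDEN_TRAILING)
```

A dot inside the ID passes this check; only a trailing dot, space, or backslash is rejected, matching the wording of the description.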


The newly added policy (the copy that was added to the local worker). If workers was provided, None is returned.
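
The policy_mapping_fn and policies_to_train parameters described above both accept callables. The sketch below shows one possible shape for each, in plain Python. The policy IDs "new_policy" and "main_policy" and the "new_" agent-ID prefix are hypothetical. Because this section only describes the mapping function's return value, the mapping function here accepts and ignores any arguments after the agent ID.

```python
from typing import Any, Optional

# Hypothetical policy IDs used only for illustration.
NEW_POLICY_ID = "new_policy"
DEFAULT_POLICY_ID = "main_policy"


def policy_mapping_fn(agent_id: Any, *args: Any, **kwargs: Any) -> str:
    """Map an agent ID to a policy ID.

    Assumption: agents whose ID starts with "new_" should use the newly
    added policy; every other agent keeps the default policy.
    """
    if str(agent_id).startswith("new_"):
        return NEW_POLICY_ID
    return DEFAULT_POLICY_ID


def policies_to_train(policy_id: str, batch: Optional[Any] = None) -> bool:
    """Return True only for the policy that should be updated.

    The batch argument is accepted to match the documented callable shape
    (policy ID plus an optional batch) but is not inspected here.
    """
    return policy_id == NEW_POLICY_ID
```

Both functions could then be passed as the policy_mapping_fn and policies_to_train keyword arguments. Per the parameter description, a plain list such as ["new_policy"] is an alternative to the policies_to_train callable.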