ray.rllib.evaluation.rollout_worker.RolloutWorker.add_policy

RolloutWorker.add_policy(policy_id: str, policy_cls: Type[Policy] | None = None, policy: Policy | None = None, *, observation_space: gymnasium.spaces.Space | None = None, action_space: gymnasium.spaces.Space | None = None, config: dict | None = None, policy_state: Dict[str, numpy.array | jnp.ndarray | tf.Tensor | torch.Tensor | dict | tuple] | None = None, policy_mapping_fn: Callable[[Any, Episode], str] | None = None, policies_to_train: Collection[str] | Callable[[str, SampleBatch | MultiAgentBatch | Dict[str, Any]], bool] | None = None, module_spec: RLModuleSpec | None = None) → Policy

Adds a new policy to this RolloutWorker.

Parameters:
  • policy_id – ID of the policy to add.

  • policy_cls – The Policy class to use for constructing the new Policy. Note: Exactly one of policy_cls or policy must be provided.

  • policy – The Policy instance to add to this RolloutWorker. Note: Exactly one of policy_cls or policy must be provided.

  • observation_space – The observation space of the policy to add.

  • action_space – The action space of the policy to add.

  • config – The config overrides for the policy to add.

  • policy_state – Optional state dict to apply to the new policy instance, right after its construction.

  • policy_mapping_fn – An optional (updated) policy mapping function to use from here on. Episodes that are already in progress do not change their mapping; they keep using the old mapping function until they end.

  • policies_to_train – An optional collection of policy IDs to be trained, or a callable taking a PolicyID and, optionally, a SampleBatchType and returning a bool (whether the policy is trainable). If None, the existing setup is kept in place. Policies whose IDs are not in the collection (or for which the callable returns False) will not be updated.

  • module_spec – When the new RLModule API is in use, the module_spec of the new module to add must be passed in here. The policy spec alone is not sufficient.
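The two callable-style parameters above are easiest to see in a small standalone sketch. The code below does not import RLlib and is not RLlib's implementation: `policy_mapping_fn` is an example mapping function, `is_trainable` is a hypothetical helper that shows how a `policies_to_train` value (collection or callable) might be interpreted, and the policy IDs "main" and "opponent" are made up for illustration.

```python
from typing import Any, Callable, Collection, Union

# A policies_to_train value is either a collection of policy IDs or a callable.
PoliciesToTrain = Union[Collection[str], Callable[..., bool]]


def policy_mapping_fn(agent_id: Any, episode: Any, **kwargs) -> str:
    """Example mapping: even agent IDs use "main", odd ones use "opponent"."""
    return "main" if int(agent_id) % 2 == 0 else "opponent"


def is_trainable(
    policies_to_train: PoliciesToTrain, policy_id: str, batch: Any = None
) -> bool:
    """Hypothetical helper: decide whether `policy_id` should be updated."""
    if callable(policies_to_train):
        # Callable form: receives the policy ID and, optionally, a batch.
        return bool(policies_to_train(policy_id, batch))
    # Collection form: trainable only if the ID is listed.
    return policy_id in policies_to_train


if __name__ == "__main__":
    print(policy_mapping_fn(0, episode=None))
    print(is_trainable({"main"}, "opponent"))
    print(is_trainable(lambda pid, batch=None: pid.startswith("main"), "main_v2"))
```

Either form can be passed as `policies_to_train`; the callable form is useful when policy IDs are generated dynamically and cannot be listed up front.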

Returns:

The newly added policy.
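As a usage sketch, the helper below calls `add_policy` with only the keyword arguments listed in the signature above and returns the policy it hands back. The helper name `add_frozen_opponent`, the policy IDs "main" and "opponent", and the agent ID "player_2" are assumptions made for illustration; `worker` is expected to be an already constructed RolloutWorker, and how that worker is built is outside the scope of this sketch.

```python
from typing import Any, Optional


def map_agent_to_policy(agent_id: Any, episode: Any, **kwargs) -> str:
    """Hypothetical mapping: "player_2" plays the new policy, others use "main"."""
    return "opponent" if agent_id == "player_2" else "main"


def add_frozen_opponent(
    worker: Any,
    policy_cls: type,
    observation_space: Any,
    action_space: Any,
    policy_state: Optional[dict] = None,
) -> Any:
    """Add an "opponent" policy to `worker` and keep only "main" trainable."""
    return worker.add_policy(
        policy_id="opponent",
        policy_cls=policy_cls,  # pass policy_cls OR policy, never both
        observation_space=observation_space,
        action_space=action_space,
        policy_state=policy_state,  # applied right after construction
        policy_mapping_fn=map_agent_to_policy,  # used for episodes started from now on
        policies_to_train=["main"],  # "opponent" is not listed, so it is not updated
    )
```

Because `policies_to_train` lists only "main", the newly added policy acts in the environment but does not receive updates, which is a common arrangement for fixed opponents.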

Raises:
  • ValueError – If both policy_cls AND policy are provided.

  • KeyError – If the given policy_id already exists in this worker’s PolicyMap.
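The error conditions above can be mirrored in a caller-side pre-check. This is a minimal sketch, not RLlib's code: `check_add_policy_args` is a hypothetical helper, a plain dict stands in for the worker's PolicyMap, and rejecting the case where neither `policy_cls` nor `policy` is given is an assumption based on the parameter notes rather than on the Raises list.

```python
from typing import Any, Dict, Optional


def check_add_policy_args(
    policy_map: Dict[str, Any],
    policy_id: str,
    policy_cls: Optional[type] = None,
    policy: Any = None,
) -> None:
    """Hypothetical pre-check mirroring the documented add_policy errors."""
    if policy_id in policy_map:
        # Documented: the ID must not already exist in the worker's PolicyMap.
        raise KeyError(f"Policy ID '{policy_id}' already exists.")
    if policy_cls is not None and policy is not None:
        # Documented: providing both is an error.
        raise ValueError("Provide only one of policy_cls or policy, not both.")
    if policy_cls is None and policy is None:
        # Assumption: providing neither is treated as an error here as well.
        raise ValueError("Provide one of policy_cls or policy.")


if __name__ == "__main__":
    existing = {"main": object()}
    check_add_policy_args(existing, "opponent", policy_cls=object)  # passes
    try:
        check_add_policy_args(existing, "main", policy_cls=object)
    except KeyError as err:
        print("rejected:", err)
```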