PolicyMap (ray.rllib.policy.policy_map.PolicyMap)

PolicyMaps are used inside RolloutWorkers and map PolicyIDs (defined in the config.multiagent.policies dictionary) to Policy instances. The policies are used to compute actions for the next environment steps, losses for model updates, and other functionality covered by RLlib’s Policy API. Episode objects use a mapping function to map the AgentIDs produced by the environment to one of the PolicyIDs.
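To make the AgentID-to-PolicyID step concrete, here is a minimal sketch of a mapping function. It assumes the environment emits AgentIDs such as "attacker_0" or "defender_1", and the PolicyIDs "attacker_policy" and "defender_policy" are hypothetical stand-ins for keys of config.multiagent.policies. The `(agent_id, episode, worker, **kwargs)` argument list mirrors what Ray 2.x passes to a mapping function; check the signature expected by your RLlib version.

```python
# Hypothetical PolicyIDs; in RLlib these would be the keys of
# config.multiagent.policies.
POLICY_IDS = ("attacker_policy", "defender_policy")


def policy_mapping_fn(agent_id, episode=None, worker=None, **kwargs):
    """Map an environment AgentID (e.g. "attacker_0") to a PolicyID.

    `episode` and `worker` are accepted but unused here; they only mirror
    the arguments RLlib passes to a mapping function.
    """
    # Assumed AgentID format: "<role>_<index>".
    role = agent_id.split("_")[0]
    policy_id = f"{role}_policy"
    if policy_id not in POLICY_IDS:
        raise ValueError(f"No policy defined for agent {agent_id!r}")
    return policy_id


for aid in ["attacker_0", "defender_0", "attacker_1"]:
    print(aid, "->", policy_mapping_fn(aid))
```

Several agents can share one PolicyID (both attackers above map to "attacker_policy"), which is how parameter sharing is usually expressed.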

It is possible to add policies to, and remove policies from, the Algorithm’s workers at any time (even during an ongoing episode), and to change the policy mapping function. See the Algorithm’s methods add_policy(), remove_policy(), and change_policy_mapping_fn() for more details.
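When policies are added at runtime (for example, frozen snapshots in a league), the mapping function usually has to be rebuilt so new PolicyIDs can be chosen. The sketch below is a hypothetical factory (`make_league_mapping_fn`, "main", and "main_v1" are illustrative names, not RLlib API) that produces such a function from the current list of PolicyIDs; it only shows the kind of new mapping function you might hand over when calling the methods above, not how RLlib applies it.

```python
import random


def make_league_mapping_fn(policy_ids, main_policy="main", seed=None):
    """Build a mapping fn: agent "0" always plays `main_policy`; any other
    agent plays a randomly chosen opponent from `policy_ids`."""
    rng = random.Random(seed)
    # Opponents are every known policy except the one being trained.
    opponents = [pid for pid in policy_ids if pid != main_policy]

    def mapping_fn(agent_id, episode=None, worker=None, **kwargs):
        if agent_id == "0" or not opponents:
            return main_policy
        return rng.choice(opponents)

    return mapping_fn


league = ["main"]
mapping_fn = make_league_mapping_fn(league, seed=0)  # self-play only

# After adding a frozen snapshot as a new policy, rebuild the mapping fn
# so that the new PolicyID can be sampled as an opponent.
league.append("main_v1")
mapping_fn = make_league_mapping_fn(league, seed=0)
print(mapping_fn("0"), mapping_fn("1"))
```

Rebuilding a fresh function, rather than mutating state captured by the old one, keeps each mapping function a pure snapshot of the league at the moment it was created.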

class ray.rllib.policy.policy_map.PolicyMap(worker_index: int, num_workers: int, capacity: Optional[int] = None, path: Optional[str] = None, policy_config: Optional[dict] = None, session_creator: Optional[Callable[[], tensorflow.python.client.session.Session]] = None, seed: Optional[int] = None)

Maps policy IDs to Policy objects.

It keeps up to `capacity` policies in memory and, once that capacity is reached, writes the least recently used policy to disk. This allows adding hundreds of policies to an Algorithm in league-based setups without running out of memory.
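The eviction behavior described above can be sketched as a small least-recently-used (LRU) map. This is a minimal, hypothetical sketch (`LRUDiskMap` is not an RLlib class): it assumes values are picklable and keys are filename-safe strings, and it does not reproduce how RLlib itself serializes or restores Policy objects.

```python
import os
import pickle
import tempfile
from collections import OrderedDict


class LRUDiskMap:
    """Keep at most `capacity` values in memory; pickle the least recently
    used ones to files under `path` once the capacity is exceeded."""

    def __init__(self, capacity, path=None):
        self.capacity = capacity
        self.path = path or tempfile.mkdtemp()
        self._cache = OrderedDict()  # in-memory values, least recent first
        self._stashed = set()        # keys whose values live on disk

    def _file(self, key):
        return os.path.join(self.path, f"{key}.pkl")

    def __setitem__(self, key, value):
        self._stashed.discard(key)
        self._cache[key] = value
        self._cache.move_to_end(key)  # mark as most recently used
        self._evict_if_needed()

    def __getitem__(self, key):
        if key in self._cache:
            self._cache.move_to_end(key)  # mark as most recently used
            return self._cache[key]
        if key in self._stashed:
            with open(self._file(key), "rb") as f:
                value = pickle.load(f)
            self[key] = value  # reload into memory (may evict another key)
            return value
        raise KeyError(key)

    def __len__(self):
        return len(self._cache) + len(self._stashed)

    def keys(self):
        # Includes stashed-to-disk keys, like PolicyMap.items() above.
        return list(self._cache) + sorted(self._stashed)

    def _evict_if_needed(self):
        while len(self._cache) > self.capacity:
            old_key, old_value = self._cache.popitem(last=False)
            with open(self._file(old_key), "wb") as f:
                pickle.dump(old_value, f)
            self._stashed.add(old_key)


m = LRUDiskMap(capacity=2)
for i in range(4):
    m[f"policy_{i}"] = {"weights": [i] * 3}
print(sorted(m._stashed))  # ['policy_0', 'policy_1']
print(m["policy_0"])       # reloaded from disk; policy_2 is stashed instead
```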

create_policy(policy_id: str, policy_cls: Type[Policy], observation_space: gym.Space, action_space: gym.Space, config_override: dict, merged_config: dict) -> None

Creates a new policy and stores it in the cache.

Parameters
  • policy_id – The policy ID. This is the key under which the created policy will be stored in this map.

  • policy_cls – The (original) policy class to use. This may still be altered in case tf-eager (and tracing) is used.

  • observation_space – The observation space of the policy.

  • action_space – The action space of the policy.

  • config_override – The config override dict for this policy. This is the partial dict provided by the user.

  • merged_config – The entire config (merged default config + config_override).
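The relationship between `config_override` and `merged_config` can be illustrated with a recursive dict merge. This is a minimal sketch under the assumption that nested dicts are merged key by key and all other values are overridden; `merge_configs` is a hypothetical helper, and RLlib's own merge rules may differ in edge cases.

```python
import copy


def merge_configs(default_config, config_override):
    """Return a new dict: `default_config` recursively updated by
    `config_override`. Neither input is modified."""
    merged = copy.deepcopy(default_config)
    for key, value in config_override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Merge nested sub-configs (e.g. "model") key by key.
            merged[key] = merge_configs(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


default_config = {
    "lr": 1e-4,
    "gamma": 0.99,
    "model": {"fcnet_hiddens": [256, 256], "vf_share_layers": False},
}
# The partial dict a user might provide for one policy.
config_override = {"lr": 5e-5, "model": {"fcnet_hiddens": [64, 64]}}

merged_config = merge_configs(default_config, config_override)
print(merged_config)
# {'lr': 5e-05, 'gamma': 0.99, 'model': {'fcnet_hiddens': [64, 64], 'vf_share_layers': False}}
```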

items()

Iterates over all policies, even the stashed-to-disk ones.

keys()

Returns a set-like object providing a view on the map's keys.

values()

Returns an object providing a view on the map's values.

update([E, ]**F) -> None

Updates D (this map) from the dict/iterable E and the keyword arguments F.

If E is present and has a .keys() method, does: for k in E: D[k] = E[k]. If E is present and lacks a .keys() method, does: for k, v in E: D[k] = v. In either case, this is followed by: for k in F: D[k] = F[k].
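The three update rules quoted above can be seen on a plain Python dict. In this sketch the strings are hypothetical stand-ins for Policy objects; it demonstrates the standard `dict.update` semantics only, not any PolicyMap-specific behavior.

```python
policies = {"main": "policy_main"}

# E has .keys(): for k in E: D[k] = E[k]
policies.update({"main_v1": "policy_v1"})

# E lacks .keys() (an iterable of pairs): for k, v in E: D[k] = v
policies.update([("main_v2", "policy_v2")])

# Keyword arguments F are applied after E, so they win on conflicts.
policies.update({"main": "stale"}, main="policy_main_new")

print(policies)
# {'main': 'policy_main_new', 'main_v1': 'policy_v1', 'main_v2': 'policy_v2'}
```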

get(*a, **k)

Return the value for key if key is in the dictionary, else default.