PolicyMaps are used inside the Algorithm’s workers and map PolicyIDs (defined in the
config.multiagent.policies dictionary) to Policy instances.
The Policies are used to calculate actions for the next environment steps, losses for
model updates, and other functionality covered by RLlib’s Policy API.
A mapping function is used by episode objects to map AgentIDs produced by the environment to one of the PolicyIDs.
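As a minimal sketch, a mapping function can be a plain Python callable that receives an AgentID and returns a PolicyID. The agent IDs ("red_0", "blue_0", ...), the policy IDs ("red_policy", "blue_policy"), and the team-prefix naming scheme are hypothetical. The `(agent_id, episode, worker, **kwargs)` signature is an assumption modeled on RLlib's convention, so check the signature your RLlib version expects.

```python
# Hypothetical policy IDs; in RLlib these would be the keys of
# config.multiagent.policies.
POLICY_IDS = {"red_policy", "blue_policy"}


def policy_mapping_fn(agent_id, episode=None, worker=None, **kwargs):
    """Map an AgentID such as "red_0" to a PolicyID (assumed naming scheme)."""
    team = agent_id.split("_")[0]  # "red_0" -> "red"
    policy_id = f"{team}_policy"
    if policy_id not in POLICY_IDS:
        raise KeyError(f"No policy for agent {agent_id!r}")
    return policy_id


# Several agents can share one policy.
for aid in ["red_0", "red_1", "blue_0"]:
    print(aid, "->", policy_mapping_fn(aid))
```

Many AgentIDs mapping onto few PolicyIDs is what allows parameter sharing between agents of the same team.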
It is possible to add and remove policies to/from the
Algorithm’s workers at any given time
(even within an ongoing episode) as well as to change the policy mapping function.
See the Algorithm’s methods add_policy(), remove_policy(), and
change_policy_mapping_fn() for more details.
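The following toy sketch illustrates the idea of adding or removing a policy while swapping the mapping function in the same step. `ToyWorker` and its methods are hypothetical stand-ins written in plain Python. They only mirror the names of the Algorithm methods above and are not RLlib's implementation. Policies are represented by strings for brevity.

```python
class ToyWorker:
    """Hypothetical stand-in for a worker holding policies and a mapping fn."""

    def __init__(self, policies, policy_mapping_fn):
        self.policies = dict(policies)  # PolicyID -> policy object
        self.policy_mapping_fn = policy_mapping_fn

    def add_policy(self, policy_id, policy, policy_mapping_fn=None):
        self.policies[policy_id] = policy
        if policy_mapping_fn is not None:
            self.policy_mapping_fn = policy_mapping_fn

    def remove_policy(self, policy_id, policy_mapping_fn=None):
        del self.policies[policy_id]
        if policy_mapping_fn is not None:
            self.policy_mapping_fn = policy_mapping_fn

    def policy_for(self, agent_id):
        # Resolve AgentID -> PolicyID -> policy object.
        return self.policies[self.policy_mapping_fn(agent_id)]


w = ToyWorker({"main": "main-policy"}, lambda aid: "main")
# Add a frozen snapshot and route player_1 to it (e.g. for league play).
w.add_policy("main_v1", "frozen-copy",
             lambda aid: "main_v1" if aid == "player_1" else "main")
# Remove the snapshot and, in the same call, stop routing agents to it.
w.remove_policy("main_v1", lambda aid: "main")
```

Passing the new mapping function together with the removal keeps any agent from being mapped to a PolicyID that no longer exists.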
- class ray.rllib.policy.policy_map.PolicyMap(worker_index: int, num_workers: int, capacity: Optional[int] = None, path: Optional[str] = None, policy_config: Optional[dict] = None, session_creator: Optional[Callable[[], tf.compat.v1.Session]] = None, seed: Optional[int] = None)
Maps policy IDs to Policy objects.
It keeps at most capacity policies in memory and, once that limit is reached, writes the least recently used ones to disk. This allows adding hundreds of policies to an Algorithm for league-based setups without running out of memory.
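The least-recently-used (LRU) stashing described above can be sketched with an OrderedDict plus pickle files. `LRUStashMap` is a hypothetical toy class: its file layout, method names, and reload behavior are assumptions for illustration, not RLlib's PolicyMap code. Its `__iter__` also covers stashed entries, matching the "iterates over all policies, even the stashed-to-disk ones" behavior documented below.

```python
import os
import pickle
import tempfile
from collections import OrderedDict


class LRUStashMap:
    """Toy sketch: keep at most `capacity` values in memory, pickle the rest."""

    def __init__(self, capacity, path=None):
        self.capacity = capacity
        self.path = path or tempfile.mkdtemp()
        self._cache = OrderedDict()  # in-memory entries, least recent first
        self._stashed = set()        # keys that currently live only on disk

    def _file(self, key):
        return os.path.join(self.path, f"{key}.pkl")

    def __setitem__(self, key, value):
        self._cache[key] = value
        self._cache.move_to_end(key)  # mark as most recently used
        self._stashed.discard(key)
        while len(self._cache) > self.capacity:
            old_key, old_value = self._cache.popitem(last=False)  # LRU entry
            with open(self._file(old_key), "wb") as f:
                pickle.dump(old_value, f)
            self._stashed.add(old_key)

    def __getitem__(self, key):
        if key in self._cache:
            self._cache.move_to_end(key)
            return self._cache[key]
        if key in self._stashed:
            with open(self._file(key), "rb") as f:
                value = pickle.load(f)
            self[key] = value  # reload into memory; may evict another entry
            return value
        raise KeyError(key)

    def __iter__(self):
        # Iterate over all keys, including the stashed-to-disk ones.
        yield from list(self._cache)
        yield from list(self._stashed)

    def __len__(self):
        return len(self._cache) + len(self._stashed)


m = LRUStashMap(capacity=2)
for pid in ["p0", "p1", "p2"]:
    m[pid] = {"weights": [int(pid[1])] * 3}
# p0 was least recently used, so adding p2 wrote p0 to disk.
restored = m["p0"]  # reloaded from disk; p1 is evicted in its place
```

Accessing a stashed entry moves it back into memory, so frequently used policies tend to stay resident while rarely used league opponents live on disk.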
- create_policy(policy_id: str, policy_cls: Type[Policy], observation_space: gym.Space, action_space: gym.Space, config_override: dict, merged_config: dict) -> None
Creates a new policy and stores it in the cache.
policy_id – The policy ID. This is the key under which the created policy will be stored in this map.
policy_cls – The (original) policy class to use. This may still be altered in case tf-eager (and tracing) is used.
observation_space – The observation space of the policy.
action_space – The action space of the policy.
config_override – The config override dict for this policy. This is the partial dict provided by the user.
merged_config – The entire config (merged default config + config_override).
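To illustrate how config_override relates to merged_config, here is a small sketch that applies a user's partial override on top of a default config. `merge_policy_config` is a hypothetical helper, and its recursive (deep) merging of nested dicts is an assumption; the config keys and values are illustrative only.

```python
import copy


def merge_policy_config(default_config, config_override):
    """Return a copy of default_config with config_override applied on top.

    Nested dicts are merged recursively (an assumption of this sketch).
    """
    merged = copy.deepcopy(default_config)
    for key, value in config_override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_policy_config(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


default_config = {"lr": 5e-5, "gamma": 0.99, "model": {"fcnet_hiddens": [256, 256]}}
config_override = {"lr": 1e-4, "model": {"use_lstm": True}}  # partial, user-given
merged_config = merge_policy_config(default_config, config_override)
```

The override stays small and readable, while the merged result carries every setting the policy needs; the default config itself is left untouched.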
Iterates over all policies, even the stashed-to-disk ones.
- keys() -> a set-like object providing a view on D's keys
- values() -> an object providing a view on D's values
- update([E, ]**F) -> None. Update D from dict/iterable E and F.
If E is present and has a .keys() method, then does: for k in E: D[k] = E[k]
If E is present and lacks a .keys() method, then does: for k, v in E: D[k] = v
In either case, this is followed by: for k in F: D[k] = F[k]
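The three update() cases above can be checked on a plain Python dict. This is standard dict behavior; the keys and values are arbitrary placeholders.

```python
d = {"a": 1}

# E has a .keys() method: for k in E: d[k] = E[k]
d.update({"b": 2})

# E lacks a .keys() method (a list of pairs): for k, v in E: d[k] = v
d.update([("c", 3), ("a", 10)])

# Keyword arguments F are applied last, so they win over E.
d.update({"b": 99}, b=4, e=5)

print(d)  # {'a': 10, 'b': 4, 'c': 3, 'e': 5}
```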
- get(*a, **k)
Return the value for key if key is in the dictionary, else default.
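The view and lookup semantics of keys(), values(), and get() are sketched below on a plain dict that maps hypothetical PolicyIDs to placeholder policy objects. Views reflect later insertions, and get() falls back to None or the given default for a missing key.

```python
policies = {"main": "main-policy"}  # hypothetical PolicyID -> policy

keys = policies.keys()      # set-like view
values = policies.values()  # view on the values

policies["main_v1"] = "frozen-copy"
print(sorted(keys))                        # ['main', 'main_v1'] (views are live)
print("main" in keys, keys & {"main_v1"})  # True {'main_v1'}

print(policies.get("main"))                 # main-policy
print(policies.get("missing"))              # None
print(policies.get("missing", "fallback"))  # fallback
```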