PolicyMap (ray.rllib.policy.policy_map.PolicyMap)
PolicyMaps are used inside RolloutWorkers and map PolicyIDs (defined in the config.multiagent.policies dictionary) to Policy instances.
The Policies are used to compute actions for the next environment steps, losses for model updates, and other functionality covered by RLlib’s Policy API.
A mapping function is used by episode objects to map AgentIDs produced by the environment to one of the PolicyIDs.
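As a minimal sketch of such a mapping function, the plain-Python example below assigns integer AgentIDs to one of two PolicyIDs. The policy names, the integer agent IDs, and the argument list `(agent_id, episode, worker, **kwargs)` are assumptions made for illustration; check the signature expected by your RLlib version.

```python
# Hypothetical PolicyIDs; in RLlib these would be the keys of the
# config.multiagent.policies dictionary.
POLICY_IDS = ("policy_even", "policy_odd")


def policy_mapping_fn(agent_id, episode=None, worker=None, **kwargs):
    """Map an integer AgentID to one of the PolicyIDs (illustrative only)."""
    # Even-numbered agents share one policy, odd-numbered agents the other.
    return POLICY_IDS[agent_id % 2]


# AgentIDs 0..3, as a multi-agent environment might produce them.
assignment = {agent_id: policy_mapping_fn(agent_id) for agent_id in range(4)}
print(assignment)
```

The function is deterministic here, but a mapping function may also depend on the episode, for example to pick opponents at random per episode.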
Policies can be added to and removed from the Algorithm’s workers at any time (even within an ongoing episode), and the policy mapping function can be changed as well.
See the Algorithm’s methods add_policy(), remove_policy(), and change_policy_mapping_fn() for more details.
- class ray.rllib.policy.policy_map.PolicyMap(worker_index: int, num_workers: int, capacity: Optional[int] = None, path: Optional[str] = None, policy_config=None, session_creator: Optional[Callable[[], tensorflow.python.client.session.Session]] = None, seed: Optional[int] = None)
Maps policy IDs to Policy objects.
It keeps up to n policies in memory and, when this capacity is reached, writes the least recently used one to disk. This allows adding hundreds of policies to an Algorithm for league-based setups without running out of memory.
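The caching behavior described above can be sketched with a small stand-alone class. This is not RLlib's implementation: `TinyLRUStore`, its attributes, and the pickle-per-key file layout are hypothetical and only illustrate the idea of a capacity-bound map that spills its least recently used entry to disk.

```python
import os
import pickle
import tempfile
from collections import OrderedDict


class TinyLRUStore:
    """Toy map: at most `capacity` (>= 1) values in memory, the rest on disk."""

    def __init__(self, capacity, path):
        self.capacity = capacity
        self.path = path
        self.cache = OrderedDict()  # in-memory items, least recently used first
        self.on_disk = set()        # keys currently stored as pickle files

    def _file(self, key):
        return os.path.join(self.path, f"{key}.pkl")

    def _evict_if_full(self):
        # Spill least recently used items until there is room for one more.
        while len(self.cache) >= self.capacity:
            old_key, old_value = self.cache.popitem(last=False)
            with open(self._file(old_key), "wb") as f:
                pickle.dump(old_value, f)
            self.on_disk.add(old_key)

    def __setitem__(self, key, value):
        if key in self.cache:
            self.cache.pop(key)      # re-inserted below as most recently used
        else:
            self._evict_if_full()
        self.on_disk.discard(key)    # any copy on disk is now stale
        self.cache[key] = value

    def __getitem__(self, key):
        if key in self.cache:
            self.cache.move_to_end(key)  # mark as most recently used
            return self.cache[key]
        if key not in self.on_disk:
            raise KeyError(key)
        # Read the value back from disk, making room in memory first.
        with open(self._file(key), "rb") as f:
            value = pickle.load(f)
        self.on_disk.discard(key)
        self._evict_if_full()
        self.cache[key] = value
        return value


with tempfile.TemporaryDirectory() as tmp:
    store = TinyLRUStore(capacity=2, path=tmp)
    store["p0"] = {"weights": [0.0]}
    store["p1"] = {"weights": [1.0]}
    store["p2"] = {"weights": [2.0]}  # "p0" is least recently used: goes to disk
    spilled = set(store.on_disk)
    restored = store["p0"]            # read back from disk; "p1" is spilled instead
    in_memory = list(store.cache)
```

In this sketch plain dictionaries stand in for policies; a real policy would need its own way of saving and restoring state.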
- create_policy(policy_id: str, policy_cls: Type[Policy], observation_space: gym.Space, action_space: gym.Space, config_override, merged_config: Union[AlgorithmConfig, dict]) -> None
Creates a new policy and stores it in the cache.
- Parameters
policy_id – The policy ID. This is the key under which the created policy will be stored in this map.
policy_cls – The (original) policy class to use. This may still be altered in case tf-eager (and tracing) is used.
observation_space – The observation space of the policy.
action_space – The action space of the policy.
merged_config – The config object (or complete config dict) for the policy to use.
- update([E, ]**F) -> None. Update D from dict/iterable E and F.
If E is present and has a .keys() method, then does: for k in E: D[k] = E[k]
If E is present and lacks a .keys() method, then does: for k, v in E: D[k] = v
In either case, this is followed by: for k in F: D[k] = F[k]
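The three cases can be illustrated on a plain Python dict. This sketch assumes PolicyMap.update follows the same dict-style semantics, with PolicyIDs as keys and Policy objects as values; the integer values here are placeholders.

```python
d = {"a": 1}

# E has a .keys() method (a mapping): for k in E: D[k] = E[k]
d.update({"b": 2})

# E lacks .keys() (an iterable of pairs): for k, v in E: D[k] = v
d.update([("c", 3), ("d", 4)])

# Keyword arguments F are applied last: for k in F: D[k] = F[k]
d.update({"a": 10}, e=5)

print(d)
```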
- get(*a, **k)
Return the value for key if key is in the dictionary, else default.
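For illustration, the same lookup on a plain dict is shown below, assuming PolicyMap.get behaves like dict.get; the keys and string values are hypothetical stand-ins for PolicyIDs and Policy objects.

```python
policies = {"main": "main_policy_object"}

found = policies.get("main")                    # key present: its value
missing = policies.get("opponent")              # key absent, no default given: None
fallback = policies.get("opponent", "default")  # key absent: the supplied default
```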