ray.rllib.policy.policy_map.PolicyMap.__init__#

PolicyMap.__init__(*, capacity: int = 100, policy_states_are_swappable: bool = False, worker_index=None, num_workers=None, policy_config=None, session_creator=None, seed=None)[source]#

Initializes a PolicyMap instance.

Parameters:
  • capacity – The size of the Policy object cache. This is the maximum number of policies that are held in RAM. When this capacity is reached, the least recently used Policy’s state is stored in the Ray object store and recovered from there when the policy is accessed again.

  • policy_states_are_swappable – Whether all Policy objects in this map can be “swapped out” via a simple state = A.get_state(); B.set_state(state), where A and B are policy instances in this map. Set this to True to significantly speed up the PolicyMap’s cache lookups, if and only if all your policies share the same neural network architecture and optimizer types. If True, the PolicyMap does not have to garbage collect old, least recently used policies; instead it keeps them in memory and simply overwrites their state with that of the most recently accessed one. For example, in a league-based training setup, you might have hundreds of the same policies in your map (playing against each other in various combinations), all of which share the same state structure and are therefore “swappable”.
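
The swapping behavior described above can be illustrated with a minimal, self-contained sketch. This is not RLlib’s actual implementation; TinyPolicy and TinySwappableMap are hypothetical stand-ins that only assume the get_state()/set_state() contract mentioned in the parameter description: on a cache miss, the least recently used policy object is reused and its state overwritten, rather than being garbage collected and a new object constructed.

```python
from collections import OrderedDict


class TinyPolicy:
    """Stand-in for a Policy: all instances share the same state structure."""

    def __init__(self, weights):
        self.weights = weights

    def get_state(self):
        return {"weights": self.weights}

    def set_state(self, state):
        self.weights = state["weights"]


class TinySwappableMap:
    """Sketch of an LRU policy cache with 'swappable' states enabled."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._cache = OrderedDict()  # policy_id -> live TinyPolicy
        self._stashed = {}           # policy_id -> saved state dict

    def insert(self, pid, policy):
        if len(self._cache) >= self.capacity:
            # Evict the least recently used policy: keep only its state.
            lru_pid, lru_policy = self._cache.popitem(last=False)
            self._stashed[lru_pid] = lru_policy.get_state()
        self._cache[pid] = policy

    def get(self, pid):
        if pid in self._cache:
            self._cache.move_to_end(pid)  # mark as most recently used
            return self._cache[pid]
        # Cache miss: instead of constructing a new policy object, reuse
        # the least recently used one and simply overwrite its state.
        state = self._stashed.pop(pid)
        lru_pid, lru_policy = next(iter(self._cache.items()))
        self._stashed[lru_pid] = lru_policy.get_state()
        del self._cache[lru_pid]
        lru_policy.set_state(state)  # cheap swap: no GC, no rebuild
        self._cache[pid] = lru_policy
        return lru_policy
```

A short usage example: with capacity=2, inserting a third policy stashes the first one’s state; fetching it back revives that state inside an already-allocated policy object.

```python
m = TinySwappableMap(capacity=2)
pa, pb, pc = TinyPolicy(1), TinyPolicy(2), TinyPolicy(3)
m.insert("a", pa)
m.insert("b", pb)
m.insert("c", pc)          # evicts "a", stashing its state

revived = m.get("a")       # revives "a" by reusing pb's object
print(revived.weights)     # prints 1
print(revived is pb)       # prints True: the object was reused, not rebuilt
```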