MultiAgentPrioritizedReplayBuffer.__init__(capacity: int = 10000, storage_unit: str = 'timesteps', num_shards: int = 1, replay_mode: str = 'independent', replay_sequence_override: bool = True, replay_sequence_length: int = 1, replay_burn_in: int = 0, replay_zero_init_states: bool = True, underlying_buffer_config: dict = None, prioritized_replay_alpha: float = 0.6, prioritized_replay_beta: float = 0.4, prioritized_replay_eps: float = 1e-06, **kwargs)

Initializes a MultiAgentPrioritizedReplayBuffer instance.

  • capacity – The capacity of the buffer, measured in storage_unit.

  • storage_unit – Either ‘timesteps’, ‘sequences’ or ‘episodes’. Specifies how experiences are stored. If they are stored in episodes, replay_sequence_length is ignored.

  • num_shards – The number of buffer shards that exist in total (including this one).

  • replay_mode – One of “independent” or “lockstep”. Determines whether batches are sampled independently or to an equal amount.

  • replay_sequence_override – If True, ignore sequences found in incoming batches, slicing them into sequences as specified by replay_sequence_length and replay_burn_in. This only has an effect if storage_unit is ‘sequences’.

  • replay_sequence_length – The sequence length (T) of a single sample. If > 1, we will sample B x T from this buffer.

  • replay_burn_in – The burn-in length, used when replay_sequence_length > 0. This is the number of timesteps by which each sequence overlaps with the previous one, so that the RNN reaches a better internal state (the state after the burn-in) instead of starting from 0.0 on each rollout.

  • replay_zero_init_states – Whether the initial states in the buffer (if replay_sequence_length > 0) are always 0.0 or should be updated with the previous train_batch state outputs.

  • underlying_buffer_config – A config that contains all necessary constructor arguments and arguments for methods to call on the underlying buffers. This replaces the standard behaviour of the underlying PrioritizedReplayBuffer. The config follows the conventions of the general replay_buffer_config. kwargs for subsequent calls of methods may also be included. Example: “replay_buffer_config”: {“type”: PrioritizedReplayBuffer, “capacity”: 10, “storage_unit”: “timesteps”, “prioritized_replay_alpha”: 0.5, “prioritized_replay_beta”: 0.5, “prioritized_replay_eps”: 0.5}

  • prioritized_replay_alpha – Alpha parameter for a prioritized replay buffer. Use 0.0 for no prioritization.

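  • prioritized_replay_beta – Beta parameter for a prioritized replay buffer. Controls the degree of importance-sampling correction (0.0 for no correction, 1.0 for full correction).

  • prioritized_replay_eps – Epsilon parameter for a prioritized replay buffer. A small positive constant, typically added to the absolute TD errors when priorities are updated so that no item ends up with zero priority.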

  • **kwargs – Forward compatibility kwargs.
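
The three prioritized_replay_* parameters are easiest to read together. The sketch below shows the standard proportional prioritization scheme in plain Python. It is not RLlib's implementation, and the helper names per_probabilities, importance_weights and sample_indices are hypothetical. It assumes that priorities are formed as |TD error| + eps, that alpha is the exponent applied to priorities, and that beta is the exponent of the importance-sampling correction.

```python
import random


def per_probabilities(td_errors, alpha=0.6, eps=1e-6):
    """Turn TD errors into sampling probabilities (proportional prioritization)."""
    # Assumption: priority = |TD error| + eps, so every priority is strictly positive.
    priorities = [abs(td) + eps for td in td_errors]
    # alpha = 0.0 maps every priority to 1.0, which yields uniform sampling.
    scaled = [p ** alpha for p in priorities]
    total = sum(scaled)
    return [s / total for s in scaled]


def importance_weights(probs, beta=0.4):
    """Importance-sampling weights, normalized so the largest weight is 1.0."""
    n = len(probs)
    # beta = 0.0 gives no correction (all weights 1.0); beta = 1.0 gives full correction.
    raw = [(n * p) ** (-beta) for p in probs]
    max_w = max(raw)
    return [w / max_w for w in raw]


def sample_indices(probs, batch_size, rng):
    """Draw item indices with replacement, proportionally to probs."""
    return rng.choices(range(len(probs)), weights=probs, k=batch_size)


# Hypothetical TD errors for three stored items.
td_errors = [0.1, 1.0, 2.0]
probs = per_probabilities(td_errors, alpha=0.6, eps=1e-6)
weights = importance_weights(probs, beta=0.4)
batch = sample_indices(probs, batch_size=4, rng=random.Random(0))
```

Items with larger TD errors are drawn more often, and the importance weights shrink for exactly those items to compensate for the skewed sampling. Setting alpha to 0.0 recovers uniform sampling, which matches the note on prioritized_replay_alpha above.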