MultiAgentReplayBuffer.__init__(capacity: int = 10000, storage_unit: str = 'timesteps', num_shards: int = 1, replay_mode: str = 'independent', replay_sequence_override: bool = True, replay_sequence_length: int = 1, replay_burn_in: int = 0, replay_zero_init_states: bool = True, underlying_buffer_config: dict = None, **kwargs)

Initializes a MultiAgentReplayBuffer instance.

  • capacity – The capacity of the buffer, measured in storage_unit.

  • storage_unit – Either ‘timesteps’, ‘sequences’ or ‘episodes’. Specifies how experiences are stored. If storage_unit is ‘episodes’, replay_sequence_length is ignored.

  • num_shards – The number of buffer shards that exist in total (including this one).

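  • replay_mode – One of “independent” or “lockstep”. Determines whether each policy’s buffer is sampled independently (“independent”) or whether all policies are sampled together, drawing an equal number of timesteps for each (“lockstep”).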

  • replay_sequence_override – If True, ignore sequences found in incoming batches and slice them into sequences as specified by replay_sequence_length and replay_burn_in. This only has an effect if storage_unit is ‘sequences’.

  • replay_sequence_length – The sequence length (T) of a single sample. If > 1, samples of shape B x T are drawn from this buffer. This only has an effect if storage_unit is ‘timesteps’.

  • replay_burn_in – The number of timesteps by which each sequence overlaps the previous one. These steps are used to warm up the RNN’s internal state (the state after the burn-in) instead of starting each RNN rollout from 0.0. This only has an effect if storage_unit is ‘sequences’.

  • replay_zero_init_states – Whether the initial states in the buffer (if replay_sequence_length > 0) are always 0.0 or are updated with the previous train_batch state outputs.

  • underlying_buffer_config – A config containing all constructor arguments for the underlying buffers, as well as the arguments for methods to be called on them.

  • **kwargs – Forward compatibility kwargs.
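To illustrate what it means for capacity to be measured in storage_unit, here is a minimal sketch built on a plain FIFO deque. The helper fill_fifo is hypothetical and not part of RLlib; it models only ‘timesteps’ and ‘episodes’, represents each timestep as an int, and assumes the oldest units are evicted first.

```python
from collections import deque
from typing import List


def fill_fifo(capacity: int, storage_unit: str, episodes: List[List[int]]) -> list:
    """Hypothetical helper: store `episodes` in a FIFO holding `capacity` units."""
    buf = deque(maxlen=capacity)  # when full, the oldest unit is dropped
    for episode in episodes:
        if storage_unit == "timesteps":
            buf.extend(episode)  # every timestep counts as one unit
        elif storage_unit == "episodes":
            buf.append(list(episode))  # a whole episode counts as one unit
        else:
            raise ValueError(f"unsupported storage_unit: {storage_unit!r}")
    return list(buf)


episodes = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]
print(fill_fifo(3, "timesteps", episodes))  # [7, 8, 9]
print(fill_fifo(2, "episodes", episodes))  # [[4, 5], [6, 7, 8, 9]]
```

In this sketch the same capacity value holds many more timesteps when storage_unit is ‘episodes’, which is worth keeping in mind when choosing capacity.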
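The interaction between replay_sequence_length (T) and replay_burn_in can be sketched as follows. slice_with_burn_in is a hypothetical helper, not RLlib’s slicing code. It assumes each sequence holds T new timesteps prefixed by up to replay_burn_in timesteps taken from the previous sequence, which serve only to warm up the RNN state. RLlib’s own slicing may count the burn-in steps toward T differently.

```python
from typing import List, Sequence


def slice_with_burn_in(steps: Sequence[int], seq_len: int, burn_in: int) -> List[List[int]]:
    """Hypothetical helper: cut `steps` into chunks of `seq_len` new timesteps,
    each prefixed by up to `burn_in` timesteps that overlap the previous chunk."""
    if seq_len < 1 or burn_in < 0:
        raise ValueError("seq_len must be >= 1 and burn_in must be >= 0")
    sequences = []
    for start in range(0, len(steps), seq_len):
        pre_start = max(0, start - burn_in)  # the first chunk has no history to overlap
        sequences.append(list(steps[pre_start:start + seq_len]))
    return sequences


print(slice_with_burn_in(list(range(10)), seq_len=4, burn_in=2))
# [[0, 1, 2, 3], [2, 3, 4, 5, 6, 7], [6, 7, 8, 9]]
```

Because the first sequence has no predecessor, this sketch gives it no burn-in prefix, and the last sequence may be shorter than T; a real implementation would typically zero-pad such sequences to a fixed length.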
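The difference between the two replay_mode values can be modeled with a toy buffer. ToyMultiAgentBuffer is hypothetical and is not RLlib’s implementation. It assumes transitions are placeholder (policy id, env step) tuples, that every env step contains data for the same set of policies, and that capacity applies to each underlying list separately. In “lockstep” mode it stores one joint entry per env step and draws steps once for all policies; in “independent” mode it keeps a separate list per policy and samples each one on its own.

```python
import random
from typing import Dict, List, Tuple

Transition = Tuple[str, int]  # (policy id, env step): a placeholder for real data


class ToyMultiAgentBuffer:
    """Hypothetical model of the two replay modes; not RLlib's implementation."""

    def __init__(self, capacity: int, replay_mode: str = "independent"):
        if capacity < 1:
            raise ValueError("capacity must be >= 1")
        if replay_mode not in ("independent", "lockstep"):
            raise ValueError(f"unknown replay_mode: {replay_mode!r}")
        self.capacity = capacity
        self.replay_mode = replay_mode
        self.per_policy: Dict[str, List[Transition]] = {}  # used in "independent" mode
        self.joint: List[Dict[str, Transition]] = []  # used in "lockstep" mode

    def add(self, step: Dict[str, Transition]) -> None:
        """`step` maps each policy id to its transition at one env step."""
        if self.replay_mode == "lockstep":
            self.joint.append(dict(step))
            del self.joint[:-self.capacity]  # keep only the newest `capacity` entries
        else:
            for pid, transition in step.items():
                buf = self.per_policy.setdefault(pid, [])
                buf.append(transition)
                del buf[:-self.capacity]

    def sample(self, num_items: int, rng: random.Random) -> Dict[str, List[Transition]]:
        if self.replay_mode == "lockstep":
            # Draw env steps once; every policy receives the same steps.
            picked = [rng.choice(self.joint) for _ in range(num_items)]
            policy_ids = list(picked[0]) if picked else []
            return {pid: [step[pid] for step in picked] for pid in policy_ids}
        # Draw separately from each policy's own buffer.
        return {pid: [rng.choice(buf) for _ in range(num_items)]
                for pid, buf in self.per_policy.items()}


rng = random.Random(0)
lockstep = ToyMultiAgentBuffer(capacity=100, replay_mode="lockstep")
independent = ToyMultiAgentBuffer(capacity=3, replay_mode="independent")
for t in range(5):
    step = {"p0": ("p0", t), "p1": ("p1", t)}
    lockstep.add(step)
    independent.add(step)

batch = lockstep.sample(4, rng)
print([t for _, t in batch["p0"]] == [t for _, t in batch["p1"]])  # True: same env steps
print(independent.sample(4, rng))  # each policy drawn on its own, from steps 2..4 only
```

In this sketch, lockstep sampling keeps the agents’ experiences aligned in time, while independent sampling lets each policy’s buffer fill, evict, and be sampled on its own schedule.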