Replay Buffer API#
Note
Ray 2.40 uses RLlib’s new API stack by default. The Ray team has mostly completed transitioning algorithms, example scripts, and documentation to the new code base.
If you’re still using the old API stack, see New API stack migration guide for details on how to migrate.
The following classes don’t take the separation of experiences from different policies into account; multi-agent replay buffers are covered further below.
Replay Buffer Base Classes#
StorageUnit | Specifies how batches are structured in a ReplayBuffer.
ReplayBuffer | The lowest-level replay buffer interface used by RLlib.
PrioritizedReplayBuffer | This buffer implements Prioritized Experience Replay.
ReservoirReplayBuffer | This buffer implements reservoir sampling.
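For orientation, here is a minimal sketch of using these classes directly, outside of any Algorithm. The class and method names come from ray.rllib.utils.replay_buffers; the capacities, the dummy batch contents, and the alpha/beta values are made up for illustration and are not meaningful hyperparameters.

```python
import numpy as np

from ray.rllib.policy.sample_batch import SampleBatch
from ray.rllib.utils.replay_buffers import (
    PrioritizedReplayBuffer,
    ReplayBuffer,
    StorageUnit,
)

# Plain buffer that stores individual timesteps (the default storage unit).
buffer = ReplayBuffer(capacity=1000, storage_unit=StorageUnit.TIMESTEPS)

# A tiny dummy batch of two timesteps.
batch = SampleBatch({
    SampleBatch.OBS: np.zeros((2, 4), dtype=np.float32),
    SampleBatch.ACTIONS: np.array([0, 1]),
    SampleBatch.REWARDS: np.array([0.5, 1.0], dtype=np.float32),
})

buffer.add(batch)
replay = buffer.sample(num_items=2)

# Prioritized variant: alpha controls how strongly priorities skew sampling,
# beta controls the importance-sampling correction applied at sample time.
prio_buffer = PrioritizedReplayBuffer(capacity=1000, alpha=0.6)
prio_buffer.add(batch)
prio_replay = prio_buffer.sample(num_items=2, beta=0.4)
```

ReservoirReplayBuffer is constructed the same way and differs mainly in its eviction behavior: once full, it keeps a uniform random subset of everything added so far rather than the most recently added items.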
Public Methods#
ReplayBuffer.sample | Samples num_items items from this buffer.
ReplayBuffer.add | Adds a batch of experiences or other data to this buffer.
ReplayBuffer.get_state | Returns all local state in a dict.
ReplayBuffer.set_state | Restores all local state to the provided state.
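As a small illustration of the get_state() / set_state() pair, the following sketch round-trips a buffer’s contents into a fresh instance, for example as part of manual checkpointing. The dummy batch and capacity are placeholders, and the layout of the returned state dict is internal to RLlib and may change between versions.

```python
from ray.rllib.policy.sample_batch import SampleBatch
from ray.rllib.utils.replay_buffers import ReplayBuffer

buffer = ReplayBuffer(capacity=100)
buffer.add(SampleBatch({
    SampleBatch.OBS: [[0.0], [1.0]],
    SampleBatch.REWARDS: [0.0, 1.0],
}))

# Serialize the buffer's contents and counters ...
state = buffer.get_state()

# ... and restore them into a fresh buffer instance.
restored = ReplayBuffer(capacity=100)
restored.set_state(state)
assert len(restored) == len(buffer)
```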
Multi Agent Buffers#
The following classes use the “single-agent” buffers above as underlying buffers in order to split experiences up between the different agents’ policies. In multi-agent RL, more than one agent exists in the environment, and not all of these agents necessarily use the same policy (M agents map onto N policies, where N <= M because several agents can share a policy). This calls for MultiAgentReplayBuffers that store the experiences of different policies separately. A short usage sketch follows the table below.
MultiAgentReplayBuffer | A replay buffer shard for multiagent setups.
MultiAgentPrioritizedReplayBuffer | A prioritized replay buffer shard for multiagent setups.
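The sketch below shows, under the same hedges as above, how a MultiAgentReplayBuffer keeps experiences from different policies apart: batches arrive keyed by policy ID and are routed into per-policy underlying buffers. The policy IDs "policy_1" and "policy_2" and the one-step dummy batches are invented for this example.

```python
from ray.rllib.policy.sample_batch import MultiAgentBatch, SampleBatch
from ray.rllib.utils.replay_buffers import MultiAgentReplayBuffer

def dummy_batch(value):
    # One-timestep placeholder batch.
    return SampleBatch({SampleBatch.OBS: [[value]], SampleBatch.REWARDS: [value]})

buffer = MultiAgentReplayBuffer(capacity=1000)

# Experiences arrive as a MultiAgentBatch mapping policy IDs to SampleBatches;
# the buffer stores each sub-batch in its own underlying single-agent buffer.
ma_batch = MultiAgentBatch(
    {"policy_1": dummy_batch(0.0), "policy_2": dummy_batch(1.0)},
    env_steps=1,
)
buffer.add(ma_batch)

# Sampling returns a MultiAgentBatch again; a single policy's experiences
# can also be requested explicitly.
replay_all = buffer.sample(num_items=1)
replay_p1 = buffer.sample(num_items=1, policy_id="policy_1")
```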
Utility Methods#
update_priorities_in_replay_buffer | Updates the priorities in a prioritized replay buffer, given training results.
sample_min_n_steps_from_buffer | Samples a minimum of n timesteps from a given replay buffer.
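As a rough sketch, sample_min_n_steps_from_buffer() keeps sampling from a buffer until the collected batches cover at least the requested number of timesteps. update_priorities_in_replay_buffer() isn’t called here because it expects the training-results dict that an Algorithm produces during its training step. The buffer contents and the min_steps value are illustrative, and the exact keyword names may differ between Ray versions.

```python
from ray.rllib.policy.sample_batch import SampleBatch
from ray.rllib.utils.replay_buffers import ReplayBuffer
from ray.rllib.utils.replay_buffers.utils import sample_min_n_steps_from_buffer

buffer = ReplayBuffer(capacity=100)
buffer.add(SampleBatch({
    SampleBatch.OBS: [[0.0], [1.0], [2.0], [3.0]],
    SampleBatch.REWARDS: [0.0, 1.0, 2.0, 3.0],
}))

# Keep sampling until at least 32 env steps worth of experiences are gathered
# (timesteps are drawn with replacement, so a small buffer suffices).
train_batch = sample_min_n_steps_from_buffer(
    buffer, min_steps=32, count_by_agent_steps=False
)
```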