Ray 2.10.0 introduces the alpha stage of RLlib’s “new API stack”. The Ray Team plans to transition algorithms, example scripts, and documentation to the new code base, incrementally replacing the “old API stack” (e.g., ModelV2, Policy, RolloutWorker) over the subsequent minor releases leading up to Ray 3.0.

Note, however, that so far only PPO (single- and multi-agent) and SAC (single-agent only) support the “new API stack”, and both still run with the old APIs by default. You can continue to use your existing custom (old stack) classes.

See the new API stack documentation for more details on how to use it.

Replay Buffer API#

The following classes don’t take into account the separation of experiences from different policies. Multi-agent replay buffers are explained further below.

Replay Buffer Base Classes#


Specifies how batches are structured in a ReplayBuffer.


The lowest-level replay buffer interface used by RLlib.


This buffer implements Prioritized Experience Replay.


This buffer implements reservoir sampling.
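As an illustration of the reservoir sampling idea mentioned above, here is a minimal, self-contained sketch. It is not RLlib's implementation: the class name `ReservoirSketch` and its attributes are hypothetical and chosen only for this example. The sketch keeps a fixed-size, uniformly random subset of everything it has been offered.

```python
import random


class ReservoirSketch:
    """Toy fixed-capacity buffer holding a uniform random sample of a stream.

    Hypothetical illustration only; not RLlib's reservoir buffer.
    """

    def __init__(self, capacity, seed=None):
        self.capacity = capacity
        self.items = []
        self.num_seen = 0  # total number of items ever offered to the buffer
        self.rng = random.Random(seed)

    def add(self, item):
        self.num_seen += 1
        if len(self.items) < self.capacity:
            # Buffer not full yet: always keep the item.
            self.items.append(item)
        else:
            # Keep the new item with probability capacity / num_seen by
            # drawing a slot in [0, num_seen) and overwriting only if the
            # slot falls inside the buffer.
            idx = self.rng.randrange(self.num_seen)
            if idx < self.capacity:
                self.items[idx] = item


buf = ReservoirSketch(capacity=5, seed=0)
for t in range(1000):
    buf.add(t)
```

After the loop, the buffer still holds only five items, and each of the 1000 offered items had the same chance of being among them.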

Public Methods#


Samples num_items items from this buffer.


Adds a batch of experiences or other data to this buffer.


Returns all local state in a dict.


Restores all local state to the provided state.
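The four operations described above (adding data, sampling `num_items` items, exporting state, and restoring state) can be sketched with a small ring buffer. This is a simplified stand-in, not RLlib's code: the class `TinyReplayBuffer`, its method names, and the layout of the state dict are assumptions made for illustration.

```python
import copy
import random


class TinyReplayBuffer:
    """Hypothetical ring buffer with add / sample / get_state / set_state."""

    def __init__(self, capacity, seed=None):
        self.capacity = capacity
        self.storage = []
        self.next_idx = 0  # slot that the next overwrite will hit once full
        self.rng = random.Random(seed)

    def add(self, batch):
        # Store each item; once full, overwrite the oldest slot.
        for item in batch:
            if len(self.storage) < self.capacity:
                self.storage.append(item)
            else:
                self.storage[self.next_idx] = item
            self.next_idx = (self.next_idx + 1) % self.capacity

    def sample(self, num_items):
        # Uniform sampling with replacement.
        if not self.storage:
            raise ValueError("cannot sample from an empty buffer")
        return [self.rng.choice(self.storage) for _ in range(num_items)]

    def get_state(self):
        # Return all local state in a dict (deep-copied so the caller
        # cannot mutate the buffer through it).
        return {
            "capacity": self.capacity,
            "storage": copy.deepcopy(self.storage),
            "next_idx": self.next_idx,
        }

    def set_state(self, state):
        # Restore all local state from a dict produced by get_state().
        self.capacity = state["capacity"]
        self.storage = copy.deepcopy(state["storage"])
        self.next_idx = state["next_idx"]


buf = TinyReplayBuffer(capacity=3, seed=0)
buf.add([0, 1, 2, 3, 4])  # items 0 and 1 get overwritten by 3 and 4

restored = TinyReplayBuffer(capacity=3, seed=0)
restored.set_state(buf.get_state())
samples = restored.sample(4)
```

Exporting and restoring state this way is what makes checkpointing possible: the restored buffer holds the same contents as the original without sharing any mutable objects with it.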

Multi Agent Buffers#

The following classes use the “single-agent” buffers above as underlying buffers to split up experiences between the different agents’ policies. In multi-agent RL, more than one agent exists in the environment, and these agents don’t necessarily all use the same policy (M agents map to N policies, where N <= M). This creates the need for MultiAgentReplayBuffers, which store the experiences of different policies separately.


A replay buffer shard for multi-agent setups.


A prioritized replay buffer shard for multi-agent setups.
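To make the idea of storing each policy's experiences separately more concrete, here is a rough sketch that keeps one underlying buffer per policy ID. It is only an illustration under stated assumptions: `PerPolicyBufferSketch`, the `agent_to_policy` mapping, and the policy IDs `"shared"` and `"solo"` are hypothetical names, not part of RLlib.

```python
import random
from collections import deque


class PerPolicyBufferSketch:
    """Hypothetical wrapper holding one bounded buffer per policy ID."""

    def __init__(self, capacity_per_policy, seed=None):
        self.capacity_per_policy = capacity_per_policy
        self.buffers = {}  # policy_id -> deque of experiences
        self.rng = random.Random(seed)

    def add(self, policy_id, experience):
        # Create the underlying buffer lazily; deque(maxlen=...) drops the
        # oldest experience automatically once the buffer is full.
        buf = self.buffers.setdefault(
            policy_id, deque(maxlen=self.capacity_per_policy)
        )
        buf.append(experience)

    def sample(self, num_items, policy_id):
        # Sample (with replacement) only from the requested policy's data.
        candidates = list(self.buffers.get(policy_id, ()))
        if not candidates:
            raise ValueError(f"no experiences stored for {policy_id!r}")
        return [self.rng.choice(candidates) for _ in range(num_items)]


# Three agents share two policies in this toy setup.
agent_to_policy = {"agent_0": "shared", "agent_1": "shared", "agent_2": "solo"}

ma_buffer = PerPolicyBufferSketch(capacity_per_policy=100, seed=0)
for step in range(10):
    for agent_id, policy_id in agent_to_policy.items():
        ma_buffer.add(policy_id, (agent_id, step))

shared_batch = ma_buffer.sample(4, "shared")
```

Because sampling is keyed by policy ID, a batch drawn for `"shared"` never contains experiences collected by `agent_2`, which is the separation the classes above are meant to provide.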

Utility Methods#


Updates the priorities in a prioritized replay buffer, given training results.


Samples a minimum of n timesteps from a given replay buffer.
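The "minimum of n timesteps" behavior can be sketched as a loop that keeps drawing stored batches until the requested number of timesteps is covered. The function below is a hypothetical stand-in written for this page, not RLlib's utility; it assumes a buffer is simply a list of batches and that a batch's length is its number of timesteps.

```python
import random


def sample_min_n_timesteps(batches, min_steps, rng):
    """Draw batches at random until they cover at least min_steps timesteps.

    Hypothetical helper: `batches` is a list of sequences, and len(batch)
    is treated as that batch's timestep count.
    """
    if not any(len(b) > 0 for b in batches):
        # Without a non-empty batch the loop below could never terminate.
        raise ValueError("need at least one non-empty batch to sample from")

    sampled = []
    total_steps = 0
    while total_steps < min_steps:
        batch = rng.choice(batches)
        sampled.append(batch)
        total_steps += len(batch)
    return sampled, total_steps


stored = [[0] * 3, [1] * 5, [2] * 2]  # batches of 3, 5, and 2 timesteps
sampled, total_steps = sample_min_n_timesteps(stored, 10, random.Random(0))
```

Since batches have different lengths, the result usually overshoots the target slightly: the loop stops on the first draw that brings the total to `min_steps` or more.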