Note

Ray 2.10.0 introduces the alpha stage of RLlib’s “new API stack”. The Ray Team plans to transition algorithms, example scripts, and documentation to the new code base, thereby incrementally replacing the “old API stack” (e.g., ModelV2, Policy, RolloutWorker) throughout the subsequent minor releases leading up to Ray 3.0.

Note, however, that so far only PPO (single- and multi-agent) and SAC (single-agent only) support the “new API stack”; both still run with the old APIs by default. You can continue to use your existing custom (old-stack) classes.

See here for more details on how to use the new API stack.

Replay Buffer API#

The following classes don’t take into account the separation of experiences from different policies; multi-agent replay buffers are covered further below. A short usage sketch follows the class overview.

Replay Buffer Base Classes#

StorageUnit

Specifies how batches are structured in a ReplayBuffer.

ReplayBuffer

The lowest-level replay buffer interface used by RLlib.

PrioritizedReplayBuffer

This buffer implements Prioritized Experience Replay.

ReservoirReplayBuffer

This buffer implements reservoir sampling.
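The sketch below constructs a plain and a prioritized buffer directly. It assumes the usual import paths under ray.rllib.utils.replay_buffers and a recent Ray 2.x release; constructor arguments such as capacity, storage_unit, and alpha are the commonly documented ones but may differ slightly between versions.

```python
from ray.rllib.policy.sample_batch import SampleBatch
from ray.rllib.utils.replay_buffers import (
    PrioritizedReplayBuffer,
    ReplayBuffer,
    StorageUnit,
)

# Plain buffer that stores experiences timestep by timestep and evicts the
# oldest entries once `capacity` is reached.
buffer = ReplayBuffer(capacity=1000, storage_unit=StorageUnit.TIMESTEPS)

# Prioritized buffer; `alpha` controls how strongly the stored priorities
# skew sampling (alpha=0 falls back to uniform sampling).
prio_buffer = PrioritizedReplayBuffer(capacity=1000, alpha=0.6)

# Add a tiny dummy batch to both buffers.
dummy_batch = SampleBatch({"obs": [0.0], "rewards": [1.0]})
buffer.add(dummy_batch)
prio_buffer.add(dummy_batch)
```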

Public Methods#

sample

Samples num_items items from this buffer.

add

Adds a batch of experiences or other data to this buffer.

get_state

Returns all local state in a dict.

set_state

Restores all local state to the provided state.
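As a rough illustration of these four methods, the following sketch (again assuming the standard ReplayBuffer import path) adds a batch, samples from it, and round-trips the buffer state through get_state()/set_state() as one would for checkpointing; the exact contents of the state dict are version-dependent.

```python
from ray.rllib.policy.sample_batch import SampleBatch
from ray.rllib.utils.replay_buffers import ReplayBuffer

buffer = ReplayBuffer(capacity=100)

# add(): insert a batch of experiences.
buffer.add(SampleBatch({"obs": [0.0, 1.0], "rewards": [0.5, 1.0]}))

# sample(): draw num_items items from the buffer.
batch = buffer.sample(num_items=2)

# get_state()/set_state(): serialize the buffer's local state and restore it
# into a fresh buffer, e.g. when checkpointing.
state = buffer.get_state()
fresh_buffer = ReplayBuffer(capacity=100)
fresh_buffer.set_state(state)
```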

Multi Agent Buffers#

The following classes use the above “single-agent” buffers as underlying buffers to facilitate splitting up experiences between the different agents’ policies. In multi-agent RL, more than one agent exists in the environment, and not all of these agents necessarily use the same policy (M agents are mapped to N policies, and several agents may share one policy). This leads to the need for MultiAgentReplayBuffers that store the experiences of different policies separately. A short usage sketch follows the class overview.

MultiAgentReplayBuffer

A replay buffer shard for multiagent setups.

MultiAgentPrioritizedReplayBuffer

A prioritized replay buffer shard for multiagent setups.
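A minimal multi-agent sketch under the default settings (one underlying buffer per policy ID, independent replay); the import paths and argument names below are the commonly documented ones and should be treated as assumptions rather than a definitive recipe.

```python
from ray.rllib.policy.sample_batch import MultiAgentBatch, SampleBatch
from ray.rllib.utils.replay_buffers import MultiAgentReplayBuffer

# Keeps one underlying single-agent buffer per policy ID.
ma_buffer = MultiAgentReplayBuffer(capacity=1000)

# Experiences arrive as a MultiAgentBatch: one SampleBatch per policy.
ma_batch = MultiAgentBatch(
    policy_batches={
        "policy_1": SampleBatch({"obs": [0.0], "rewards": [1.0]}),
        "policy_2": SampleBatch({"obs": [1.0], "rewards": [0.0]}),
    },
    env_steps=1,
)
ma_buffer.add(ma_batch)

# Sampling returns a MultiAgentBatch again, drawn per policy from the
# respective underlying buffers.
replay = ma_buffer.sample(num_items=2)
```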

Utility Methods#

update_priorities_in_replay_buffer

Updates the priorities in a prioritized replay buffer, given training results.

sample_min_n_steps_from_buffer

Samples a minimum of n timesteps from a given replay buffer.
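update_priorities_in_replay_buffer is typically called from within an algorithm’s training step, passing the buffer, the algorithm config, the sampled train batch, and the training results, so that updated priorities (e.g., TD errors) flow back into a prioritized buffer. The sketch below only exercises sample_min_n_steps_from_buffer; its module path and argument names are assumptions based on ray.rllib.utils.replay_buffers.utils and may differ between versions.

```python
from ray.rllib.policy.sample_batch import SampleBatch
from ray.rllib.utils.replay_buffers import ReplayBuffer
from ray.rllib.utils.replay_buffers.utils import sample_min_n_steps_from_buffer

# Fill a small buffer with ten single-timestep batches.
buffer = ReplayBuffer(capacity=1000)
for i in range(10):
    buffer.add(SampleBatch({"obs": [float(i)], "rewards": [1.0]}))

# Keep sampling until the returned batch covers at least `min_steps` env steps
# (argument names are assumptions; set count_by_agent_steps=True to count
# agent steps instead).
train_batch = sample_min_n_steps_from_buffer(
    buffer, min_steps=16, count_by_agent_steps=False
)
```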