Sample Batches
Contents
Sample Batches#
SampleBatch (ray.rllib.policy.sample_batch.SampleBatch)#
Whether running in a single process or large cluster,
all data interchange in RLlib happens in the form of RolloutWorker
collects batches of size
rollout_fragment_length
, and RLlib then concatenates one or more of these batches (across different
RolloutWorker
in subsequent sampling steps) into a batch of size
train_batch_size
, which then serves as the input to a Policy
’s learn_on_batch()
method.
A typical sample batch looks something like the following when summarized. Since all values are kept in arrays, this allows for efficient encoding and transmission across the network:
{ 'action_logp': np.ndarray((200,), dtype=float32, min=-0.701, max=-0.685, mean=-0.694),
'actions': np.ndarray((200,), dtype=int64, min=0.0, max=1.0, mean=0.495),
'dones': np.ndarray((200,), dtype=bool, min=0.0, max=1.0, mean=0.055),
'infos': np.ndarray((200,), dtype=object, head={}),
'new_obs': np.ndarray((200, 4), dtype=float32, min=-2.46, max=2.259, mean=0.018),
'obs': np.ndarray((200, 4), dtype=float32, min=-2.46, max=2.259, mean=0.016),
'rewards': np.ndarray((200,), dtype=float32, min=1.0, max=1.0, mean=1.0),
't': np.ndarray((200,), dtype=int64, min=0.0, max=34.0, mean=9.14)}
MultiAgentBatch (ray.rllib.policy.sample_batch.MultiAgentBatch)#
In multi-agent mode, several sample batches may be
collected separately for each individual policy and are placed in a container object of type MultiAgentBatch
: