Sample Batches

SampleBatch (ray.rllib.policy.sample_batch.SampleBatch)

Whether running in a single process or a large cluster, all data interchange in RLlib happens in the form of sample batches. Typically, each RolloutWorker collects batches of size rollout_fragment_length, and RLlib then concatenates one or more of these batches (across different RolloutWorkers in subsequent sampling steps) into a batch of size train_batch_size, which then serves as the input to a Policy’s learn_on_batch() method.
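The concatenation of rollout fragments into a train batch can be sketched in plain NumPy. This is only an illustration of the idea, not RLlib's implementation: the helpers collect_fragment, concat_fragments, and build_train_batch are hypothetical, the fragment is modeled as a plain dict of arrays rather than a SampleBatch, and the sizes 50 and 200 are assumed values.

```python
import numpy as np

# Assumed settings for illustration; in RLlib these come from the config.
ROLLOUT_FRAGMENT_LENGTH = 50
TRAIN_BATCH_SIZE = 200


def collect_fragment(rng, length):
    """Hypothetical stand-in for one worker's sampling step.

    Returns a dict of equal-length arrays, one row per timestep.
    """
    return {
        "obs": rng.standard_normal((length, 4)).astype(np.float32),
        "actions": rng.integers(0, 2, size=length),
        "rewards": np.ones(length, dtype=np.float32),
    }


def concat_fragments(fragments):
    """Concatenate fragments column by column along the first axis."""
    keys = fragments[0].keys()
    return {k: np.concatenate([f[k] for f in fragments], axis=0) for k in keys}


def build_train_batch(rng, fragment_length, train_batch_size):
    """Collect fragments until at least train_batch_size rows are available."""
    fragments = []
    count = 0
    while count < train_batch_size:
        fragments.append(collect_fragment(rng, fragment_length))
        count += fragment_length
    return concat_fragments(fragments)


rng = np.random.default_rng(0)
train_batch = build_train_batch(rng, ROLLOUT_FRAGMENT_LENGTH, TRAIN_BATCH_SIZE)
print({k: v.shape for k, v in train_batch.items()})
```

With these assumed sizes, four fragments of 50 timesteps each are joined into one batch of 200 rows per column.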

A typical sample batch looks something like the following when summarized. Since all values are kept in arrays, this allows for efficient encoding and transmission across the network:

{ 'action_logp': np.ndarray((200,), dtype=float32, min=-0.701, max=-0.685, mean=-0.694),
  'actions': np.ndarray((200,), dtype=int64, min=0.0, max=1.0, mean=0.495),
  'dones': np.ndarray((200,), dtype=bool, min=0.0, max=1.0, mean=0.055),
  'infos': np.ndarray((200,), dtype=object, head={}),
  'new_obs': np.ndarray((200, 4), dtype=float32, min=-2.46, max=2.259, mean=0.018),
  'obs': np.ndarray((200, 4), dtype=float32, min=-2.46, max=2.259, mean=0.016),
  'rewards': np.ndarray((200,), dtype=float32, min=1.0, max=1.0, mean=1.0),
  't': np.ndarray((200,), dtype=int64, min=0.0, max=34.0, mean=9.14)}
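A summary in this style can be produced from any dict of equal-length NumPy arrays. The sketch below is a hypothetical helper (summarize_column is not an RLlib function), run on synthetic data whose column names mirror the batch above. It only shows why the columnar layout is easy to inspect.

```python
import numpy as np


def summarize_column(arr):
    """Describe one column by shape, dtype, and basic statistics."""
    if arr.dtype == object:
        # Object columns such as 'infos' have no meaningful min/max/mean.
        return f"np.ndarray({arr.shape}, dtype=object)"
    values = arr.astype(np.float64)  # lets bool and int columns share one code path
    return (
        f"np.ndarray({arr.shape}, dtype={arr.dtype}, "
        f"min={values.min():.3f}, max={values.max():.3f}, mean={values.mean():.3f})"
    )


n = 200  # number of timesteps in the batch (assumed)
rng = np.random.default_rng(0)

# Synthetic data: every column has the same leading dimension n.
batch = {
    "obs": rng.standard_normal((n, 4)).astype(np.float32),
    "actions": rng.integers(0, 2, size=n),
    "rewards": np.ones(n, dtype=np.float32),
    "dones": rng.random(n) < 0.05,
    "infos": np.array([{} for _ in range(n)], dtype=object),
}

summary = {key: summarize_column(col) for key, col in batch.items()}
for key, text in summary.items():
    print(f"{key!r}: {text}")
```

Because every column shares the leading dimension of 200, row i across all columns describes a single timestep.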

MultiAgentBatch (ray.rllib.policy.sample_batch.MultiAgentBatch)

In multi-agent mode, several sample batches may be collected separately for each individual policy and are placed in a container object of type MultiAgentBatch: