ray.rllib.policy.Policy.learn_on_loaded_batch
- Policy.learn_on_loaded_batch(offset: int = 0, buffer_index: int = 0)
Runs a single step of SGD on data that has already been loaded into a buffer.
Runs an SGD step over a slice of the pre-loaded batch, offset by the offset argument (useful for performing n minibatch SGD updates repeatedly on the same, already pre-loaded data). Updates the model weights based on the averaged per-device gradients.
- Parameters:
offset – Offset into the preloaded data. Used for pre-loading a train-batch once to a device, then iterating over (subsampling through) this batch n times doing minibatch SGD.
buffer_index – The index of the buffer (a MultiGPUTowerStack) to take the already pre-loaded data from. The number of buffers on each device depends on the value of the num_multi_gpu_tower_stacks config key.
- Returns:
The outputs of extra_ops evaluated over the batch.
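The load-once, step-many pattern described above can be illustrated with a small, self-contained sketch. This is not RLlib code: ToyPolicy is a hypothetical stand-in written only for this example, its minibatch_size and lr arguments are assumptions, and its return value is a simplified placeholder rather than the real extra_ops outputs. It only shows how offset selects a slice of a batch that was loaded into a buffer once.

```python
class ToyPolicy:
    """Hypothetical stand-in for illustration only; not an RLlib class."""

    def __init__(self, minibatch_size, lr=0.05):
        self.minibatch_size = minibatch_size  # assumed fixed slice length
        self.lr = lr
        self.weight = 0.0   # single scalar model parameter
        self.buffers = {}   # buffer_index -> pre-loaded samples

    def load_batch_into_buffer(self, batch, buffer_index=0):
        # Store the whole train batch once; return the number of samples loaded.
        self.buffers[buffer_index] = list(batch)
        return len(self.buffers[buffer_index])

    def learn_on_loaded_batch(self, offset=0, buffer_index=0):
        # Take one minibatch slice from the already loaded data.
        data = self.buffers[buffer_index]
        minibatch = data[offset:offset + self.minibatch_size]
        # Gradient of 0.5 * (w * x - y)^2 w.r.t. w, averaged over the slice.
        grad = sum((self.weight * x - y) * x for x, y in minibatch) / len(minibatch)
        self.weight -= self.lr * grad
        return {"offset": offset, "num_samples": len(minibatch)}


# Toy regression data generated from y = 2 * x.
batch = [(x, 2.0 * x) for x in (1.0, 2.0, 3.0, 4.0)]

policy = ToyPolicy(minibatch_size=2)
num_loaded = policy.load_batch_into_buffer(batch, buffer_index=0)

visited_offsets = []
for _ in range(50):  # several passes over the same pre-loaded data
    for offset in range(0, num_loaded, policy.minibatch_size):
        out = policy.learn_on_loaded_batch(offset=offset, buffer_index=0)
        visited_offsets.append(out["offset"])
```

In this sketch the batch is copied into the buffer a single time, and every subsequent call only changes offset, which is the access pattern the offset parameter is described as supporting.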