ray.rllib.policy.policy.Policy.learn_on_loaded_batch

Policy.learn_on_loaded_batch(offset: int = 0, buffer_index: int = 0)

Runs a single step of SGD on data that has already been loaded into a buffer.

Runs an SGD step over a slice of the pre-loaded batch, starting at the position given by the offset argument. This is useful for performing n minibatch SGD updates on the same pre-loaded data without loading it again.

Updates the model weights based on the averaged per-device gradients.
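The offset-based iteration described above can be illustrated with a minimal, self-contained sketch. It does not use RLlib: ToyPolicy, its method signatures, the fixed minibatch_size, and the one-parameter linear model are all assumptions made for illustration, and only the load-once, step-many-times pattern is meant to mirror the description.

```python
import numpy as np


class ToyPolicy:
    """Hypothetical stand-in for illustration only, not the RLlib Policy class."""

    def __init__(self, minibatch_size, lr=0.5):
        self.w = 0.0  # single model weight, fit to y = w * x
        self.minibatch_size = minibatch_size
        self.lr = lr
        self.buffers = {}  # buffer_index -> (x, y) arrays

    def load_batch_into_buffer(self, x, y, buffer_index=0):
        # Store the full train batch once; return how many samples were loaded.
        self.buffers[buffer_index] = (
            np.asarray(x, dtype=float),
            np.asarray(y, dtype=float),
        )
        return len(self.buffers[buffer_index][0])

    def learn_on_loaded_batch(self, offset=0, buffer_index=0):
        # Take one minibatch-sized slice starting at `offset` and do one SGD step.
        x, y = self.buffers[buffer_index]
        xs = x[offset:offset + self.minibatch_size]
        ys = y[offset:offset + self.minibatch_size]
        grad = np.mean(2.0 * (self.w * xs - ys) * xs)  # d/dw of mean squared error
        self.w -= self.lr * grad
        return {"loss": float(np.mean((self.w * xs - ys) ** 2))}


rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=64)
y = 3.0 * x  # the weight to recover is 3.0

policy = ToyPolicy(minibatch_size=16)
num_loaded = policy.load_batch_into_buffer(x, y)  # load once

for _ in range(20):  # repeated passes over the same loaded data
    for offset in range(0, num_loaded, policy.minibatch_size):
        stats = policy.learn_on_loaded_batch(offset=offset, buffer_index=0)
```

In this sketch the data is copied into the buffer a single time, and each call only moves the slice window, which is the saving the offset argument is meant to enable.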

Parameters:

offset – Offset into the pre-loaded data. Used when a train batch is loaded onto a device once and then iterated over (subsampled) n times with minibatch SGD.
buffer_index – The index of the buffer (a MultiGPUTowerStack) to take the already pre-loaded data from. The number of buffers on each device depends on the value of the num_multi_gpu_tower_stacks config key.

Returns:

The outputs of extra_ops evaluated over the batch.