ray.rllib.policy.eager_tf_policy_v2.EagerTFPolicyV2.learn_on_loaded_batch

EagerTFPolicyV2.learn_on_loaded_batch(offset: int = 0, buffer_index: int = 0)

Runs a single step of SGD on data already loaded into a buffer.

Runs an SGD step over a slice of the pre-loaded batch, starting at the position given by the offset argument. This is useful for performing n minibatch SGD updates on the same pre-loaded data.

Updates the model weights based on the averaged per-device gradients.
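As a rough illustration of the update described above, the following sketch averages gradients computed on separate devices and applies one plain gradient-descent step. It is not RLlib code: `average_gradients`, `sgd_step`, the sample values, and the learning rate are all hypothetical, and the real policy delegates the update to its configured TensorFlow optimizer.

```python
from typing import List


def average_gradients(per_device_grads: List[List[float]]) -> List[float]:
    """Element-wise mean of the gradient vectors, one vector per device."""
    num_devices = len(per_device_grads)
    return [sum(g) / num_devices for g in zip(*per_device_grads)]


def sgd_step(weights: List[float], grads: List[float], lr: float) -> List[float]:
    """One plain gradient-descent update (assumed rule, for illustration only)."""
    return [w - lr * g for w, g in zip(weights, grads)]


# Two hypothetical devices, each holding gradients for the same two weights.
grads_device_0 = [1.0, -2.0]
grads_device_1 = [3.0, 0.0]

avg = average_gradients([grads_device_0, grads_device_1])
new_weights = sgd_step([0.5, 0.5], avg, lr=0.5)
```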

Parameters:
  • offset – Offset into the preloaded data. This supports loading a train batch onto a device once and then iterating over it (subsampling) n times with minibatch SGD.

  • buffer_index – The index of the buffer (a MultiGPUTowerStack) to take the already pre-loaded data from. The number of buffers on each device depends on the value of the num_multi_gpu_tower_stacks config key.

Returns:

The outputs of extra_ops evaluated over the batch.
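
The load-once, step-many-times pattern that the offset parameter supports can be sketched as follows. This is a minimal sketch under stated assumptions, not the loop RLlib itself runs: `StubPolicy` is a hypothetical stand-in so the example works without Ray installed, `minibatch_sgd` is an illustrative helper, and `load_batch_into_buffer` is assumed to mirror the method of the same name on RLlib's `Policy`, returning the number of samples loaded.

```python
from typing import Any, List


class StubPolicy:
    """Hypothetical stand-in so the sketch runs without Ray installed.

    It mimics only the two calls used below; a real EagerTFPolicyV2 would
    run an actual SGD step in learn_on_loaded_batch.
    """

    def __init__(self) -> None:
        self.offsets_seen: List[int] = []
        self._num_loaded = 0

    def load_batch_into_buffer(self, batch: List[Any], buffer_index: int = 0) -> int:
        # Pretend to copy the batch to the device; report how many samples it holds.
        self._num_loaded = len(batch)
        return self._num_loaded

    def learn_on_loaded_batch(self, offset: int = 0, buffer_index: int = 0) -> dict:
        # Record the offset instead of training.
        self.offsets_seen.append(offset)
        return {"offset": offset}


def minibatch_sgd(policy, batch, minibatch_size: int, num_sgd_iter: int,
                  buffer_index: int = 0) -> List[dict]:
    """Load `batch` once, then step through it `num_sgd_iter` times."""
    num_loaded = policy.load_batch_into_buffer(batch, buffer_index=buffer_index)
    num_minibatches = max(1, num_loaded // minibatch_size)
    results = []
    for _ in range(num_sgd_iter):
        for i in range(num_minibatches):
            # Each call trains on the slice that starts at this offset.
            results.append(
                policy.learn_on_loaded_batch(
                    offset=i * minibatch_size, buffer_index=buffer_index
                )
            )
    return results


policy = StubPolicy()
results = minibatch_sgd(policy, batch=list(range(8)), minibatch_size=4, num_sgd_iter=2)
```

With a real policy, the list passed as `batch` would be a sample batch, and each returned entry would hold the outputs described under Returns rather than the recorded offset.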