ray.rllib.policy.eager_tf_policy_v2.EagerTFPolicyV2.learn_on_batch_from_replay_buffer
- EagerTFPolicyV2.learn_on_batch_from_replay_buffer(replay_actor: ActorHandle, policy_id: str) → Dict[str, numpy.array | jnp.ndarray | tf.Tensor | torch.Tensor]
Samples a batch from the given replay actor and performs an update with it.
- Parameters:
replay_actor – The replay buffer actor to sample from.
policy_id – The ID of this policy.
- Returns:
Dictionary of extra metadata from compute_gradients().
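The sample-then-update flow described above can be illustrated with a minimal sketch that does not depend on Ray. Everything in it is hypothetical and exists only for illustration: `LocalReplayBuffer` is an in-process stand-in for a replay actor, `SketchPolicy` is not an RLlib class, the `mean_reward` statistic stands in for real gradient metadata, and returning an empty dict when no batch is available is an assumption of this sketch rather than documented behavior.

```python
from typing import Dict, List, Optional


class LocalReplayBuffer:
    """Hypothetical in-process stand-in for a replay buffer actor."""

    def __init__(self) -> None:
        self._batches: Dict[str, List[dict]] = {}

    def add(self, policy_id: str, batch: dict) -> None:
        # Store batches per policy ID.
        self._batches.setdefault(policy_id, []).append(batch)

    def replay(self, policy_id: str) -> Optional[dict]:
        # Return the most recent batch for this policy, or None if empty.
        stored = self._batches.get(policy_id)
        return stored[-1] if stored else None


class SketchPolicy:
    """Hypothetical policy showing only the control flow, not real learning."""

    def learn_on_batch(self, batch: dict) -> Dict[str, float]:
        # Placeholder "update": report a statistic instead of applying gradients.
        rewards = batch["rewards"]
        return {"mean_reward": sum(rewards) / len(rewards)}

    def learn_on_batch_from_replay_buffer(
        self, replay_actor: LocalReplayBuffer, policy_id: str
    ) -> Dict[str, float]:
        # Step 1: sample a batch for this policy from the buffer.
        batch = replay_actor.replay(policy_id)
        # Assumption of this sketch: nothing to learn from yields an empty dict.
        if batch is None:
            return {}
        # Step 2: perform the update and return its metadata.
        return self.learn_on_batch(batch)


buffer = LocalReplayBuffer()
policy = SketchPolicy()

# No data stored yet, so no update is performed.
empty_result = policy.learn_on_batch_from_replay_buffer(buffer, "default_policy")

buffer.add("default_policy", {"rewards": [1.0, 0.0, 1.0, 2.0]})
result = policy.learn_on_batch_from_replay_buffer(buffer, "default_policy")
print(empty_result, result)
```

With a real replay actor, the sampling step would be a remote call on the `ActorHandle` rather than a local method call, and `policy_id` selects which policy's data is sampled.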