ray.rllib.policy.policy.Policy.learn_on_batch_from_replay_buffer

Policy.learn_on_batch_from_replay_buffer(replay_actor: ActorHandle, policy_id: str) → Dict[str, numpy.array | jnp.ndarray | tf.Tensor | torch.Tensor]

Samples a batch from the given replay actor and performs an update on it.

Parameters:
  • replay_actor – The replay buffer actor to sample from.

  • policy_id – The ID of this policy.

Returns:

Dictionary of extra metadata from compute_gradients().
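
The sample-then-update flow described above can be illustrated with a minimal, Ray-free sketch. Everything in it is a hypothetical stand-in rather than RLlib code: `ToyReplayBuffer` plays the role of the replay actor (in process, with no remote calls), `ToyPolicy` plays the role of the policy, and returning an empty dictionary when nothing can be sampled is an assumption of this sketch, not documented behavior.

```python
import random
from typing import Dict, List, Optional


class ToyReplayBuffer:
    """Hypothetical in-process stand-in for a replay buffer actor."""

    def __init__(self, seed: int = 0) -> None:
        self._storage: Dict[str, List[float]] = {}
        self._rng = random.Random(seed)

    def add(self, policy_id: str, reward: float) -> None:
        # Store one sample under the ID of the policy it belongs to.
        self._storage.setdefault(policy_id, []).append(reward)

    def replay(self, policy_id: str, batch_size: int = 4) -> Optional[List[float]]:
        # Return a random batch for this policy, or None if too little data.
        items = self._storage.get(policy_id, [])
        if len(items) < batch_size:
            return None
        return self._rng.sample(items, batch_size)


class ToyPolicy:
    """Hypothetical policy that tracks a running estimate of the mean reward."""

    def __init__(self, lr: float = 0.5) -> None:
        self.estimate = 0.0
        self.lr = lr

    def learn_on_batch(self, batch: List[float]) -> Dict[str, float]:
        # One update step: move the estimate toward the batch mean.
        batch_mean = sum(batch) / len(batch)
        error = batch_mean - self.estimate
        self.estimate += self.lr * error
        return {"batch_size": float(len(batch)), "error": error}

    def learn_on_batch_from_replay_buffer(
        self, replay_buffer: ToyReplayBuffer, policy_id: str
    ) -> Dict[str, float]:
        # Step 1: sample a batch for this policy from the buffer.
        batch = replay_buffer.replay(policy_id)
        if batch is None:
            # Assumption of this sketch: nothing to learn from yet.
            return {}
        # Step 2: perform the update and return its metadata.
        return self.learn_on_batch(batch)


buffer = ToyReplayBuffer()
policy = ToyPolicy()

# No data yet, so no update happens.
empty_result = policy.learn_on_batch_from_replay_buffer(buffer, "default_policy")

# Add enough samples for one batch, then update.
for _ in range(4):
    buffer.add("default_policy", 2.0)
result = policy.learn_on_batch_from_replay_buffer(buffer, "default_policy")
```

In the real method the buffer is a Ray actor, so sampling is a remote call and the returned dictionary holds the metadata produced by the policy's own update. The sketch only mirrors the control flow: sample by `policy_id`, then update.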