ray.rllib.policy.eager_tf_policy_v2.EagerTFPolicyV2.compute_gradients_fn

EagerTFPolicyV2.compute_gradients_fn(policy: Policy, optimizer: LocalOptimizer, loss: TensorType) -> ModelGradients

where LocalOptimizer = torch.optim.Optimizer | tf.keras.optimizers.Optimizer, TensorType = numpy.array | jnp.ndarray | tf.Tensor | torch.Tensor, and ModelGradients = List[Tuple[TensorType, TensorType]] | List[TensorType].

Computes gradients from the given loss tensor, using the Policy's local optimizer.

Parameters:
  • policy – The Policy object that generated the loss tensor and that holds the given local optimizer.

  • optimizer – The tf (local) optimizer object to calculate the gradients with.

  • loss – The loss tensor for which gradients should be calculated.

Returns:

A list of the (possibly clipped) gradient-and-variable tuples.

Return type:

ModelGradients
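
A minimal sketch of overriding this hook in a subclass to apply custom gradient clipping. The class name ClippedGradientsPolicy and the clip value of 40.0 are illustrative assumptions, and the sketch further assumes that the optimizer handed in by RLlib's eager execution path supports a tf.compat.v1-style compute_gradients(loss, var_list) call; verify the exact behavior against your RLlib version.

import tensorflow as tf

from ray.rllib.policy.eager_tf_policy_v2 import EagerTFPolicyV2


class ClippedGradientsPolicy(EagerTFPolicyV2):
    """Illustrative subclass that clips all gradients by a global norm."""

    def compute_gradients_fn(self, policy, optimizer, loss):
        # Assumption: the (wrapped) optimizer passed in by RLlib's eager path
        # exposes a tf.compat.v1-style compute_gradients(loss, var_list) method.
        variables = self.model.trainable_variables()
        grads_and_vars = optimizer.compute_gradients(loss, variables)

        # Clip by global norm before the gradients are applied; 40.0 is an
        # arbitrary example value (a real policy would read config["grad_clip"]).
        grads = [g for g, _ in grads_and_vars]
        clipped_grads, _ = tf.clip_by_global_norm(grads, 40.0)

        # Return gradient-and-variable tuples (ModelGradients).
        return list(zip(clipped_grads, variables))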