ray.rllib.evaluation.rollout_worker.RolloutWorker.apply_gradients

RolloutWorker.apply_gradients(grads: List[Tuple[numpy.array | jnp.ndarray | tf.Tensor | torch.Tensor, numpy.array | jnp.ndarray | tf.Tensor | torch.Tensor]] | List[numpy.array | jnp.ndarray | tf.Tensor | torch.Tensor] | Dict[str, List[Tuple[numpy.array | jnp.ndarray | tf.Tensor | torch.Tensor, numpy.array | jnp.ndarray | tf.Tensor | torch.Tensor]] | List[numpy.array | jnp.ndarray | tf.Tensor | torch.Tensor]]) → None

Applies the given gradients to this worker’s models.

Delegates to the apply_gradients method of each affected Policy to perform the update.

Parameters:

grads – A single ModelGradients struct (single-agent case) or a dict mapping PolicyIDs to their respective ModelGradients structs.

Example:

import gymnasium as gym
from ray.rllib.algorithms.ppo.ppo_tf_policy import PPOTF1Policy
from ray.rllib.evaluation.rollout_worker import RolloutWorker

# Build a worker that samples from CartPole-v1 with a PPO (TF1) policy.
worker = RolloutWorker(
    env_creator=lambda _: gym.make("CartPole-v1"),
    default_policy_class=PPOTF1Policy,
)
# Collect a batch, compute gradients on it, then apply them to the worker's policy.
samples = worker.sample()
grads, info = worker.compute_gradients(samples)
worker.apply_gradients(grads)
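
The two accepted shapes of grads (a single gradients struct, or a dict keyed by PolicyID) can be illustrated with a small, dependency-free sketch. It is not RLlib's implementation: ToyPolicy, ToyWorker, and the plain-SGD update are hypothetical stand-ins, and gradients are modeled as flat lists of floats rather than tensors. The only assumption borrowed from RLlib is that the single-agent policy is stored under the ID "default_policy".

```python
from typing import Dict, List, Union

# Stand-in for ModelGradients: a flat list of floats (assumption for this sketch).
Gradients = List[float]


class ToyPolicy:
    """Hypothetical stand-in for a Policy: one weight vector, plain SGD."""

    def __init__(self, weights: List[float], lr: float = 0.1) -> None:
        self.weights = list(weights)
        self.lr = lr

    def apply_gradients(self, grads: Gradients) -> None:
        # One SGD step per weight: w <- w - lr * g
        self.weights = [w - self.lr * g for w, g in zip(self.weights, grads)]


class ToyWorker:
    """Hypothetical stand-in for a worker that holds a map of policies."""

    DEFAULT_POLICY_ID = "default_policy"

    def __init__(self, policy_map: Dict[str, ToyPolicy]) -> None:
        self.policy_map = policy_map

    def apply_gradients(self, grads: Union[Gradients, Dict[str, Gradients]]) -> None:
        if isinstance(grads, dict):
            # Multi-agent case: route each entry to the policy with that ID.
            for policy_id, policy_grads in grads.items():
                self.policy_map[policy_id].apply_gradients(policy_grads)
        else:
            # Single-agent case: the gradients belong to the default policy.
            self.policy_map[self.DEFAULT_POLICY_ID].apply_gradients(grads)
```

In this sketch, passing a bare list updates only the default policy, while passing a dict updates exactly the policies whose IDs appear as keys and leaves the others untouched.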