ray.rllib.policy.Policy.learn_on_batch
- Policy.learn_on_batch(samples: SampleBatch) → Dict[str, numpy.array | jnp.ndarray | tf.Tensor | torch.Tensor]
Perform one learning update, given samples.
Either this method or the combination of compute_gradients and apply_gradients must be implemented by subclasses.
- Parameters:
samples – The SampleBatch object to learn from.
- Returns:
Dictionary of extra metadata from compute_gradients().
policy, sample_batch = ...
policy.learn_on_batch(sample_batch)
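The relationship between learn_on_batch and the compute_gradients / apply_gradients pair can be illustrated with a minimal, library-free sketch. Everything here is an assumption made for illustration: ToyPolicy is a hypothetical class (not an RLlib Policy subclass), the batch is a plain dict standing in for a SampleBatch, and the keys "obs" and "targets" and the squared-error loss are invented. The sketch assumes that compute_gradients returns a (gradients, info) pair and that one learning update means computing gradients, applying them, and returning the info dictionary.

```python
from typing import Dict, List, Tuple


class ToyPolicy:
    """Hypothetical stand-in for a Policy subclass (not an RLlib class)."""

    def __init__(self, lr: float = 0.1) -> None:
        self.weight = 0.0  # single trainable parameter
        self.lr = lr       # learning rate (illustrative value)

    def compute_gradients(
        self, samples: Dict[str, List[float]]
    ) -> Tuple[float, Dict[str, float]]:
        # Mean squared error of the prediction weight * obs against targets.
        obs, targets = samples["obs"], samples["targets"]
        n = len(obs)
        errors = [self.weight * x - y for x, y in zip(obs, targets)]
        loss = sum(e * e for e in errors) / n
        grad = sum(2.0 * e * x for e, x in zip(errors, obs)) / n
        # Return the gradient plus a dictionary of extra metadata.
        return grad, {"loss": loss}

    def apply_gradients(self, gradients: float) -> None:
        # Plain gradient-descent step on the single parameter.
        self.weight -= self.lr * gradients

    def learn_on_batch(
        self, samples: Dict[str, List[float]]
    ) -> Dict[str, float]:
        # One learning update: compute gradients, apply them,
        # and hand back the metadata from compute_gradients.
        grads, info = self.compute_gradients(samples)
        self.apply_gradients(grads)
        return info


policy = ToyPolicy(lr=0.1)
batch = {"obs": [1.0, 2.0, 3.0], "targets": [2.0, 4.0, 6.0]}
first = policy.learn_on_batch(batch)
second = policy.learn_on_batch(batch)
```

In this sketch, a subclass that defines the two gradient methods gets a working learn_on_batch by composition; a subclass could instead override learn_on_batch directly when the update does not split cleanly into separate gradient and apply steps.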