ray.rllib.policy.policy.Policy.learn_on_batch#

Policy.learn_on_batch(samples: SampleBatch) Dict[str, numpy.array | jnp.ndarray | tf.Tensor | torch.Tensor][source]#

Perform one learning update, given samples.

Either this method or the combination of compute_gradients and apply_gradients must be implemented by subclasses.

Parameters:

samples – The SampleBatch object to learn from.

Returns:

Dictionary of extra metadata from compute_gradients().

policy, sample_batch = ...
policy.learn_on_batch(sample_batch)