ray.rllib.callbacks.callbacks.RLlibCallback.on_learn_on_batch#
- RLlibCallback.on_learn_on_batch(*, policy: Policy, train_batch: SampleBatch, result: dict, **kwargs) -> None [source]#
Called at the beginning of Policy.learn_on_batch().
Note: This is called before 0-padding via pad_batch_to_sequences_of_same_size.
Also note that the SampleBatch.INFOS column will not be available on train_batch within this callback if the framework is tf1, because the tf1 static graph would mistake it for part of the input dict if present. It is available for the tf2 and torch frameworks.
- Parameters:
policy – Reference to the current Policy object.
train_batch – SampleBatch to be trained on. You can mutate this object to modify the samples generated.
result – A results dict to add custom metrics to.
kwargs – Forward compatibility placeholder.
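
A minimal sketch of overriding this hook. The `MyCallback` class name, the reward-clipping logic, and the plain-dict stand-ins for Policy and SampleBatch are illustrative assumptions, not RLlib code; in practice you would subclass `ray.rllib.callbacks.callbacks.RLlibCallback` and pass the class to your algorithm's config:

```python
class MyCallback:  # stand-in; real code would subclass RLlibCallback
    def on_learn_on_batch(self, *, policy, train_batch, result, **kwargs):
        # Mutate the batch in place before training, e.g. clip rewards.
        train_batch["rewards"] = [
            max(-1.0, min(1.0, r)) for r in train_batch["rewards"]
        ]
        # Add a custom metric to the results dict.
        rewards = train_batch["rewards"]
        result["mean_clipped_reward"] = sum(rewards) / len(rewards)

# Usage with dict stand-ins for Policy / SampleBatch:
cb = MyCallback()
batch = {"rewards": [0.5, 2.0, -3.0]}
res = {}
cb.on_learn_on_batch(policy=None, train_batch=batch, result=res)
# batch["rewards"] is now [0.5, 1.0, -1.0]
```

Both mutations are visible to the caller: the clipped rewards are what Policy.learn_on_batch() would then train on, and the custom metric is merged into the learner results.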