Policy.postprocess_trajectory(sample_batch: SampleBatch, other_agent_batches: Dict[Any, Tuple[Policy, SampleBatch]] | None = None, episode: Episode | None = None) → SampleBatch

Implements algorithm-specific trajectory postprocessing.

This method is called on each trajectory fragment computed during policy evaluation. Each fragment is guaranteed to come from a single episode. The fragment may or may not contain the end of that episode, depending on batch_mode (truncate_episodes or complete_episodes), rollout_fragment_length, and other settings.

Parameters:

  • sample_batch – Batch of experiences for the policy, which contains at most one episode trajectory.

  • other_agent_batches – In a multi-agent env, this contains a mapping of agent ids to (policy, agent_batch) tuples containing the policy and experiences of the other agents.

  • episode – An optional multi-agent episode object to provide access to all of the internal episode state, which may be useful for model-based or multi-agent algorithms.


Returns: The postprocessed sample batch.
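
As a rough illustration of what an override might do, the sketch below adds discounted returns to a trajectory fragment. It is not RLlib's implementation: a plain dict stands in for SampleBatch so the snippet runs without Ray, and the key names ("rewards", "terminateds", "last_value_estimate"), the GAMMA constant, and the discounted_returns helper are assumptions made for this example. Because a fragment may stop before the episode ends, the sketch bootstraps from a value estimate when the last step is not terminal.

```python
from typing import Any, Dict, List, Optional

GAMMA = 0.5  # hypothetical discount factor, chosen only for this sketch


def discounted_returns(rewards: List[float], gamma: float,
                       bootstrap: float = 0.0) -> List[float]:
    """Compute discounted returns backwards, starting from `bootstrap`."""
    returns = [0.0] * len(rewards)
    running = bootstrap
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def postprocess_trajectory(
    sample_batch: Dict[str, Any],  # stand-in for SampleBatch (assumption)
    other_agent_batches: Optional[Dict[Any, Any]] = None,
    episode: Optional[Any] = None,
) -> Dict[str, Any]:
    """Return a copy of the fragment with a "returns" column added."""
    rewards = sample_batch["rewards"]
    # The fragment holds steps from one episode but may be cut off early.
    ended = bool(sample_batch["terminateds"][-1]) if rewards else True
    # "last_value_estimate" is a hypothetical key for the bootstrap value.
    bootstrap = 0.0 if ended else sample_batch.get("last_value_estimate", 0.0)
    out = dict(sample_batch)
    out["returns"] = discounted_returns(rewards, GAMMA, bootstrap)
    return out


# A fragment that contains the end of its episode.
complete = {"rewards": [1.0, 1.0, 1.0], "terminateds": [False, False, True]}
# A fragment truncated mid-episode, with a bootstrap value supplied.
truncated = {"rewards": [1.0, 1.0, 1.0],
             "terminateds": [False, False, False],
             "last_value_estimate": 2.0}

complete_out = postprocess_trajectory(complete)
truncated_out = postprocess_trajectory(truncated)
```

In a real Policy subclass the same logic would live in the overridden method and operate on the SampleBatch passed in; other_agent_batches and episode are accepted here only to mirror the documented signature.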