ray.rllib.policy.torch_policy_v2.TorchPolicyV2.postprocess_trajectory

TorchPolicyV2.postprocess_trajectory(sample_batch: SampleBatch, other_agent_batches: Dict[Any, SampleBatch] | None = None, episode=None) → SampleBatch

Postprocesses a trajectory and returns the processed trajectory.

The trajectory contains only data from one episode and from one agent.

  • If config.batch_mode=truncate_episodes (default), sample_batch may contain an episode that is truncated at its end because the sampler reached config.rollout_fragment_length.

  • If config.batch_mode=complete_episodes, sample_batch will contain exactly one episode (no matter how long).

New columns can be added to sample_batch and existing ones may be altered.

Parameters:
  • sample_batch – The SampleBatch to postprocess.

  • other_agent_batches (Optional[Dict[PolicyID, SampleBatch]]) – Optional dict mapping AgentIDs to other agents’ trajectory data (from the same episode). NOTE: The other agents use the same policy.

  • episode (Optional[Episode]) – Optional multi-agent episode object in which the agents operated.

Returns:

The postprocessed, modified SampleBatch (or a new one).

Return type:

SampleBatch
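
Example:

The following is a minimal sketch of the kind of logic an override of postprocess_trajectory might contain: it adds a new column to a single-agent, single-episode batch and returns the modified batch. To stay runnable without Ray installed, it uses a plain dict of lists as a stand-in for SampleBatch. The function name add_discounted_returns, the "returns" column, and the gamma parameter are hypothetical and chosen only for illustration; they are not part of the RLlib API.

```python
from typing import Dict, List


def add_discounted_returns(
    batch: Dict[str, List[float]], gamma: float = 0.99
) -> Dict[str, List[float]]:
    """Add a hypothetical "returns" column derived from the "rewards" column.

    `batch` stands in for a SampleBatch holding one agent's data from one
    episode (or one truncated fragment of it).
    """
    rewards = batch["rewards"]
    returns = [0.0] * len(rewards)
    running = 0.0  # Assumption: no reward is collected after the last step.
    # Walk the trajectory backwards, accumulating the discounted sum.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    batch["returns"] = returns  # New column added to the existing batch.
    return batch


example = add_discounted_returns({"rewards": [1.0, 1.0, 1.0]}, gamma=0.5)
print(example["returns"])  # [1.75, 1.5, 1.0]
```

Note that this sketch treats the return beyond the last step as zero. That matches a batch that ends with the episode, but with batch_mode=truncate_episodes the batch may end mid-episode, in which case an override would typically need some estimate of the remaining return instead.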