ray.rllib.policy.torch_policy_v2.TorchPolicyV2.postprocess_trajectory
- TorchPolicyV2.postprocess_trajectory(sample_batch: SampleBatch, other_agent_batches: Dict[Any, SampleBatch] | None = None, episode=None) → SampleBatch
Postprocesses a trajectory and returns the processed trajectory.
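As an illustration of the kind of work a postprocessing step does, here is a minimal, hypothetical sketch in plain Python (no Ray import). It assumes a dict of equal-length lists standing in for SampleBatch, with column names "rewards" and "terminateds" assumed to mirror SampleBatch keys. The names postprocess_trajectory_sketch, discounted_returns, gamma and last_value are illustrative only and are not RLlib API. The sketch adds a new "returns" column of discounted returns and bootstraps from last_value when the trajectory ends before its episode terminated.

```python
from typing import Any, Dict, List, Optional

# Hypothetical stand-in for RLlib's SampleBatch: a dict of equal-length columns.
# The column names "rewards" and "terminateds" are assumed to mirror SampleBatch keys.
Batch = Dict[str, List[float]]


def discounted_returns(rewards: List[float], gamma: float, bootstrap: float) -> List[float]:
    """Compute G_t = r_t + gamma * G_{t+1}, seeding the tail with `bootstrap`."""
    returns = [0.0] * len(rewards)
    running = bootstrap
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def postprocess_trajectory_sketch(
    sample_batch: Batch,
    other_agent_batches: Optional[Dict[Any, Batch]] = None,  # accepted, unused here
    episode: Any = None,  # accepted, unused here
    gamma: float = 0.99,
    last_value: float = 0.0,
) -> Batch:
    """Hypothetical postprocessing step that adds a "returns" column.

    The batch holds one agent's data from one episode. If the last step is
    not terminated (e.g. the trajectory was truncated), the tail is
    bootstrapped with `last_value`, such as a value-function estimate.
    """
    terminated = bool(sample_batch["terminateds"][-1])
    bootstrap = 0.0 if terminated else last_value
    # Copy the columns so the caller's batch is left unchanged; the contract
    # allows returning either a modified batch or a new one.
    new_batch = {key: list(col) for key, col in sample_batch.items()}
    new_batch["returns"] = discounted_returns(sample_batch["rewards"], gamma, bootstrap)
    return new_batch


batch = {"rewards": [1.0, 1.0, 1.0], "terminateds": [False, False, True]}
out = postprocess_trajectory_sketch(batch, gamma=0.5)
print(out["returns"])  # [1.75, 1.5, 1.0]
```

In an actual TorchPolicyV2 subclass, comparable logic would live in an override of postprocess_trajectory and operate on the real SampleBatch, whose column-name constants may differ across Ray versions.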
The trajectory contains data from only one episode and one agent.
- If config.batch_mode=truncate_episodes (the default), sample_batch may contain an episode truncated at its end, if the sampler reached config.rollout_fragment_length.
- If config.batch_mode=complete_episodes, sample_batch contains exactly one complete episode, regardless of its length.
New columns can be added to sample_batch, and existing ones may be altered.
- Parameters:
sample_batch – The SampleBatch to postprocess.
other_agent_batches (Optional[Dict[PolicyID, SampleBatch]]) – Optional dict mapping AgentIDs to the other agents’ trajectory data (from the same episode). NOTE: The other agents use the same policy.
episode (Optional[Episode]) – Optional multi-agent episode object in which the agents operated.
- Returns:
The postprocessed, modified SampleBatch (or a new one).
- Return type: