ray.rllib.policy.torch_policy_v2.TorchPolicyV2.extra_action_out#
- TorchPolicyV2.extra_action_out(input_dict: Dict[str, numpy.ndarray | jnp.ndarray | tf.Tensor | torch.Tensor], state_batches: List[numpy.ndarray | jnp.ndarray | tf.Tensor | torch.Tensor], model: TorchModelV2, action_dist: TorchDistributionWrapper) → Dict[str, numpy.ndarray | jnp.ndarray | tf.Tensor | torch.Tensor] [source]#
Returns a dict of extra info to include in the experience batch.
- Parameters:
input_dict – Dict of model input tensors.
state_batches – List of state tensors.
model – Reference to the model object.
action_dist – Torch action distribution object used to compute log-probs (e.g. for already sampled actions).
- Returns:
Extra outputs to return in a compute_actions_from_input_dict() call (3rd return value).
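A typical use of this hook is to override it in a TorchPolicyV2 subclass and append custom per-timestep outputs (e.g. value-function estimates) to the experience batch. The sketch below illustrates the override pattern with plain-Python stand-ins so it is self-contained; `TorchPolicyV2Stub` is a hypothetical substitute for the real base class, and the assumption that the default implementation returns an empty dict and that the model exposes `value_function()` mirrors RLlib's TorchPolicyV2 and TorchModelV2, but in real code the batch values would be torch.Tensor objects.

```python
class TorchPolicyV2Stub:
    """Hypothetical stand-in for ray.rllib.policy.torch_policy_v2.TorchPolicyV2."""

    def extra_action_out(self, input_dict, state_batches, model, action_dist):
        # Default behavior assumed here: no extra outputs.
        return {}


class MyPolicy(TorchPolicyV2Stub):
    def extra_action_out(self, input_dict, state_batches, model, action_dist):
        # Keep whatever the base class emits, then append custom entries.
        extra = super().extra_action_out(
            input_dict, state_batches, model, action_dist
        )
        # Record the model's value estimates alongside the sampled actions
        # (assumes the model exposes a value_function() method, as RLlib's
        # TorchModelV2 does).
        extra["vf_preds"] = model.value_function()
        return extra
```

Each key returned here becomes an extra column in the collected batch, so downstream loss functions can read it back alongside the observations and actions.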