ray.rllib.policy.torch_policy_v2.TorchPolicyV2.extra_action_out

TorchPolicyV2.extra_action_out(input_dict: Dict[str, numpy.array | jnp.ndarray | tf.Tensor | torch.Tensor], state_batches: List[numpy.array | jnp.ndarray | tf.Tensor | torch.Tensor], model: TorchModelV2, action_dist: TorchDistributionWrapper) -> Dict[str, numpy.array | jnp.ndarray | tf.Tensor | torch.Tensor]

Returns a dict of extra info to include in the experience batch.

Parameters:
  • input_dict – Dict of model input tensors.

  • state_batches – List of state tensors.

  • model – Reference to the model object.

  • action_dist – Torch action distribution object, e.g. for computing log-probs of already sampled actions.

Returns:

The extra outputs to return as the third return value of a compute_actions_from_input_dict() call.
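A typical use is overriding this method in a TorchPolicyV2 subclass to record additional per-timestep tensors, such as value-function estimates, into the sample batch. The sketch below illustrates the call pattern only: StubModel is a hypothetical stand-in for a real TorchModelV2, and the "vf_preds" key is one conventional choice, not a required name.

```python
class StubModel:
    """Hypothetical stand-in for a TorchModelV2 with a value head."""

    def value_function(self):
        # Per-batch-item value estimates (real code would return a
        # torch.Tensor produced by the model's value branch).
        return [0.5, -0.2]


def extra_action_out(input_dict, state_batches, model, action_dist):
    """Sketch of an extra_action_out override.

    Returns extra per-timestep outputs; RLlib merges this dict into
    the experience batch (the 3rd return value of
    compute_actions_from_input_dict()).
    """
    return {"vf_preds": model.value_function()}


extras = extra_action_out({}, [], StubModel(), None)
```

Each value in the returned dict should be batched along the same axis as the actions so it can be zipped with them in the sample batch.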