ray.rllib.policy.torch_policy_v2.TorchPolicyV2.extra_action_out

TorchPolicyV2.extra_action_out(input_dict: Dict[str, Union[numpy.array, jnp.ndarray, tf.Tensor, torch.Tensor]], state_batches: List[Union[numpy.array, jnp.ndarray, tf.Tensor, torch.Tensor]], model: ray.rllib.models.torch.torch_modelv2.TorchModelV2, action_dist: ray.rllib.models.torch.torch_action_dist.TorchDistributionWrapper) → Dict[str, Union[numpy.array, jnp.ndarray, tf.Tensor, torch.Tensor]]

Returns dict of extra info to include in experience batch.

Parameters
  • input_dict – Dict of model input tensors.

  • state_batches – List of state tensors.

  • model – Reference to the model object.

  • action_dist – Torch action distribution object, e.g. for computing the log-probs of already sampled actions.

Returns

Extra outputs to return as the third return value of a compute_actions_from_input_dict() call.
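
A minimal sketch of a typical override, assuming a custom policy subclass (MyTorchPolicy is hypothetical): it adds the model's value-function estimate to the experience batch under the standard SampleBatch.VF_PREDS key. Note that TorchModelV2.value_function() is only meaningful after a forward pass, which has already happened by the time this hook is called.

    from ray.rllib.policy.sample_batch import SampleBatch
    from ray.rllib.policy.torch_policy_v2 import TorchPolicyV2


    class MyTorchPolicy(TorchPolicyV2):
        def extra_action_out(self, input_dict, state_batches, model, action_dist):
            # Keep whatever extras the parent implementation provides.
            extra = super().extra_action_out(
                input_dict, state_batches, model, action_dist
            )
            # Store the model's value-function estimate so it lands in the
            # sample batch next to the computed actions; value_function() is
            # valid here because a forward pass has already been run.
            extra[SampleBatch.VF_PREDS] = model.value_function()
            return extra

Each key in the returned dict becomes a column in the collected SampleBatch, so the extra values stay aligned with the actions, observations, and rewards of the same timesteps.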