ray.rllib.policy.Policy.compute_log_likelihoods#
- Policy.compute_log_likelihoods(actions: Union[List[Union[numpy.array, tensorflow.python.framework.ops.Tensor, torch.Tensor]], numpy.array, tensorflow.python.framework.ops.Tensor, torch.Tensor], obs_batch: Union[List[Union[numpy.array, tensorflow.python.framework.ops.Tensor, torch.Tensor]], numpy.array, tensorflow.python.framework.ops.Tensor, torch.Tensor], state_batches: Optional[List[Union[numpy.array, tensorflow.python.framework.ops.Tensor, torch.Tensor]]] = None, prev_action_batch: Optional[Union[List[Union[numpy.array, tensorflow.python.framework.ops.Tensor, torch.Tensor]], numpy.array, tensorflow.python.framework.ops.Tensor, torch.Tensor]] = None, prev_reward_batch: Optional[Union[List[Union[numpy.array, tensorflow.python.framework.ops.Tensor, torch.Tensor]], numpy.array, tensorflow.python.framework.ops.Tensor, torch.Tensor]] = None, actions_normalized: bool = True, in_training: bool = True) Union[numpy.array, tensorflow.python.framework.ops.Tensor, torch.Tensor] [source]#
Computes the log-prob/likelihood for a given action and observation.
The log-likelihood is calculated using this Policy's action distribution class (self.dist_class).
- Parameters
actions – Batch of actions for which to retrieve the log-probs/likelihoods (given all other inputs: obs, states, etc.).
obs_batch – Batch of observations.
state_batches – List of RNN state input batches, if any.
prev_action_batch – Batch of previous action values.
prev_reward_batch – Batch of previous rewards.
actions_normalized – Whether the given actions are already normalized (between -1.0 and 1.0). If not and normalize_actions=True, the given actions are normalized first, before calculating log likelihoods.
in_training – Whether to use the forward_train() or forward_exploration() of the underlying RLModule.
- Returns
Batch of log probs/likelihoods, with shape [BATCH_SIZE].
- Return type
Union[numpy.array, tensorflow.python.framework.ops.Tensor, torch.Tensor]
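As an illustration of what this method computes, here is a minimal pure-Python sketch (not RLlib code) of the log-likelihood of a batch of discrete actions under a categorical action distribution parameterized by per-row logits. In RLlib the distribution is built via self.dist_class from the model's output; the helper name below is hypothetical.

```python
import math

def categorical_log_likelihoods(actions, action_logits):
    """Sketch: per-sample log-likelihood of discrete actions under a
    categorical distribution given a batch of logits.
    (Policy.compute_log_likelihoods delegates this to self.dist_class.)"""
    log_probs = []
    for a, logits in zip(actions, action_logits):
        # log-softmax: logits[a] - log(sum(exp(logits))),
        # shifted by the max logit for numerical stability.
        m = max(logits)
        log_sum_exp = m + math.log(sum(math.exp(l - m) for l in logits))
        log_probs.append(logits[a] - log_sum_exp)
    return log_probs  # shape [BATCH_SIZE]

# Batch of 2 observations, 3 discrete actions each.
logits = [[2.0, 1.0, 0.1], [0.0, 0.0, 0.0]]
actions = [0, 2]
lls = categorical_log_likelihoods(actions, logits)
```

For the second row the logits are uniform, so its log-likelihood is log(1/3); log-likelihoods are always non-positive, since probabilities are at most 1.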