ray.rllib.policy.torch_policy_v2.TorchPolicyV2.action_distribution_fn
- TorchPolicyV2.action_distribution_fn(model: ModelV2, *, obs_batch: numpy.array | jnp.ndarray | tf.Tensor | torch.Tensor, state_batches: numpy.array | jnp.ndarray | tf.Tensor | torch.Tensor, **kwargs) → Tuple[numpy.array | jnp.ndarray | tf.Tensor | torch.Tensor, type, List[numpy.array | jnp.ndarray | tf.Tensor | torch.Tensor]]
Action distribution function for this Policy.
- Parameters:
model – Underlying model.
obs_batch – Observation tensor batch.
state_batches – Action sampling state batch.
- Returns:
A tuple of three items: the distribution inputs, the ActionDistribution class, and the list of state outputs.
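The return contract above can be illustrated with a minimal, dependency-free sketch. It does not use RLlib: `ToyModel` and `ToyCategorical` are hypothetical stand-ins for a `ModelV2` and an `ActionDistribution` class, nested Python lists stand in for tensor batches, and the function is written as a free function without `self`. The way the toy model is called here is an assumption made for illustration, not RLlib's actual model call convention.

```python
from typing import Any, List, Tuple, Type

Batch = List[List[float]]  # stand-in for a tensor batch: one row per sample


class ToyModel:
    """Hypothetical stand-in for a ModelV2: maps observations to logits."""

    def __call__(
        self, obs_batch: Batch, state_batches: List[Batch]
    ) -> Tuple[Batch, List[Batch]]:
        # Two logits per observation: the feature sum and its negation.
        logits = [[sum(obs), -sum(obs)] for obs in obs_batch]
        # A stateless (non-recurrent) model passes its state through unchanged.
        return logits, state_batches


class ToyCategorical:
    """Hypothetical stand-in for an ActionDistribution class."""

    def __init__(self, inputs: Batch) -> None:
        self.inputs = inputs

    def deterministic_sample(self) -> List[int]:
        # Pick the index of the largest logit in each row.
        return [max(range(len(row)), key=row.__getitem__) for row in self.inputs]


def action_distribution_fn(
    model: Any, *, obs_batch: Batch, state_batches: List[Batch], **kwargs: Any
) -> Tuple[Batch, Type[ToyCategorical], List[Batch]]:
    """Return (distribution inputs, distribution class, state outs)."""
    dist_inputs, state_out = model(obs_batch, state_batches)
    # The distribution CLASS is returned, not an instance; the caller
    # builds the distribution from dist_inputs.
    return dist_inputs, ToyCategorical, state_out


dist_inputs, dist_class, state_out = action_distribution_fn(
    ToyModel(), obs_batch=[[1.0, 2.0], [-3.0, 0.5]], state_batches=[]
)
actions = dist_class(dist_inputs).deterministic_sample()
```

Note that the second tuple element is a class rather than an instance, which matches the `type` entry in the return annotation; the caller instantiates it with the distribution inputs.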