ray.rllib.policy.torch_policy_v2.TorchPolicyV2.action_sampler_fn

TorchPolicyV2.action_sampler_fn(model: ModelV2, *, obs_batch: numpy.array | jnp.ndarray | tf.Tensor | torch.Tensor, state_batches: numpy.array | jnp.ndarray | tf.Tensor | torch.Tensor, **kwargs) → Tuple[numpy.array | jnp.ndarray | tf.Tensor | torch.Tensor, numpy.array | jnp.ndarray | tf.Tensor | torch.Tensor, numpy.array | jnp.ndarray | tf.Tensor | torch.Tensor, List[numpy.array | jnp.ndarray | tf.Tensor | torch.Tensor]]

Custom function for sampling new actions given the policy.

Parameters:
  • model – Underlying model.

  • obs_batch – Observation tensor batch.

  • state_batches – Batch of state tensors used for action sampling.

Returns:
  • Sampled action

  • Log-likelihood

  • Action distribution inputs

  • Updated state
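
The four return values can be illustrated with a minimal sketch. It is a standalone function with the same signature, not RLlib's implementation. It assumes a discrete action space, a hypothetical stand-in model (a plain `torch.nn.Linear`, not a real `ModelV2`) that maps observations directly to logits, and an `explore` flag passed through `**kwargs`.

```python
from typing import List, Tuple

import torch


def action_sampler_fn(
    model: torch.nn.Module,
    *,
    obs_batch: torch.Tensor,
    state_batches: List[torch.Tensor],
    **kwargs,
) -> Tuple[torch.Tensor, torch.Tensor, torch.Tensor, List[torch.Tensor]]:
    """Return (actions, log-likelihoods, dist inputs, updated state)."""
    # Assumption: the stand-in model maps observations directly to logits.
    dist_inputs = model(obs_batch)
    dist = torch.distributions.Categorical(logits=dist_inputs)

    # Assumption: an `explore` flag arrives via kwargs.
    # Sample stochastically when exploring, otherwise act greedily.
    if kwargs.get("explore", True):
        actions = dist.sample()
    else:
        actions = dist_inputs.argmax(dim=-1)

    # Log-likelihood of each chosen action under the distribution.
    logp = dist.log_prob(actions)

    # The stand-in model is stateless, so the state passes through unchanged.
    return actions, logp, dist_inputs, state_batches


torch.manual_seed(0)
toy_model = torch.nn.Linear(4, 3)  # hypothetical: 4 obs features, 3 actions
obs = torch.randn(5, 4)            # batch of 5 observations
actions, logp, dist_inputs, state_out = action_sampler_fn(
    toy_model, obs_batch=obs, state_batches=[], explore=False
)
```

In a real policy, the same logic would live in a `TorchPolicyV2` subclass method, and a recurrent model would return new state tensors instead of passing `state_batches` through.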