Exploration.get_exploration_action(*, action_distribution: ActionDistribution, timestep: numpy.array | jnp.ndarray | tf.Tensor | torch.Tensor | int, explore: bool = True)

Returns a (possibly) exploratory action and its log-likelihood.

Given the action distribution built from the model's logits output, returns an exploratory action.

Parameters:

  • action_distribution – The instantiated ActionDistribution object from which exploration actions are created.

  • timestep – The current sampling time step. May be a tensor in TF graph mode; otherwise an integer.

  • explore – If True, apply the normal exploration behavior. If False, suppress all exploratory behavior and return a deterministic action.
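This sketch shows one way an implementation might use `timestep` and `explore` together. It is a hypothetical epsilon-greedy strategy whose epsilon decays linearly with the time step, and it returns the greedy action when `explore=False`.

`ToyCategorical`, `EpsilonGreedySketch`, and the schedule parameters are illustrative assumptions and are not part of this API. The log-likelihood is computed under the epsilon-greedy mixture policy. Returning 0.0 for deterministic actions is one possible convention, not necessarily the library's.

```python
import numpy as np


class ToyCategorical:
    """Hypothetical stand-in for an ActionDistribution over discrete actions."""

    def __init__(self, logits):
        self.logits = np.asarray(logits, dtype=np.float64)

    def deterministic_sample(self):
        # Greedy action: index of the largest logit.
        return int(np.argmax(self.logits))


class EpsilonGreedySketch:
    """Hypothetical epsilon-greedy exploration with a linear epsilon schedule."""

    def __init__(self, initial_epsilon=1.0, final_epsilon=0.05,
                 decay_steps=10_000, seed=0):
        self.initial_epsilon = initial_epsilon
        self.final_epsilon = final_epsilon
        self.decay_steps = decay_steps
        self.rng = np.random.default_rng(seed)

    def epsilon(self, timestep):
        # Anneal epsilon linearly from initial to final over decay_steps,
        # then hold it at final_epsilon.
        frac = min(timestep / self.decay_steps, 1.0)
        return self.initial_epsilon + frac * (self.final_epsilon - self.initial_epsilon)

    def get_exploration_action(self, *, action_distribution, timestep, explore=True):
        greedy = action_distribution.deterministic_sample()
        if not explore:
            # Deterministic action: probability 1, so log-likelihood 0 (sketch convention).
            return greedy, 0.0
        n = action_distribution.logits.shape[-1]
        eps = self.epsilon(timestep)
        if self.rng.random() < eps:
            action = int(self.rng.integers(n))  # uniform random action
        else:
            action = greedy
        # Probability of `action` under the epsilon-greedy mixture policy.
        prob = eps / n + (1.0 - eps) * (action == greedy)
        return action, float(np.log(prob))


dist = ToyCategorical([0.1, 2.0, -1.0])
explorer = EpsilonGreedySketch()
action, logp = explorer.get_exploration_action(action_distribution=dist, timestep=500)
greedy_action, greedy_logp = explorer.get_exploration_action(
    action_distribution=dist, timestep=500, explore=False)
```

Deriving epsilon from `timestep` keeps the schedule stateless. This matters because, as noted above, `timestep` may arrive as a tensor in graph mode rather than as a Python counter.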


Returns:

A tuple of 1) the chosen exploration action, or a TF op that fetches it from the graph, and 2) the log-likelihood of that action.
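To make the returned tuple concrete, here is a NumPy sketch of plain stochastic sampling over a batch of categorical logits.

The helper `stochastic_exploration_action` is hypothetical. With `explore=True`, it samples one action per row and returns that action's log-probability. With `explore=False`, it returns the argmax and a log-likelihood of 0, meaning probability 1. That last convention is an assumption of this sketch.

```python
import numpy as np


def log_softmax(logits):
    # Numerically stable log-softmax over the last axis.
    shifted = logits - np.max(logits, axis=-1, keepdims=True)
    return shifted - np.log(np.sum(np.exp(shifted), axis=-1, keepdims=True))


def stochastic_exploration_action(logits, explore=True, rng=None):
    """Hypothetical helper: returns (actions, log-likelihoods) for a batch of logits."""
    logits = np.asarray(logits, dtype=np.float64)
    logp_all = log_softmax(logits)
    if explore:
        rng = rng if rng is not None else np.random.default_rng()
        probs = np.exp(logp_all)
        # Sample one action per batch row from its categorical distribution.
        actions = np.array([rng.choice(len(p), p=p) for p in probs])
        # Log-likelihood of each sampled action.
        logp = logp_all[np.arange(len(actions)), actions]
    else:
        # Deterministic action: the most likely one in each row.
        actions = np.argmax(logits, axis=-1)
        # Probability 1 under this convention, so log-likelihood 0.
        logp = np.zeros(len(actions))
    return actions, logp


batch_logits = np.array([[2.0, 0.5, -1.0], [0.0, 0.0, 3.0]])
actions, logp = stochastic_exploration_action(
    batch_logits, explore=True, rng=np.random.default_rng(0))
det_actions, det_logp = stochastic_exploration_action(batch_logits, explore=False)
# det_actions -> array([0, 2]), det_logp -> array([0., 0.])
```

Both branches return arrays of the same shape, so callers can unpack the tuple the same way whether or not exploration is enabled.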