ray.rllib.utils.exploration.exploration.Exploration.get_exploration_action

Exploration.get_exploration_action(*, action_distribution: ActionDistribution, timestep: numpy.array | jnp.ndarray | tf.Tensor | torch.Tensor | int, explore: bool = True)

Returns a (possibly) exploratory action and its log-likelihood.

Given an action distribution built from the model’s logits output, returns an exploratory action, or a deterministic one if explore is False.

Parameters:

action_distribution – The instantiated ActionDistribution object to work with when creating exploration actions.
timestep – The current sampling time step. In TF graph mode this may be a tensor; otherwise it is an integer.
explore – If True, apply the normal exploration behavior. If False, suppress all exploratory behavior and return a deterministic action.
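To illustrate how timestep and explore can interact, here is a minimal, framework-free sketch of epsilon-greedy exploration over a discrete action space. It is not RLlib's implementation: the linear annealing schedule, its default values, and the helper names linear_epsilon and epsilon_greedy_action are hypothetical, and the input is assumed to be a single 1-D array of Q-values rather than an ActionDistribution object.

```python
from typing import Optional

import numpy as np


def linear_epsilon(timestep: int, initial: float = 1.0, final: float = 0.05,
                   decay_steps: int = 10_000) -> float:
    """Hypothetical schedule: anneal epsilon linearly from `initial` to `final`
    over `decay_steps` timesteps, then hold it at `final`."""
    fraction = min(max(timestep, 0) / decay_steps, 1.0)
    return initial + fraction * (final - initial)


def epsilon_greedy_action(q_values: np.ndarray, timestep: int, explore: bool = True,
                          rng: Optional[np.random.Generator] = None):
    """Hypothetical helper: return (action, log-likelihood) for one observation."""
    rng = rng if rng is not None else np.random.default_rng()
    n = q_values.shape[0]
    greedy = int(np.argmax(q_values))
    if not explore:
        # Exploration suppressed: always the greedy action, i.e. probability 1, log 1 = 0.
        return greedy, 0.0
    eps = linear_epsilon(timestep)
    # With probability eps pick a uniformly random action, else the greedy one.
    action = int(rng.integers(n)) if rng.random() < eps else greedy
    # Probability of `action` under the epsilon-greedy mixture policy.
    prob = eps / n + (1.0 - eps) * (action == greedy)
    return action, float(np.log(prob))
```

Early in training (timestep near 0) almost every action is random; as timestep grows the policy becomes mostly greedy, and explore=False skips the schedule entirely.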

Returns:

A tuple of (1) the chosen exploration action, or, in TF graph mode, a tf op that fetches it from the graph, and (2) the log-likelihood of that action.
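The (action, log-likelihood) contract can be sketched with a stochastic-sampling strategy: when exploring, sample from the distribution and report that sample's log-likelihood; otherwise take the most likely action. ToyCategorical is a hypothetical NumPy stand-in for an ActionDistribution, assumed here to expose sample(), deterministic_sample() and sampled_action_logp() in the style of RLlib's action-distribution interface. stochastic_sampling_action is likewise illustrative, not an RLlib function, and handles one observation with no batch dimension.

```python
from typing import Tuple

import numpy as np


class ToyCategorical:
    """Hypothetical stand-in for a categorical action distribution built from
    a model's logits output (single observation, no batch dimension)."""

    def __init__(self, logits: np.ndarray, rng: np.random.Generator):
        exp = np.exp(logits - np.max(logits))  # subtract max for numerical stability
        self.probs = exp / exp.sum()            # softmax over actions
        self.rng = rng
        self.last_action = None

    def sample(self) -> int:
        self.last_action = int(self.rng.choice(len(self.probs), p=self.probs))
        return self.last_action

    def deterministic_sample(self) -> int:
        self.last_action = int(np.argmax(self.probs))
        return self.last_action

    def sampled_action_logp(self) -> float:
        # Log-probability of the most recently drawn action.
        return float(np.log(self.probs[self.last_action]))


def stochastic_sampling_action(*, action_distribution: ToyCategorical,
                               timestep: int, explore: bool = True) -> Tuple[int, float]:
    """Hypothetical strategy mirroring the keyword-only signature above."""
    del timestep  # this simple strategy uses no timestep-dependent schedule
    if explore:
        action = action_distribution.sample()
        logp = action_distribution.sampled_action_logp()
    else:
        action = action_distribution.deterministic_sample()
        logp = 0.0  # deterministic choice treated as probability 1 (log 1 = 0)
    return action, logp
```

Reporting 0.0 in the deterministic branch is a choice made for this sketch; a concrete Exploration subclass may report the log-likelihood differently.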