ray.rllib.utils.exploration.epsilon_greedy.EpsilonGreedy

class ray.rllib.utils.exploration.epsilon_greedy.EpsilonGreedy(action_space: gymnasium.spaces.Space, *, framework: str, initial_epsilon: float = 1.0, final_epsilon: float = 0.05, warmup_timesteps: int = 0, epsilon_timesteps: int = 100000, epsilon_schedule: Schedule | None = None, **kwargs)

Bases: Exploration

Epsilon-greedy Exploration class that produces exploration actions.

Given a Model’s output and the current epsilon value (taken from a Schedule), it returns a random action if rand(1) < eps and the model-computed action otherwise (rand(1) >= eps).
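The selection rule described above can be sketched in plain NumPy. This is only an illustration of the rule for a discrete action space, not RLlib's implementation: the function name `epsilon_greedy_action`, the `q_values` argument, and the use of `argmax` as the "model-computed" action are assumptions made for the example.

```python
import numpy as np


def epsilon_greedy_action(q_values, epsilon, rng):
    """Hypothetical helper: pick an action index from per-action scores.

    With probability `epsilon` a uniformly random action is returned;
    otherwise the highest-scoring (greedy) action is returned.
    """
    q_values = np.asarray(q_values)
    # rng.random() draws from [0, 1), mirroring the rand(1) < eps test.
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))


rng = np.random.default_rng(0)

# epsilon = 0.0 never explores, so the greedy action is always chosen.
greedy_action = epsilon_greedy_action([0.1, 0.9, 0.3], epsilon=0.0, rng=rng)

# epsilon = 1.0 always explores, so any valid index may be returned.
random_action = epsilon_greedy_action([0.1, 0.9, 0.3], epsilon=1.0, rng=rng)
```

Because the random draw lies in [0, 1), an epsilon of 0.0 disables exploration entirely and an epsilon of 1.0 makes every action random.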

Methods

__init__

Create an EpsilonGreedy exploration class.

before_compute_actions

Hook for preparations before policy.compute_actions() is called.

get_exploration_optimizer

May add optimizer(s) to the Policy's own optimizers.

on_episode_end

Handles necessary exploration logic at the end of an episode.

on_episode_start

Handles necessary exploration logic at the beginning of an episode.

postprocess_trajectory

Handles post-processing of done episode trajectories.
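
As a rough illustration of how the constructor parameters in the signature above might combine when no `epsilon_schedule` is passed, the sketch below holds epsilon at `initial_epsilon` for `warmup_timesteps`, then anneals it linearly to `final_epsilon` over `epsilon_timesteps`. This interpretation is an assumption inferred from the parameter names and defaults; the function `linear_epsilon` is hypothetical and not part of RLlib.

```python
def linear_epsilon(
    timestep,
    initial_epsilon=1.0,
    final_epsilon=0.05,
    warmup_timesteps=0,
    epsilon_timesteps=100000,
):
    """Hypothetical schedule: constant during warmup, then linear decay."""
    # During warmup, epsilon stays at its initial value.
    if timestep <= warmup_timesteps:
        return initial_epsilon
    # Guard against a zero-length annealing window.
    if epsilon_timesteps <= 0:
        return final_epsilon
    # Fraction of the annealing window completed, capped at 1.0 so that
    # epsilon stays at final_epsilon once the window has passed.
    progress = min((timestep - warmup_timesteps) / epsilon_timesteps, 1.0)
    return initial_epsilon + progress * (final_epsilon - initial_epsilon)


# Epsilon at the start, midpoint, and end of the default window.
samples = [linear_epsilon(t) for t in (0, 50000, 100000)]
```

With the defaults from the signature, this sketch starts at 1.0, passes through roughly 0.525 at the midpoint, and settles at about 0.05 from timestep 100000 onward.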