ray.rllib.utils.exploration.random_encoder.RE3

class ray.rllib.utils.exploration.random_encoder.RE3(action_space: gymnasium.spaces.Space, *, framework: str, model: ModelV2, embeds_dim: int = 128, encoder_net_config: dict | None = None, beta: float = 0.2, beta_schedule: str = 'constant', rho: float = 0.1, k_nn: int = 50, random_timesteps: int = 10000, sub_exploration: Dict[str, Any] | type | str | None = None, **kwargs)

Bases: Exploration

Random Encoder for Efficient Exploration.

Implementation of: [1] State entropy maximization with random encoders for efficient exploration. Seo, Chen, Shin, Lee, Abbeel, & Lee (2021). arXiv preprint arXiv:2102.09430.

Estimates state entropy using a particle-based k-nearest neighbors (k-NN) estimator in the latent space. The state’s latent representation is calculated using an encoder with randomly initialized parameters.
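As a rough illustration of the estimator (not RLlib's internal implementation; the function name and shapes below are made up for this sketch), the per-state intrinsic signal is the log of one plus the distance to the k-th nearest neighbor among the batch's latent embeddings:

```python
import numpy as np

def knn_intrinsic_reward(embeds: np.ndarray, k_nn: int = 50) -> np.ndarray:
    """Particle-based state-entropy estimate over one batch of embeddings."""
    # Pairwise Euclidean distances between all latent embeddings in the batch.
    dists = np.linalg.norm(embeds[:, None, :] - embeds[None, :, :], axis=-1)
    # Distance of each embedding to its k-th nearest neighbor; column 0 of the
    # sorted distances is each embedding's zero distance to itself.
    k = min(k_nn, embeds.shape[0] - 1)
    knn_dists = np.sort(dists, axis=-1)[:, k]
    # Intrinsic reward per state: log(1 + k-NN distance).
    return np.log(knn_dists + 1.0)

# Example: a batch of 256 states embedded into a 128-dimensional latent space.
intrinsic = knn_intrinsic_reward(np.random.randn(256, 128), k_nn=50)
```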

The entropy of a state is treated as an intrinsic reward and added to the environment's extrinsic reward for policy optimization. Entropy is calculated per batch; it does not take the distribution of the entire replay buffer into consideration.
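For orientation, here is a hedged sketch of how RE3 might be enabled through an algorithm's exploration_config in the old API stack; the keys mirror the constructor arguments above, but exact availability and defaults can vary across Ray versions:

```python
config = {
    "exploration_config": {
        "type": "RE3",               # Random Encoder exploration class.
        "embeds_dim": 128,            # Size of the random encoder's latent space.
        "beta": 0.2,                  # Weight of the intrinsic reward term.
        "beta_schedule": "constant",  # Keep beta fixed over training.
        "k_nn": 50,                   # k used by the k-NN entropy estimator.
        # Exploration used for the actual action sampling.
        "sub_exploration": {"type": "StochasticSampling"},
    },
}
```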

Methods

__init__
    Initialize RE3.

before_compute_actions
    Hook for preparations before policy.compute_actions() is called.

get_exploration_optimizer
    May add optimizer(s) to the Policy's own optimizers.

get_state
    Returns the current exploration state.

on_episode_end
    Handles necessary exploration logic at the end of an episode.

on_episode_start
    Handles necessary exploration logic at the beginning of an episode.

postprocess_trajectory
    Calculates states' latent representations/embeddings (see the sketch after this list).

set_state
    Sets the Exploration object's state to the given values.
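To make the postprocess_trajectory step concrete, here is a minimal, illustrative sketch of embedding observations with a frozen, randomly initialized encoder; the single linear-plus-ReLU layer and all names are assumptions for illustration, not the network RE3 actually builds from encoder_net_config:

```python
import numpy as np

# Frozen random encoder: weights are drawn once and never trained (assumption:
# a single linear layer with ReLU stands in for the real encoder network).
rng = np.random.default_rng(seed=0)
obs_dim, embeds_dim = 8, 128
W = rng.normal(scale=1.0 / np.sqrt(obs_dim), size=(obs_dim, embeds_dim))

def embed(obs_batch: np.ndarray) -> np.ndarray:
    """Map a batch of observations to latent embeddings."""
    return np.maximum(obs_batch @ W, 0.0)

# A trajectory of 256 observations becomes a (256, 128) batch of embeddings,
# which the k-NN estimator sketched above turns into per-state intrinsic rewards.
embeds = embed(rng.normal(size=(256, obs_dim)))
```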