ray.rllib.utils.exploration.random_encoder.RE3

class ray.rllib.utils.exploration.random_encoder.RE3(action_space: gymnasium.spaces.Space, *, framework: str, model: ModelV2, embeds_dim: int = 128, encoder_net_config: dict | None = None, beta: float = 0.2, beta_schedule: str = 'constant', rho: float = 0.1, k_nn: int = 50, random_timesteps: int = 10000, sub_exploration: Dict[str, Any] | type | str | None = None, **kwargs)

Bases: Exploration

Random Encoder for Efficient Exploration.

Implementation of: [1] State entropy maximization with random encoders for efficient exploration. Seo, Chen, Shin, Lee, Abbeel, & Lee (2021). arXiv preprint arXiv:2102.09430.

Estimates state entropy using a particle-based k-nearest neighbors (k-NN) estimator in the latent space. The state’s latent representation is calculated using an encoder with randomly initialized parameters.
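As a rough illustration of the estimator (not RLlib's internal implementation; the function name and shapes below are made up for this sketch), the per-state intrinsic signal is the log of one plus the distance to the k-th nearest neighbor among the batch's latent embeddings:

```python
import numpy as np

def knn_intrinsic_reward(embeds: np.ndarray, k_nn: int = 50) -> np.ndarray:
    """Particle-based state-entropy estimate over one batch of embeddings."""
    # Pairwise Euclidean distances between all latent embeddings in the batch.
    dists = np.linalg.norm(embeds[:, None, :] - embeds[None, :, :], axis=-1)
    # Distance of each embedding to its k-th nearest neighbor; column 0 of the
    # sorted distances is each embedding's zero distance to itself.
    k = min(k_nn, embeds.shape[0] - 1)
    knn_dists = np.sort(dists, axis=-1)[:, k]
    # Intrinsic reward per state: log(1 + k-NN distance).
    return np.log(knn_dists + 1.0)

# Example: a batch of 256 states embedded into a 128-dimensional latent space.
intrinsic = knn_intrinsic_reward(np.random.randn(256, 128), k_nn=50)
```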

The entropy of a state is treated as an intrinsic reward and added to the environment's extrinsic reward for policy optimization. Entropy is calculated per batch; it does not take the distribution of the entire replay buffer into consideration.
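For orientation, here is a hedged sketch of how RE3 might be enabled through an algorithm's exploration_config in the old API stack; the keys mirror the constructor arguments above, but exact availability and defaults can vary across Ray versions:

```python
config = {
    "exploration_config": {
        "type": "RE3",               # Random Encoder exploration class.
        "embeds_dim": 128,            # Size of the random encoder's latent space.
        "beta": 0.2,                  # Weight of the intrinsic reward term.
        "beta_schedule": "constant",  # Keep beta fixed over training.
        "k_nn": 50,                   # k used by the k-NN entropy estimator.
        # Exploration used for the actual action sampling.
        "sub_exploration": {"type": "StochasticSampling"},
    },
}
```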

Methods

__init__
    Initialize RE3.

before_compute_actions
    Hook for preparations before policy.compute_actions() is called.

get_exploration_optimizer
    May add optimizer(s) to the Policy's own optimizers.

get_state
    Returns the current exploration state.

on_episode_end
    Handles necessary exploration logic at the end of an episode.

on_episode_start
    Handles necessary exploration logic at the beginning of an episode.

postprocess_trajectory
    Calculates states' latent representations/embeddings (see the sketch after this list).

set_state
    Sets the Exploration object's state to the given values.
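To make the postprocess_trajectory step concrete, here is a minimal, illustrative sketch of embedding observations with a frozen, randomly initialized encoder; the single linear-plus-ReLU layer and all names are assumptions for illustration, not the network RE3 actually builds from encoder_net_config:

```python
import numpy as np

# Frozen random encoder: weights are drawn once and never trained (assumption:
# a single linear layer with ReLU stands in for the real encoder network).
rng = np.random.default_rng(seed=0)
obs_dim, embeds_dim = 8, 128
W = rng.normal(scale=1.0 / np.sqrt(obs_dim), size=(obs_dim, embeds_dim))

def embed(obs_batch: np.ndarray) -> np.ndarray:
    """Map a batch of observations to latent embeddings."""
    return np.maximum(obs_batch @ W, 0.0)

# A trajectory of 256 observations becomes a (256, 128) batch of embeddings,
# which the k-NN estimator sketched above turns into per-state intrinsic rewards.
embeds = embed(rng.normal(size=(256, obs_dim)))
```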