ray.rllib.utils.exploration.random_encoder.RE3.__init__

RE3.__init__(action_space: gymnasium.spaces.Space, *, framework: str, model: ModelV2, embeds_dim: int = 128, encoder_net_config: dict | None = None, beta: float = 0.2, beta_schedule: str = 'constant', rho: float = 0.1, k_nn: int = 50, random_timesteps: int = 10000, sub_exploration: Dict[str, Any] | type | str | None = None, **kwargs)

Initialize RE3.

Parameters:
  • action_space – The action space in which to explore.

  • framework – Supports “tf”; this implementation does not support torch.

  • model – The policy’s model.

  • embeds_dim – The dimensionality of the observation embedding vectors in latent space.

  • encoder_net_config – Optional model configuration for the encoder network that produces embedding vectors from observations. This can be used to configure fcnet or convnet setups so that any observation space is processed properly.

  • beta – Hyperparameter trading off exploration against exploitation: the intrinsic state-entropy reward is scaled by beta before being added to the extrinsic reward (see the sketch after this parameter list).

  • beta_schedule – Schedule to use for beta decay, one of “constant” or “linear_decay”.

  • rho – Beta decay factor, used for on-policy algorithms.

  • k_nn – Number of nearest neighbors to use for the k-NN entropy estimate.

  • random_timesteps – The number of timesteps to act completely randomly (see [1]).

  • sub_exploration – The config dict for the underlying Exploration to use (e.g. epsilon-greedy for DQN). If None, uses the FromSpecDict provided in the Policy’s default config.
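
The beta, rho, and k_nn settings together define the intrinsic reward. As a rough sketch of the idea from the RE3 paper (illustrative NumPy code, not RLlib's actual implementation; the function name is hypothetical), each embedded observation earns a bonus of log(distance to its k-th nearest neighbor in the batch + 1), scaled by beta:

    import numpy as np

    def re3_intrinsic_reward(embeddings: np.ndarray, k_nn: int = 50, beta: float = 0.2) -> np.ndarray:
        # Hypothetical sketch, not RLlib's internal code. `embeddings` is the
        # batch of encoder outputs with shape (batch, embeds_dim); the batch
        # size must exceed k_nn for the neighbor lookup below.
        diffs = embeddings[:, None, :] - embeddings[None, :, :]
        dists = np.linalg.norm(diffs, axis=-1)        # pairwise distances, (batch, batch)
        knn_dists = np.sort(dists, axis=-1)[:, k_nn]  # k-th nearest neighbor (index 0 is self)
        return beta * np.log(knn_dists + 1.0)

With beta_schedule="linear_decay", beta is annealed over training at a rate governed by rho, which the RE3 paper suggests for on-policy algorithms.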

Raises:

ValueError – If the input framework is torch.
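
For context, a minimal sketch of enabling RE3 through an algorithm's exploration_config (the SAC/"Pendulum-v1" choices and parameter values are illustrative; RLlib's full RE3 example additionally attaches RE3UpdateCallbacks so the intrinsic reward is computed on collected batches):

    from ray.rllib.algorithms.sac import SACConfig

    config = (
        SACConfig()
        .environment("Pendulum-v1")
        .framework("tf")  # RE3 raises a ValueError under torch
        .exploration(
            exploration_config={
                "type": "RE3",
                "embeds_dim": 128,            # size of the random-encoder embedding
                "beta_schedule": "constant",  # or "linear_decay" (decay rate: rho)
                "k_nn": 50,                   # neighbors for the entropy estimate
                "sub_exploration": {"type": "StochasticSampling"},
            }
        )
    )
    algo = config.build()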