ray.rllib.utils.exploration.random_encoder.RE3.__init__#
- RE3.__init__(action_space: gymnasium.spaces.Space, *, framework: str, model: ModelV2, embeds_dim: int = 128, encoder_net_config: dict | None = None, beta: float = 0.2, beta_schedule: str = 'constant', rho: float = 0.1, k_nn: int = 50, random_timesteps: int = 10000, sub_exploration: Dict[str, Any] | type | str | None = None, **kwargs)[source]#
Initialize RE3.
- Parameters:
action_space – The action space in which to explore.
framework – Supports “tf” only; this implementation does not support torch.
model – The policy’s model.
embeds_dim – The dimensionality of the observation embedding vectors in latent space.
encoder_net_config – Optional model configuration for the encoder network, which produces embedding vectors from observations. This can be used to configure fully connected (fcnet) or convolutional (conv_net) setups to properly process any observation space.
beta – Hyperparameter to choose between exploration and exploitation.
beta_schedule – Schedule to use for beta decay, one of “constant” or “linear_decay”.
rho – Beta decay factor, used for on-policy algorithms.
k_nn – Number of neighbors to use for the K-NN entropy estimation.
random_timesteps – The number of timesteps to act completely randomly (see [1]).
sub_exploration – The config dict for the underlying Exploration to use (e.g. epsilon-greedy for DQN). If None, uses the FromSpecDict provided in the Policy’s default config.
- Raises:
ValueError – If the input framework is torch.
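A minimal sketch of how these parameters are typically supplied: rather than calling RE3.__init__ directly, RE3 is usually enabled via a Policy’s exploration_config dict, whose keys map onto the constructor arguments above. The specific values and the “StochasticSampling” sub-exploration choice below are illustrative assumptions, not requirements.

```python
# Illustrative exploration_config enabling RE3 (old RLlib API stack).
# Keys mirror the RE3.__init__ parameters documented above; all
# values shown are example choices, not defaults you must use.
re3_exploration_config = {
    "type": "RE3",                 # select the RE3 Exploration class
    "embeds_dim": 128,             # dimensionality of observation embeddings
    "beta": 0.2,                   # exploration vs. exploitation trade-off
    "beta_schedule": "constant",   # or "linear_decay" (decayed using rho)
    "rho": 0.1,                    # beta decay factor (on-policy algorithms)
    "k_nn": 50,                    # neighbors for K-NN entropy estimation
    "random_timesteps": 10000,     # act fully randomly for the first N steps
    "sub_exploration": {
        # Underlying Exploration used after the random phase;
        # StochasticSampling is one common choice (assumption).
        "type": "StochasticSampling",
    },
}

algo_config = {
    "framework": "tf",             # RE3 supports tf; torch raises ValueError
    "exploration_config": re3_exploration_config,
}
```

Passing "framework": "torch" together with this exploration_config would raise the ValueError documented above.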