ray.rllib.utils.exploration.curiosity.Curiosity.__init__

Curiosity.__init__(action_space: gymnasium.spaces.Space, *, framework: str, model: ModelV2, feature_dim: int = 288, feature_net_config: dict | None = None, inverse_net_hiddens: Tuple[int] = (256,), inverse_net_activation: str = 'relu', forward_net_hiddens: Tuple[int] = (256,), forward_net_activation: str = 'relu', beta: float = 0.2, eta: float = 1.0, lr: float = 0.001, sub_exploration: Dict[str, Any] | type | str | None = None, **kwargs)

Initializes a Curiosity object.

The default values are the hyperparameters described in [1].

Parameters:
  • feature_dim – The dimensionality of the feature (phi) vectors.

  • feature_net_config – Optional model configuration for the feature network, producing feature vectors (phi) from observations. This can be used to configure fcnet- or conv_net setups to properly process any observation space.

  • inverse_net_hiddens – Tuple of the layer sizes of the inverse (action predicting) NN head (on top of the feature outputs for phi and phi’).

  • inverse_net_activation – Activation specifier for the inverse net.

  • forward_net_hiddens – Tuple of the layer sizes of the forward (phi’ predicting) NN head.

  • forward_net_activation – Activation specifier for the forward net.

  • beta – Weight for the forward loss in the combined loss term. The inverse loss gets the complementary weight, 1.0 - beta.

  • eta – Weight for intrinsic rewards before being added to extrinsic ones.

  • lr – The learning rate for the curiosity-specific optimizer, which optimizes the feature, inverse, and forward nets.

  • sub_exploration – The config dict for the underlying Exploration to use (e.g. epsilon-greedy for DQN). If None, uses the FromSpecDict provided in the Policy’s default config.
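
The parameters above can be illustrated with a minimal sketch. The two helper functions are hypothetical and are not RLlib's implementation; they only restate the weighting that the beta and eta descriptions spell out. The exploration_config dict assumes the usual RLlib convention that keys other than "type" are forwarded as keyword arguments to this constructor, while action_space, framework, and model are supplied by the Policy. The feature_net_config keys and the StochasticSampling sub-exploration are example choices, not requirements.

```python
def combined_curiosity_loss(
    forward_loss: float, inverse_loss: float, beta: float = 0.2
) -> float:
    """Hypothetical helper: forward loss weighted by beta, inverse loss by 1 - beta."""
    return beta * forward_loss + (1.0 - beta) * inverse_loss


def total_reward(extrinsic: float, intrinsic: float, eta: float = 1.0) -> float:
    """Hypothetical helper: scale the intrinsic reward by eta, then add it."""
    return extrinsic + eta * intrinsic


# Example exploration config using the documented defaults.
# Assumption: this dict is placed under the "exploration_config" key of an
# algorithm config; the surrounding algorithm setup is not shown here.
exploration_config = {
    "type": "Curiosity",
    "feature_dim": 288,
    # Example feature net settings (assumed keys from the model config).
    "feature_net_config": {"fcnet_hiddens": [], "fcnet_activation": "relu"},
    "inverse_net_hiddens": [256],
    "inverse_net_activation": "relu",
    "forward_net_hiddens": [256],
    "forward_net_activation": "relu",
    "beta": 0.2,
    "eta": 1.0,
    "lr": 0.001,
    # Underlying exploration strategy; an example choice.
    "sub_exploration": {"type": "StochasticSampling"},
}
```

With beta at its default of 0.2, the inverse loss dominates the combined term (weight 0.8). Raising eta increases the influence of the intrinsic reward relative to the environment's extrinsic reward.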