ray.rllib.utils.exploration.curiosity.Curiosity#
- class ray.rllib.utils.exploration.curiosity.Curiosity(action_space: gymnasium.spaces.Space, *, framework: str, model: ModelV2, feature_dim: int = 288, feature_net_config: dict | None = None, inverse_net_hiddens: Tuple[int] = (256,), inverse_net_activation: str = 'relu', forward_net_hiddens: Tuple[int] = (256,), forward_net_activation: str = 'relu', beta: float = 0.2, eta: float = 1.0, lr: float = 0.001, sub_exploration: Dict[str, Any] | type | str | None = None, **kwargs)[source]#
Bases: Exploration
Implementation of: [1] Curiosity-driven Exploration by Self-supervised Prediction Pathak, Agrawal, Efros, and Darrell - UC Berkeley - ICML 2017. https://arxiv.org/pdf/1705.05363.pdf
Learns a simplified model of the environment based on three networks: 1) Embedding observations into latent space (“feature” network). 2) Predicting the action, given two consecutive embedded observations (“inverse” network). 3) Predicting the next embedded obs, given an obs and action (“forward” network).
The less the agent is able to predict the actually observed next feature vector, given obs and action (through the forward network), the larger the "intrinsic reward", which is added to the extrinsic reward. Therefore, if a state transition was unexpected, the agent becomes "curious" and explores this transition further, leading to better exploration in sparse-reward environments.
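A minimal sketch of that intrinsic-reward idea, computed as the forward model's squared prediction error in feature space (the function name, plain-NumPy setting, and example values below are illustrative, not RLlib's internals):

```python
import numpy as np

def intrinsic_reward(phi_next, phi_next_pred, eta=1.0):
    """Curiosity bonus: squared L2 error of the forward net's prediction.

    phi_next:      embedded next observation, shape [batch, feature_dim]
    phi_next_pred: forward net's prediction of that embedding, same shape
    eta:           scaling factor for the intrinsic reward
    """
    # Larger prediction error -> more "surprise" -> larger bonus added
    # to the extrinsic (environment) reward.
    forward_l2 = 0.5 * np.sum(np.square(phi_next_pred - phi_next), axis=-1)
    return eta * forward_l2

# A poorly predicted transition yields a larger bonus.
phi = np.array([[0.1, 0.2, 0.3]])
phi_pred = np.array([[0.5, -0.1, 0.0]])
print(intrinsic_reward(phi, phi_pred))  # ~[0.17]
```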
Methods
- __init__: Initializes a Curiosity object.
- before_compute_actions: Hook for preparations before policy.compute_actions() is called.
- get_state: Returns the current exploration state.
- on_episode_end: Handles necessary exploration logic at the end of an episode.
- on_episode_start: Handles necessary exploration logic at the beginning of an episode.
- postprocess_trajectory: Calculates phi values (obs, obs', and predicted obs') and ri.
- set_state: Sets the Exploration object's state to the given values.
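As a usage sketch, Curiosity is typically enabled through an algorithm's exploration_config on the older, config-dict-based API stack. The environment name and hyperparameter values below are illustrative choices, not required defaults:

```python
from ray.rllib.algorithms.ppo import PPOConfig

# Sketch: enabling Curiosity via the exploration_config of a PPO setup.
# "FrozenLake-v1" and the hyperparameter values are example choices only.
config = (
    PPOConfig()
    .environment("FrozenLake-v1")
    .framework("torch")
    .exploration(
        explore=True,
        exploration_config={
            "type": "Curiosity",          # use this Curiosity class
            "eta": 1.0,                   # weight of the intrinsic reward
            "lr": 0.001,                  # learning rate of the curiosity nets
            "feature_dim": 288,           # size of the feature (phi) vectors
            "inverse_net_hiddens": [256],
            "inverse_net_activation": "relu",
            "forward_net_hiddens": [256],
            "forward_net_activation": "relu",
            "beta": 0.2,                  # forward- vs inverse-loss weighting
            # Exploration used on top of the curiosity-shaped rewards.
            "sub_exploration": {"type": "StochasticSampling"},
        },
    )
)
algo = config.build()
```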