ray.rllib.utils.exploration.curiosity.Curiosity

class ray.rllib.utils.exploration.curiosity.Curiosity(action_space: gymnasium.spaces.Space, *, framework: str, model: ModelV2, feature_dim: int = 288, feature_net_config: dict | None = None, inverse_net_hiddens: Tuple[int] = (256,), inverse_net_activation: str = 'relu', forward_net_hiddens: Tuple[int] = (256,), forward_net_activation: str = 'relu', beta: float = 0.2, eta: float = 1.0, lr: float = 0.001, sub_exploration: Dict[str, Any] | type | str | None = None, **kwargs)

Bases: Exploration

Implementation of [1]: "Curiosity-driven Exploration by Self-supervised Prediction", Pathak, Agrawal, Efros, and Darrell; UC Berkeley; ICML 2017. https://arxiv.org/pdf/1705.05363.pdf

Learns a simplified model of the environment based on three networks:

1) A “feature” network that embeds observations into a latent space.
2) An “inverse” network that predicts the action taken, given two consecutive embedded observations.
3) A “forward” network that predicts the next embedded observation, given the current embedded observation and the action.
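To make the structure concrete, below is a minimal, standalone PyTorch sketch of the three networks. It is not RLlib's internal implementation; the observation size and action count are hypothetical, and the hidden sizes simply mirror the constructor defaults (feature_dim=288, one 256-unit hidden layer for the inverse and forward nets).

```python
# Illustrative sketch only (not RLlib's internal code): the three curiosity (ICM)
# networks, assuming a flat observation of size obs_dim and a Discrete(num_actions)
# action space.
import torch
from torch import nn

obs_dim, num_actions, feature_dim = 64, 4, 288  # hypothetical sizes

# 1) "feature" net: embeds raw observations into a latent feature vector phi(obs).
feature_net = nn.Sequential(nn.Linear(obs_dim, feature_dim), nn.ReLU())

# 2) "inverse" net: given phi(obs) and phi(obs'), predicts the action that was taken.
inverse_net = nn.Sequential(
    nn.Linear(2 * feature_dim, 256), nn.ReLU(), nn.Linear(256, num_actions),
)

# 3) "forward" net: given phi(obs) and a one-hot action, predicts phi(obs').
forward_net = nn.Sequential(
    nn.Linear(feature_dim + num_actions, 256), nn.ReLU(), nn.Linear(256, feature_dim),
)
```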

The worse the forward network predicts the actually observed next feature vector (given the embedded observation and the action), the larger the “intrinsic reward”, which is added to the extrinsic reward. Thus, when a state transition was unexpected, the agent becomes “curious” and explores that transition further, which improves exploration in sparse-reward environments.
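In practice, this class is usually not instantiated directly; it is enabled through an algorithm's exploration_config. The following is a usage sketch based on the older, dict-style RLlib config and the ModelV2-based API stack; exact keys and defaults can differ between RLlib versions.

```python
# Sketch: enabling Curiosity exploration via exploration_config (old API stack).
# The values shown mirror the constructor defaults above.
config = {
    "env": "FrozenLake-v1",   # example of a sparse-reward environment
    "framework": "torch",
    "exploration_config": {
        "type": "Curiosity",  # use the Curiosity module for exploring
        "eta": 1.0,           # weight of the intrinsic reward before adding it to the extrinsic one
        "lr": 0.001,          # learning rate of the curiosity (ICM) module
        "feature_dim": 288,   # dimensionality of the latent feature vectors
        # Setup of the "feature" net (encodes observations into latent vectors).
        "feature_net_config": {"fcnet_hiddens": [], "fcnet_activation": "relu"},
        "inverse_net_hiddens": [256],        # hidden layers of the "inverse" model
        "inverse_net_activation": "relu",
        "forward_net_hiddens": [256],        # hidden layers of the "forward" model
        "forward_net_activation": "relu",
        "beta": 0.2,          # weight of the forward loss (inverse loss gets 1.0 - beta)
        # Exploration sub-type used for the actual action sampling.
        "sub_exploration": {"type": "StochasticSampling"},
    },
}
```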

Methods

__init__

Initializes a Curiosity object.

before_compute_actions

Hook for preparations before policy.compute_actions() is called.

get_state

Returns the current exploration state.

on_episode_end

Handles necessary exploration logic at the end of an episode.

on_episode_start

Handles necessary exploration logic at the beginning of an episode.

postprocess_trajectory

Computes the phi values (for obs, obs', and the predicted obs') and the intrinsic rewards ri; see the sketch after this method list.

set_state

Sets the Exploration object's state to the given values.
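The intrinsic reward ri that postprocess_trajectory adds is the (scaled) prediction error of the forward network, while training minimizes a beta-weighted mix of the forward and inverse losses. Below is a standalone sketch of these terms; the function name, tensor shapes, and exact scaling are illustrative assumptions, not RLlib internals.

```python
# Illustrative sketch (not RLlib's internal code) of the curiosity loss and the
# intrinsic reward ri that gets added to the extrinsic rewards.
import torch
import torch.nn.functional as F

def curiosity_terms(next_phi, predicted_next_phi, action_logits, actions,
                    beta=0.2, eta=1.0):
    # Forward loss: squared error of the predicted next feature vector.
    forward_err = 0.5 * torch.sum((predicted_next_phi - next_phi) ** 2, dim=-1)
    forward_loss = torch.mean(forward_err)
    # Inverse loss: how well the taken (discrete) action is predicted
    # from the pair of consecutive feature vectors.
    inverse_loss = F.cross_entropy(action_logits, actions)
    # Total curiosity loss: beta-weighted mix of inverse and forward losses.
    total_loss = (1.0 - beta) * inverse_loss + beta * forward_loss
    # Intrinsic reward: per-timestep forward prediction error, scaled by eta.
    intrinsic_reward = eta * forward_err
    return total_loss, intrinsic_reward
```

A large forward_err means the transition was poorly predicted, so that timestep receives a larger bonus, which is what pushes the agent toward under-explored parts of the state space.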