ray.rllib.callbacks.callbacks.RLlibCallback.on_episode_created

RLlibCallback.on_episode_created(*, episode: SingleAgentEpisode | MultiAgentEpisode | EpisodeV2, worker: EnvRunner | None = None, env_runner: EnvRunner | None = None, metrics_logger: MetricsLogger | None = None, base_env: BaseEnv | None = None, env: gymnasium.Env | None = None, policies: Dict[str, Policy] | None = None, rl_module: RLModule | None = None, env_index: int, **kwargs) -> None

Callback run when a new episode is created (but has not started yet!).

This method is called after a new Episode(V2) (old API stack) or MultiAgentEpisode instance has been created, and before RLlib calls reset() on the respective sub-environment (usually a gym.Env).

Note that, at the moment, this callback is not called on the new API stack in single-agent mode.

The order of events is:

  1. An Episode(V2)/MultiAgentEpisode is created: this callback is called.

  2. reset() is called on the respective sub-environment (gym.Env).

  3. The on_episode_start callback is called.

  4. Stepping through the sub-environment/episode begins.
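The four steps above can be sketched as a toy episode driver that records when each hook and each env call happens. Everything in it (`ToyEnv`, `OrderRecordingCallback`, `run_one_episode`) is hypothetical: it only models the documented order of events, it is not RLlib's EnvRunner code, and the "episode" is a plain dict rather than an Episode object.

```python
from typing import Any, List


class ToyEnv:
    """Hypothetical stand-in for a gym.Env that records reset/step calls."""

    def __init__(self, events: List[str]) -> None:
        self.events = events
        self.t = 0

    def reset(self):
        self.events.append("env.reset")
        self.t = 0
        return 0, {}  # (obs, info), gymnasium-style

    def step(self, action: Any):
        self.events.append("env.step")
        self.t += 1
        terminated = self.t >= 2  # episode ends after two steps
        return self.t, 1.0, terminated, False, {}


class OrderRecordingCallback:
    """Records the order in which the two episode hooks fire."""

    def __init__(self, events: List[str]) -> None:
        self.events = events

    def on_episode_created(self, *, episode, env_index: int, **kwargs) -> None:
        self.events.append("on_episode_created")

    def on_episode_start(self, *, episode, env_index: int, **kwargs) -> None:
        self.events.append("on_episode_start")


def run_one_episode(env: ToyEnv, callback: OrderRecordingCallback, env_index: int = 0) -> dict:
    """Hypothetical driver mimicking steps 1-4 (not RLlib's implementation)."""
    episode = {"observations": [], "rewards": []}  # 1. episode object created ...
    callback.on_episode_created(episode=episode, env_index=env_index)  # ... hook fires
    obs, _ = env.reset()  # 2. sub-environment is reset
    episode["observations"].append(obs)
    callback.on_episode_start(episode=episode, env_index=env_index)  # 3. start hook
    done = False
    while not done:  # 4. stepping begins
        obs, reward, terminated, truncated, _ = env.step(action=0)
        episode["observations"].append(obs)
        episode["rewards"].append(reward)
        done = terminated or truncated
    return episode


events: List[str] = []
run_one_episode(ToyEnv(events), OrderRecordingCallback(events))
print(events)
# ['on_episode_created', 'env.reset', 'on_episode_start', 'env.step', 'env.step']
```

The key point the sketch makes is that `on_episode_created` runs before any observation exists, so per-episode setup that needs the initial observation belongs in `on_episode_start` instead.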

Parameters:
  • episode – The newly created episode. On the new API stack, this is a MultiAgentEpisode object; on the old API stack, it is an Episode or EpisodeV2 object. This is the episode that is about to be started with an upcoming env.reset() call. The on_episode_start callback is called only after this reset.

  • env_runner – Replaces the worker arg. Reference to the current EnvRunner.

  • metrics_logger – The MetricsLogger object inside the env_runner. Can be used to log custom metrics after Episode creation.

  • env – Replaces the base_env arg. The gym.Env (new API stack) or RLlib BaseEnv (old API stack) running the episode. On the old stack, the underlying sub-environment objects can be retrieved by calling base_env.get_sub_environments().

  • rl_module – Replaces the policies arg. Either the RLModule (new API stack) or a dict mapping policy IDs to policy objects (old stack). In single-agent mode, there is only a single policy/RLModule under the rl_module["default_policy"] key.

  • env_index – The index of the sub-environment that is about to be reset (within the vector of sub-environments of the BaseEnv).

  • kwargs – Forward compatibility placeholder.
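A minimal sketch of overriding this hook to count created episodes per sub-environment, using only the keyword-only arguments documented above. Assumptions: in real code the class would subclass RLlibCallback and be registered through the algorithm config; the metric key `episodes_created/env_<i>` and the `StubMetricsLogger` class are hypothetical and exist only so the sketch runs without Ray; the logging call assumes `MetricsLogger.log_value(key, value, reduce=...)` as found in recent Ray releases, so check it against your installed version.

```python
from collections import defaultdict
from typing import Any, Dict, Optional


class EpisodeCreationCounter:
    """Sketch of an on_episode_created override (in RLlib, subclass RLlibCallback)."""

    def on_episode_created(
        self,
        *,
        episode: Any,
        env_runner: Optional[Any] = None,
        metrics_logger: Optional[Any] = None,
        env: Optional[Any] = None,
        rl_module: Optional[Any] = None,
        env_index: int,
        **kwargs: Any,  # forward-compatibility placeholder, as in the documented signature
    ) -> None:
        # The env has not been reset yet, so the episode holds no observations here.
        # metrics_logger is typed as optional, so guard against None.
        if metrics_logger is not None:
            # reduce="sum" accumulates the count instead of averaging it.
            metrics_logger.log_value(
                f"episodes_created/env_{env_index}", 1, reduce="sum"
            )


class StubMetricsLogger:
    """Hypothetical stand-in for RLlib's MetricsLogger, for local testing only."""

    def __init__(self) -> None:
        self.totals: Dict[str, float] = defaultdict(float)

    def log_value(self, key: str, value: float, *, reduce: str = "mean") -> None:
        # This stub only implements summation; the real logger supports more modes.
        if reduce != "sum":
            raise NotImplementedError("stub only supports reduce='sum'")
        self.totals[key] += value


logger = StubMetricsLogger()
cb = EpisodeCreationCounter()
for idx in (0, 1, 0):
    cb.on_episode_created(episode=None, metrics_logger=logger, env_index=idx)
print(dict(logger.totals))  # {'episodes_created/env_0': 2.0, 'episodes_created/env_1': 1.0}
```

Keeping `**kwargs` in the override means the callback keeps working if RLlib passes additional keyword arguments in a later release.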