ray.rllib.callbacks.callbacks.RLlibCallback.on_episode_created
- RLlibCallback.on_episode_created(*, episode: SingleAgentEpisode | MultiAgentEpisode | EpisodeV2, worker: EnvRunner | None = None, env_runner: EnvRunner | None = None, metrics_logger: MetricsLogger | None = None, base_env: BaseEnv | None = None, env: gymnasium.Env | None = None, policies: Dict[str, Policy] | None = None, rl_module: RLModule | None = None, env_index: int, **kwargs) → None
Callback run when a new episode is created (but has not started yet!).
This method gets called after a new Episode(V2) (old stack) or MultiAgentEpisode instance has been created. This happens before RLlib calls reset() on the respective sub-environment (usually a gym.Env).
Note: At the moment, this callback is not called on the new API stack in single-agent mode.
1. Episode(V2)/MultiAgentEpisode created: This callback is called.
2. Respective sub-environment (gym.Env) is reset().
3. Callback on_episode_start is called.
4. Stepping through sub-environment/episode commences.
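The ordering above can be illustrated with a minimal sketch. The `LifecycleTracer` class and the placeholder episode object are hypothetical names used only for illustration. The hooks are invoked by hand here to mimic the order in which RLlib would call them; no real environment or EnvRunner is involved. The import falls back to a plain base class so the sketch also runs where Ray is not installed.

```python
try:
    from ray.rllib.callbacks.callbacks import RLlibCallback
except ImportError:
    # Assumption for illustration only: a bare stand-in base class so the
    # sketch can run without Ray installed.
    class RLlibCallback:
        pass


class LifecycleTracer(RLlibCallback):
    """Hypothetical callback that records the order of episode hooks."""

    def __init__(self):
        super().__init__()
        self.events = []

    def on_episode_created(self, *, episode, env_index, **kwargs):
        # Step 1: the episode object exists, but the env has not been reset.
        self.events.append(("created", env_index))

    def on_episode_start(self, *, episode, env_index, **kwargs):
        # Step 3: called only after the sub-environment has been reset.
        self.events.append(("started", env_index))


# Mimic the documented order by hand (RLlib would normally drive this).
tracer = LifecycleTracer()
placeholder_episode = object()  # stand-in for an episode instance
tracer.on_episode_created(episode=placeholder_episode, env_index=0)
# Step 2 would happen here: RLlib resets the sub-environment.
tracer.on_episode_start(episode=placeholder_episode, env_index=0)
print(tracer.events)  # [('created', 0), ('started', 0)]
```

Because `on_episode_created` fires before the reset, the episode holds no observations at that point; anything that depends on the initial observation belongs in `on_episode_start`.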
- Parameters:
episode – The newly created episode. On the new API stack, this will be a MultiAgentEpisode object. On the old API stack, this will be an Episode or EpisodeV2 object. This is the episode that is about to be started with an upcoming env.reset(). Only after this reset call will the on_episode_start callback be called.
env_runner – Replaces the worker arg. Reference to the current EnvRunner.
metrics_logger – The MetricsLogger object inside the env_runner. Can be used to log custom metrics after Episode creation.
env – Replaces the base_env arg. The gym.Env (new API stack) or RLlib BaseEnv (old API stack) running the episode. On the old stack, the underlying sub-environment objects can be retrieved by calling base_env.get_sub_environments().
rl_module – Replaces the policies arg. Either the RLModule (new API stack) or a dict mapping policy IDs to policy objects (old stack). In single-agent mode, there will only be a single policy/RLModule under the rl_module["default_policy"] key.
env_index – The index of the sub-environment that is about to be reset (within the vector of sub-environments of the BaseEnv).
kwargs – Forward compatibility placeholder.
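As a rough sketch of how these parameters might be used, the callback below counts created episodes through `metrics_logger`. The class name `CountCreatedEpisodes`, the metric key `"episodes_created"`, and the `RecordingLogger` stand-in are hypothetical and exist only for this illustration; the stand-in mimics just the `log_value` call so the sketch can be run without an EnvRunner. The import falls back to a plain base class where Ray is not installed.

```python
try:
    from ray.rllib.callbacks.callbacks import RLlibCallback
except ImportError:
    # Assumption for illustration only: a bare stand-in base class so the
    # sketch can run without Ray installed.
    class RLlibCallback:
        pass


class CountCreatedEpisodes(RLlibCallback):
    """Hypothetical callback that counts how many episodes were created."""

    def on_episode_created(
        self, *, episode, env_runner=None, metrics_logger=None,
        env=None, rl_module=None, env_index, **kwargs
    ):
        # metrics_logger may be None (for example on the old API stack).
        if metrics_logger is not None:
            # Sum up one count per created episode.
            metrics_logger.log_value("episodes_created", 1, reduce="sum")


class RecordingLogger:
    """Stand-in for MetricsLogger (not the real class); records calls."""

    def __init__(self):
        self.calls = []

    def log_value(self, key, value, **kwargs):
        self.calls.append((key, value, kwargs))


# Invoke the hook by hand for two sub-environments.
logger = RecordingLogger()
callback = CountCreatedEpisodes()
for idx in range(2):
    callback.on_episode_created(
        episode=object(), metrics_logger=logger, env_index=idx
    )
print(len(logger.calls))  # 2
```

In an actual RLlib setup, a class like this would typically be registered through the algorithm config's `callbacks(...)` setting rather than called directly. Keeping `**kwargs` in the signature matches the forward-compatibility placeholder described above.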