Note
Ray 2.40 uses RLlib’s new API stack by default. The Ray team has mostly completed transitioning algorithms, example scripts, and documentation to the new code base.
If you’re still using the old API stack, see the New API stack migration guide for details on how to migrate.
SingleAgentEnvRunner API#
rllib.env.single_agent_env_runner.SingleAgentEnvRunner#
- class ray.rllib.env.single_agent_env_runner.SingleAgentEnvRunner(*, config: AlgorithmConfig, **kwargs)[source]#
The generic environment runner for the single-agent case.
PublicAPI (alpha): This API is in alpha and may change before becoming stable.
- __init__(*, config: AlgorithmConfig, **kwargs)[source]#
Initializes a SingleAgentEnvRunner instance.
- Parameters:
config – An AlgorithmConfig object containing all settings needed to build this EnvRunner class.
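A minimal construction sketch, assuming the standard PPOConfig builder and a Gymnasium environment registered as "CartPole-v1"; the specific config options shown are illustrative, not required:

```python
from ray.rllib.algorithms.ppo import PPOConfig
from ray.rllib.env.single_agent_env_runner import SingleAgentEnvRunner

# Build an AlgorithmConfig carrying the env and rollout settings.
config = (
    PPOConfig()
    .environment("CartPole-v1")
    .env_runners(num_envs_per_env_runner=2)
)

# The EnvRunner only needs the config; it builds its (vector) env and
# RLModule internally from these settings.
env_runner = SingleAgentEnvRunner(config=config)
```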
- sample(*, num_timesteps: int = None, num_episodes: int = None, explore: bool = None, random_actions: bool = False, force_reset: bool = False) List[SingleAgentEpisode] [source]#
Runs and returns a sample (n timesteps or m episodes) on the env(s).
- Parameters:
num_timesteps – The number of timesteps to sample during this call. Note that only one of
num_timesteps
ornum_episodes
may be provided.num_episodes – The number of episodes to sample during this call. Note that only one of
num_timesteps
ornum_episodes
may be provided.explore – If True, uses the RLModule’s
forward_exploration()
method to compute actions. If False, uses the RLModule’sforward_inference()
method. If None (default), uses theexplore
boolean setting fromself.config
passed into this EnvRunner’s constructor. You can change this setting in your config viaconfig.env_runners(explore=True|False)
.random_actions – If True, actions are sampled randomly (from the action space of the environment). If False (default), actions or action distribution parameters are computed by the RLModule.
force_reset – Whether to force-reset all (vector) environments before sampling. Useful if you would like to collect a clean slate of new episodes via this call. Note that when sampling n episodes (
num_episodes != None
), this setting is fixed to True.
- Returns:
A list of SingleAgentEpisode instances carrying the sampled data.
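A short usage sketch of sample(), continuing with the env_runner constructed above; the episode helpers accessed at the end (len() and get_return()) are assumed SingleAgentEpisode conveniences:

```python
# Sample exactly 50 timesteps; pass only one of num_timesteps/num_episodes.
episodes = env_runner.sample(num_timesteps=50)

# Or sample two complete episodes with exploration disabled;
# force_reset is implicitly True when num_episodes is given.
episodes = env_runner.sample(num_episodes=2, explore=False)

for episode in episodes:
    # len() and get_return() are assumed SingleAgentEpisode helpers.
    print(f"length={len(episode)} return={episode.get_return()}")
```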
- get_metrics() Dict [source]#
Returns metrics (in any form) for the episodes collected and completed so far.
- Returns:
Metrics of any form.
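A brief sketch of pulling metrics after sampling; treat the returned structure as opaque, since the docstring only promises metrics of any form:

```python
# Collect some complete episodes first so there is something to report.
env_runner.sample(num_episodes=5)

# Returns a dict of episode statistics accumulated since the last call.
metrics = env_runner.get_metrics()
print(metrics)
```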