ray.rllib.algorithms.algorithm_config.AlgorithmConfig.environment
- AlgorithmConfig.environment(env: str | ~typing.Any | gymnasium.Env | None = <ray.rllib.utils.from_config._NotProvided object>, *, env_config: dict | None = <ray.rllib.utils.from_config._NotProvided object>, observation_space: gymnasium.spaces.Space | None = <ray.rllib.utils.from_config._NotProvided object>, action_space: gymnasium.spaces.Space | None = <ray.rllib.utils.from_config._NotProvided object>, env_task_fn: ~typing.Callable[[~typing.Dict, ~typing.Any | gymnasium.Env, ~ray.rllib.env.env_context.EnvContext], ~typing.Any] | None = <ray.rllib.utils.from_config._NotProvided object>, render_env: bool | None = <ray.rllib.utils.from_config._NotProvided object>, clip_rewards: bool | float | None = <ray.rllib.utils.from_config._NotProvided object>, normalize_actions: bool | None = <ray.rllib.utils.from_config._NotProvided object>, clip_actions: bool | None = <ray.rllib.utils.from_config._NotProvided object>, disable_env_checking: bool | None = <ray.rllib.utils.from_config._NotProvided object>, is_atari: bool | None = <ray.rllib.utils.from_config._NotProvided object>, action_mask_key: str | None = <ray.rllib.utils.from_config._NotProvided object>, auto_wrap_old_gym_envs=-1) → AlgorithmConfig
Sets the config’s RL-environment settings.
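For illustration only (not part of the original reference entry), a minimal usage sketch; PPOConfig is assumed here as the concrete AlgorithmConfig subclass and "CartPole-v1" as a pre-registered gymnasium env id:

from ray.rllib.algorithms.ppo import PPOConfig  # assumed concrete AlgorithmConfig subclass

config = (
    PPOConfig()
    .environment(
        env="CartPole-v1",  # string specifier: interpreted as a gymnasium env id
        env_config={},      # passed to the env creator as an EnvContext
        clip_rewards=None,  # default behavior: clip rewards for Atari envs only
    )
)
algo = config.build()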
- Parameters:
env – The environment specifier. This can either be a tune-registered env, via tune.register_env([name], lambda env_ctx: [env object]), or a string specifier of an RLlib supported type. In the latter case, RLlib tries to interpret the specifier as either a Farama-Foundation gymnasium env, a PyBullet env, or a fully qualified classpath to an Env class, e.g. “ray.rllib.examples.envs.classes.random_env.RandomEnv” (see the registration sketch at the end of this entry).
env_config – Arguments dict passed to the env creator as an EnvContext object (which is a dict plus the properties: num_env_runners, worker_index, vector_index, and remote).
observation_space – The observation space for the Policies of this Algorithm.
action_space – The action space for the Policies of this Algorithm.
env_task_fn – A callable taking the last train results, the base env, and the env context as args, and returning a new task to set the env to. The env must be a TaskSettableEnv sub-class for this to work. See examples/curriculum_learning.py for an example.
render_env – If True, try to render the environment on the local worker or on worker 1 (if num_env_runners > 0). For vectorized envs, this usually means that only the first sub-environment is rendered. In order for this to work, your env has to implement the render() method, which either a) handles window generation and rendering itself (returning True), or b) returns a numpy uint8 image of shape [height x width x 3 (RGB)].
clip_rewards – Whether to clip rewards during the Policy’s postprocessing. None (default): Clip for Atari only (r=sign(r)). True: r=sign(r): Fixed rewards -1.0, 1.0, or 0.0. False: Never clip. [float value]: Clip at -value and +value. Tuple[value1, value2]: Clip at value1 and value2.
normalize_actions – If True, RLlib learns entirely inside a normalized action space (0.0 centered with small stddev; only affecting Box components). RLlib unsquashes actions (and clips them, just in case) to the bounds of the env’s action space before sending them back to the env.
clip_actions – If True, the RLlib default ModuleToEnv connector clips actions according to the env’s bounds (before sending them into the env.step() call).
disable_env_checking – Disable RLlib’s env checks after a gymnasium.Env instance has been constructed in an EnvRunner. Note that the checks include an env.reset() and env.step() (with a random action), which might tinker with your env’s logic and behavior and thus negatively influence sample collection and/or learning behavior.
is_atari – This config can be used to explicitly specify whether the env is an Atari env or not. If not specified, RLlib tries to auto-detect this.
action_mask_key – If observation is a dictionary, expect the value under the key action_mask_key to contain a valid actions mask (a numpy.int8 array of zeros and ones). Defaults to “action_mask”.
- Returns:
This updated AlgorithmConfig object.
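The following is a hedged sketch of the tune-registered path described under the env and env_config parameters. MyEnv is a hypothetical gymnasium.Env subclass; the registered creator receives env_config as an EnvContext, which supports dict access plus the worker_index, vector_index, num_env_runners, and remote properties. PPOConfig is again assumed as the concrete config class.

import gymnasium as gym
from ray import tune
from ray.rllib.algorithms.ppo import PPOConfig  # assumed concrete AlgorithmConfig subclass


class MyEnv(gym.Env):
    """Hypothetical single-agent env, used only to illustrate env registration."""

    def __init__(self, config):
        # `config` is an EnvContext: dict access plus worker_index, vector_index, etc.
        self.size = config.get("size", 10)
        self.observation_space = gym.spaces.Discrete(self.size)
        self.action_space = gym.spaces.Discrete(2)
        self.pos = 0

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        self.pos = 0
        return self.pos, {}

    def step(self, action):
        # Move right on action=1; episode terminates at the last cell.
        self.pos = min(self.pos + int(action), self.size - 1)
        terminated = self.pos == self.size - 1
        return self.pos, 1.0 if terminated else 0.0, terminated, False, {}


# Register the env creator under a name, then refer to it by that name in .environment().
tune.register_env("my_env", lambda env_ctx: MyEnv(env_ctx))

config = PPOConfig().environment(env="my_env", env_config={"size": 20})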