AlgorithmConfig.environment(env: str | typing.Any | gymnasium.Env | None = <ray.rllib.utils.from_config._NotProvided object>, *, env_config: dict | None = <ray.rllib.utils.from_config._NotProvided object>, observation_space: gymnasium.spaces.Space | None = <ray.rllib.utils.from_config._NotProvided object>, action_space: gymnasium.spaces.Space | None = <ray.rllib.utils.from_config._NotProvided object>, env_task_fn: typing.Callable[[dict | NestedDict, typing.Any | gymnasium.Env, ray.rllib.env.env_context.EnvContext], typing.Any] | None = <ray.rllib.utils.from_config._NotProvided object>, render_env: bool | None = <ray.rllib.utils.from_config._NotProvided object>, clip_rewards: bool | float | None = <ray.rllib.utils.from_config._NotProvided object>, normalize_actions: bool | None = <ray.rllib.utils.from_config._NotProvided object>, clip_actions: bool | None = <ray.rllib.utils.from_config._NotProvided object>, disable_env_checking: bool | None = <ray.rllib.utils.from_config._NotProvided object>, is_atari: bool | None = <ray.rllib.utils.from_config._NotProvided object>, auto_wrap_old_gym_envs: bool | None = <ray.rllib.utils.from_config._NotProvided object>, action_mask_key: str | None = <ray.rllib.utils.from_config._NotProvided object>) -> AlgorithmConfig

Sets the config’s RL-environment settings.

  • env – The environment specifier. This can either be a tune-registered env, via tune.register_env([name], lambda env_ctx: [env object]), or a string specifier of an RLlib-supported type. In the latter case, RLlib will try to interpret the specifier as either a Farama-Foundation gymnasium env, a PyBullet env, or a fully qualified classpath to an Env class, e.g. “ray.rllib.examples.envs.classes.random_env.RandomEnv”.

  • env_config – Arguments dict passed to the env creator as an EnvContext object (which is a dict plus the properties: num_rollout_workers, worker_index, vector_index, and remote).

  • observation_space – The observation space for the Policies of this Algorithm.

  • action_space – The action space for the Policies of this Algorithm.

  • env_task_fn – A callable taking the last train results, the base env and the env context as args and returning a new task to set the env to. The env must be a TaskSettableEnv sub-class for this to work. See examples/curriculum_learning.py for an example.

  • render_env – If True, try to render the environment on the local worker or on worker 1 (if num_rollout_workers > 0). For vectorized envs, this usually means that only the first sub-environment will be rendered. In order for this to work, your env will have to implement the render() method which either: a) handles window generation and rendering itself (returning True) or b) returns a numpy uint8 image of shape [height x width x 3 (RGB)].

  • clip_rewards – Whether to clip rewards during the Policy’s postprocessing. None (default): clip for Atari only (r=sign(r)). True: r=sign(r), i.e. rewards become -1.0, 1.0, or 0.0. False: never clip. [float value]: clip at -value and +value. Tuple[value1, value2]: clip at value1 and value2.

  • normalize_actions – If True, RLlib will learn entirely inside a normalized action space (0.0 centered with small stddev; only affecting Box components). RLlib will unsquash actions (and clip, just in case) to the bounds of the env’s action space before sending actions back to the env.

  • clip_actions – If True, the RLlib default ModuleToEnv connector will clip actions according to the env’s bounds (before sending them into the env.step() call).

  • disable_env_checking – If True, disable the environment pre-checking module.

  • is_atari – This config can be used to explicitly specify whether the env is an Atari env or not. If not specified, RLlib will try to auto-detect this.

  • auto_wrap_old_gym_envs – Whether to auto-wrap old gym environments (using the pre-0.24 gym APIs, e.g. reset() returning a single obs and no info dict). If True, RLlib will automatically wrap the given gym env class with the gym-provided compatibility wrapper (gym.wrappers.EnvCompatibility). If False, RLlib will produce a descriptive error on which steps to perform to upgrade to gymnasium (or to switch this flag to True).

  • action_mask_key – If the observation is a dictionary, expect the value under the key action_mask_key to contain a valid actions mask (numpy.int8 array of zeros and ones). Defaults to “action_mask”.
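
The clip_rewards modes listed above can be restated as a small pure-Python function. This is only an illustrative sketch of the documented semantics, not RLlib's implementation; the helper name clip_reward and its is_atari argument are hypothetical.

```python
def clip_reward(reward, clip_rewards, is_atari=False):
    """Apply the documented clip_rewards modes to a single scalar reward.

    Hypothetical helper for illustration only; RLlib applies clipping
    internally during postprocessing.
    """

    def sign(x):
        # Returns -1.0, 0.0, or 1.0.
        return float((x > 0) - (x < 0))

    if clip_rewards is None:
        # Default: clip only for Atari envs.
        return sign(reward) if is_atari else reward
    if clip_rewards is True:
        return sign(reward)
    if clip_rewards is False:
        return reward
    if isinstance(clip_rewards, tuple):
        # Tuple[value1, value2]: clip into [value1, value2].
        low, high = clip_rewards
        return max(low, min(high, reward))
    # A single float value: clip into [-value, +value].
    bound = abs(float(clip_rewards))
    return max(-bound, min(bound, reward))


print(clip_reward(3.5, True))          # sign-based clipping
print(clip_reward(3.5, 2.0))           # symmetric clipping at +/- 2.0
print(clip_reward(3.5, (-1.0, 0.5)))   # clipping into an explicit range
```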

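To make the difference between normalize_actions and clip_actions concrete, the following sketch works on a single scalar action component with bounds low and high. It assumes the normalized space is the interval [-1, 1] and that unsquashing is a linear rescaling followed by a clip; both are assumptions for illustration, and the function names are hypothetical rather than RLlib APIs.

```python
def unsquash_action(a, low, high):
    """Map a normalized action (assumed to live in [-1, 1]) to [low, high].

    The result is clipped afterwards "just in case", mirroring the
    description of normalize_actions.
    """
    scaled = low + (a + 1.0) * (high - low) / 2.0
    return max(low, min(high, scaled))


def clip_action(a, low, high):
    """Clip a raw (non-normalized) action to the env's bounds,
    as described for clip_actions."""
    return max(low, min(high, a))


# A normalized action of 0.5 lands three quarters of the way through [0, 10].
print(unsquash_action(0.5, 0.0, 10.0))
# A raw action outside the bounds is cut back to the nearest bound.
print(clip_action(12.0, 0.0, 10.0))
```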

Returns: This updated AlgorithmConfig object.
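
A typical call might look like the sketch below, which registers an env creator with tune.register_env and then passes the registered name to environment(). It assumes ray[rllib] and gymnasium are installed and uses PPOConfig as one concrete AlgorithmConfig subclass; the registered name "my_cartpole" and the env_config key "max_episode_steps" are hypothetical choices made for this example.

```python
import gymnasium as gym
from ray import tune
from ray.rllib.algorithms.ppo import PPOConfig


def make_env(env_ctx):
    # env_ctx is the EnvContext: the env_config dict plus worker metadata.
    # "max_episode_steps" is a key chosen for this example only.
    return gym.make(
        "CartPole-v1",
        max_episode_steps=env_ctx.get("max_episode_steps", 500),
    )


# Register the creator under a (hypothetical) name ...
tune.register_env("my_cartpole", make_env)

# ... and refer to that name in the config. environment() returns the
# updated config object, so calls can be chained.
config = (
    PPOConfig()
    .environment(
        env="my_cartpole",
        env_config={"max_episode_steps": 200},
        clip_rewards=False,
        normalize_actions=True,
    )
)
```

Because only the arguments passed explicitly are changed, settings not mentioned in the call keep whatever value the config already had.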