BaseEnv API

rllib.env.base_env.BaseEnv

class ray.rllib.env.base_env.BaseEnv[source]

The lowest-level env interface used by RLlib for sampling.

BaseEnv models multiple agents executing asynchronously in multiple vectorized sub-environments. A call to poll() returns observations from ready agents keyed by their sub-environment ID and agent IDs, and actions for those agents can be sent back via send_actions().

All other RLlib supported env types can be converted to BaseEnv. RLlib handles these conversions internally in RolloutWorker, for example:

gym.Env => rllib.VectorEnv => rllib.BaseEnv
rllib.MultiAgentEnv (is-a gym.Env) => rllib.VectorEnv => rllib.BaseEnv
rllib.ExternalEnv => rllib.BaseEnv

action_space

Action space. This must be defined for single-agent envs. Multi-agent envs can set this to None.

Type: gym.Space

observation_space

Observation space. This must be defined for single-agent envs. Multi-agent envs can set this to None.

Type: gym.Space
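
For illustration, a custom BaseEnv subclass, like the hypothetical MyBaseEnv used in the Examples below, would typically set both spaces in its constructor. The spaces in this sketch are assumptions chosen to match the 2-element observations in the example; a real subclass would also implement poll(), send_actions(), etc.

>>> import gym
>>> from ray.rllib.env.base_env import BaseEnv
>>> class MyBaseEnv(BaseEnv):
...     def __init__(self):
...         # Illustrative spaces only: 2D continuous observations,
...         # two discrete actions per agent.
...         self.observation_space = gym.spaces.Box(-10.0, 10.0, shape=(2,))
...         self.action_space = gym.spaces.Discrete(2)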

Examples

>>> env = MyBaseEnv()
>>> obs, rewards, dones, infos, off_policy_actions = env.poll()
>>> print(obs)
{
    "env_0": {
        "car_0": [2.4, 1.6],
        "car_1": [3.4, -3.2],
    },
    "env_1": {
        "car_0": [8.0, 4.1],
    },
    "env_2": {
        "car_0": [2.3, 3.3],
        "car_1": [1.4, -0.2],
        "car_3": [1.2, 0.1],
    },
}
>>> env.send_actions({
...   "env_0": {
...     "car_0": 0,
...     "car_1": 1,
...   }, ...
... })
>>> obs, rewards, dones, infos, off_policy_actions = env.poll()
>>> print(obs)
{
    "env_0": {
        "car_0": [4.1, 1.7],
        "car_1": [3.2, -4.2],
    }, ...
}
>>> print(dones)
{
    "env_0": {
        "__all__": False,
        "car_0": False,
        "car_1": True,
    }, ...
}
to_base_env(make_env: Callable[[int], Any] = None, num_envs: int = 1, remote_envs: bool = False, remote_env_batch_wait_ms: int = 0) → ray.rllib.env.base_env.BaseEnv[source]

Converts an RLlib-supported env into a BaseEnv object.

Supported types for the env arg are gym.Env, BaseEnv, VectorEnv, MultiAgentEnv, ExternalEnv, or ExternalMultiAgentEnv.

The resulting BaseEnv is always vectorized (contains n sub-environments) to support batched forward passes, where n may also be 1. BaseEnv also supports async execution via the poll and send_actions methods and thus supports external simulators.

TODO: Support gym3 environments, which are already vectorized.

Parameters
  • env – An already existing environment of any supported env type to convert/wrap into a BaseEnv. Supported types are gym.Env, BaseEnv, VectorEnv, MultiAgentEnv, ExternalEnv, and ExternalMultiAgentEnv.

  • make_env – A callable taking an int as input (which indicates the number of individual sub-environments within the final vectorized BaseEnv) and returning one individual sub-environment.

  • num_envs – The number of sub-environments to create in the resulting (vectorized) BaseEnv. The already existing env will be one of the num_envs.

  • remote_envs – Whether each sub-env should be a @ray.remote actor. You can set this behavior in your config via the remote_worker_envs=True option.

  • remote_env_batch_wait_ms – The wait time (in ms) to poll remote sub-environments for, if applicable. Only used if remote_envs is True.

  • policy_config – Optional policy config dict.

Returns

The resulting BaseEnv object.
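
A minimal usage sketch of this conversion, assuming the env to convert is passed as the first argument (as described in the Parameters above); CartPole-v1 and num_envs=4 are illustrative choices, and the exact entry point may differ between RLlib versions:

>>> import gym
>>> from ray.rllib.env.base_env import BaseEnv
>>> # Wrap an ordinary gym.Env into a vectorized BaseEnv with 4 sub-envs.
>>> base_env = BaseEnv.to_base_env(
...     gym.make("CartPole-v1"),
...     make_env=lambda idx: gym.make("CartPole-v1"),
...     num_envs=4,
... )
>>> obs, rewards, dones, infos, off_policy_actions = base_env.poll()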

poll() → Tuple[Dict[Union[int, str], Dict[Any, Any]], Dict[Union[int, str], Dict[Any, Any]], Dict[Union[int, str], Dict[Any, Any]], Dict[Union[int, str], Dict[Any, Any]], Dict[Union[int, str], Dict[Any, Any]]][source]

Returns observations from ready agents.

All return values are two-level dicts mapping from EnvID to dicts mapping from AgentIDs to (observation/reward/etc.) values. The number of agents and sub-environments may vary over time.

Returns

Tuple consisting of:

  1) New observations for each ready agent.

  2) Reward values for each ready agent. If the episode has just started, the value will be None.

  3) Done values for each ready agent. The special key “__all__” is used to indicate env termination.

  4) Info values for each ready agent.

  5) Off-policy actions: agents may take actions on their own. When that happens, there will be an entry in this dict containing the taken action. There is no need to call send_actions() for agents that have already chosen off-policy actions.

send_actions(action_dict: Dict[Union[int, str], Dict[Any, Any]]) → None[source]

Called to send actions back to running agents in this env.

Actions should be sent for each ready agent that returned observations in the previous poll() call.

Parameters

action_dict – Action values keyed by env_id and agent_id.
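
Together, poll() and send_actions() form the basic sampling loop. A rough sketch, assuming base_env was built as in the to_base_env() example above, a single shared action_space, and a random stand-in policy (per-agent done handling is omitted for brevity):

>>> while True:
...     obs, rewards, dones, infos, off_policy = base_env.poll()
...     if dones and all(d.get("__all__", False) for d in dones.values()):
...         break
...     actions = {}
...     for env_id, agent_obs in obs.items():
...         actions[env_id] = {
...             agent_id: base_env.action_space.sample()  # stand-in policy
...             for agent_id in agent_obs
...             if agent_id not in off_policy.get(env_id, {})
...         }
...     base_env.send_actions(actions)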

try_reset(env_id: Union[int, str, None] = None) → Optional[Dict[Any, Any]][source]

Attempt to reset the sub-env with the given id or all sub-envs.

If the environment does not support synchronous reset, None can be returned here.

Parameters

env_id – The sub-environment’s ID if applicable. If None, reset the entire Env (i.e. all sub-environments).

Returns

The reset (multi-agent) observation dict. None if reset is not supported.
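
For example, a sampling loop might reset finished sub-environments like this (a sketch; whether a fresh observation dict is returned synchronously depends on the underlying env):

>>> obs, rewards, dones, infos, _ = base_env.poll()
>>> for env_id, done_dict in dones.items():
...     if done_dict.get("__all__", False):
...         reset_obs = base_env.try_reset(env_id)
...         if reset_obs is None:
...             pass  # env resets asynchronously; new obs arrive via poll()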

get_sub_environments() → List[Any][source]

Return a reference to the underlying sub-environments, if any.

Returns

List of the underlying sub-environments or [].
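
This can be used to inspect or configure the wrapped envs directly, for example (a sketch; the list may be empty if the env exposes no sub-environments):

>>> for idx, sub_env in enumerate(base_env.get_sub_environments()):
...     print(idx, type(sub_env))  # e.g. the original gym.Env instances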

try_render(env_id: Union[int, str, None] = None) → None[source]

Tries to render the sub-environment with the given id or all.

Parameters

env_id – The sub-environment’s ID, if applicable. If None, renders the entire Env (i.e. all sub-environments).

stop() → None[source]

Releases all resources used.