Environments

Any environment type you provide to RLlib (e.g. a user-defined gym.Env class) is converted internally into the BaseEnv API, whose main methods are poll() and send_actions():

../../_images/env_classes_overview.svg

The BaseEnv API allows RLlib to support:

  1. Vectorization of sub-environments (i.e. individual gym.Env instances, stacked to form a vector of envs) in order to batch the action computing model forward passes.

  2. External simulators requiring async execution (e.g. envs that run on separate machines and independently request actions from a policy server).

  3. Stepping through the individual sub-environments in parallel via pre-converting them into separate @ray.remote actors.

  4. Multi-agent RL via dicts mapping agent IDs to observations/rewards/etc.
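The poll()/send_actions() pattern can be illustrated with a toy, pure-Python stand-in (this is not RLlib's actual BaseEnv implementation; the class, agent ID, and dict shapes below are a conceptual sketch). Observations, rewards, and dones come back as nested dicts keyed first by sub-env ID, then by agent ID, which is how one API covers both vectorization and multi-agent setups:

```python
# Conceptual sketch of the BaseEnv interaction pattern (not RLlib code):
# poll() reports the state of all sub-envs waiting for actions,
# send_actions() applies a whole batch of actions at once.

class ToyBaseEnv:
    def __init__(self, num_sub_envs=2):
        # Each sub-environment holds a single integer state.
        self.states = {env_id: 0 for env_id in range(num_sub_envs)}

    def poll(self):
        # Nested {env_id: {agent_id: value}} dicts, one entry per sub-env.
        obs = {i: {"agent0": s} for i, s in self.states.items()}
        rewards = {i: {"agent0": -0.1} for i in self.states}
        dones = {i: {"__all__": s >= 3} for i, s in self.states.items()}
        infos = {i: {"agent0": {}} for i in self.states}
        return obs, rewards, dones, infos

    def send_actions(self, action_dict):
        # Apply the batch of actions computed by the policy.
        for env_id, agent_actions in action_dict.items():
            self.states[env_id] += agent_actions["agent0"]


env = ToyBaseEnv()
for _ in range(3):
    obs, rewards, dones, infos = env.poll()
    # A trivial "policy": every agent in every sub-env steps forward by 1.
    env.send_actions({env_id: {"agent0": 1} for env_id in obs})

print(env.poll()[0])  # → {0: {'agent0': 3}, 1: {'agent0': 3}}
```

Because the policy only ever sees these batched dicts, the same loop works whether the sub-envs are local, vectorized, or remote actors.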

For example, if you provide a custom gym.Env class to RLlib, it is auto-converted to a BaseEnv for you. Here is a simple example:

# __rllib-custom-gym-env-begin__
import gym

import ray
from ray.rllib.agents import ppo


class SimpleCorridor(gym.Env):
    def __init__(self, config):
        self.end_pos = config["corridor_length"]
        self.cur_pos = 0
        self.action_space = gym.spaces.Discrete(2)  # left/right
        self.observation_space = gym.spaces.Discrete(self.end_pos)

    def reset(self):
        self.cur_pos = 0
        return self.cur_pos

    def step(self, action):
        if action == 0 and self.cur_pos > 0:  # move left (towards start)
            self.cur_pos -= 1
        elif action == 1:  # move right (towards goal)
            self.cur_pos += 1
        if self.cur_pos >= self.end_pos:
            return 0, 1.0, True, {}
        else:
            return self.cur_pos, -0.1, False, {}


ray.init()
config = {
    "env": SimpleCorridor,
    "env_config": {
        "corridor_length": 5,
    },
}

trainer = ppo.PPOTrainer(config=config)
for _ in range(3):
    print(trainer.train())
# __rllib-custom-gym-env-end__
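To see what the trainer is learning against, the corridor dynamics above can be rolled out by hand. The sketch below strips out gym and RLlib for illustration and applies a fixed policy that always picks action 1 (move right): each intermediate step costs -0.1 and reaching the goal pays +1.0, so the optimal return for a corridor of length 5 is 0.6:

```python
# Hand-rolled rollout of the SimpleCorridor dynamics with the optimal
# fixed policy (always action 1 / move right). No gym or RLlib needed.

def rollout(end_pos=5):
    cur_pos, total_reward, steps = 0, 0.0, 0
    done = False
    while not done:
        # Same transition/reward logic as SimpleCorridor.step() with action=1.
        cur_pos += 1
        steps += 1
        if cur_pos >= end_pos:
            total_reward += 1.0  # goal reached
            done = True
        else:
            total_reward += -0.1  # per-step penalty

    return steps, total_reward


steps, total = rollout()
print(steps)  # → 5
```

A trained PPO policy should converge toward exactly this behavior, since any left move only adds extra -0.1 penalties before the terminal +1.0.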

However, you may also conveniently subclass any of the other supported RLlib-specific environment types. The automated paths from those env types (or from callables returning instances of those types) to an RLlib BaseEnv are as follows: