Note
Ray 2.10.0 introduces the alpha stage of RLlib’s “new API stack”. The team is currently transitioning algorithms, example scripts, and documentation to the new code base throughout the subsequent minor releases leading up to Ray 3.0.
See here for more details on how to activate and use the new API stack.
Environments
Any environment type you provide to RLlib (e.g. a user-defined gym.Env class) is converted internally into the BaseEnv API, whose main methods are poll() and send_actions().
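The following is a minimal sketch of how such a polling loop could look. It assumes you already have a BaseEnv instance (for example, one produced by RLlib's internal auto-conversion) and that poll() returns per-env, per-agent dicts of observations, rewards, termination flags, truncation flags, infos, and off-policy actions; treat the exact tuple layout and the placeholder policy object as assumptions and check the BaseEnv reference for your Ray version.

# Sketch only: drive a BaseEnv by hand (not how RLlib's sampler is implemented
# internally). `base_env` is assumed to be an existing BaseEnv instance and
# `policy` a placeholder object exposing a `compute_action(obs)` helper.
def run_one_step(base_env, policy):
    # poll() is assumed to return dicts keyed by env id, then by agent id.
    obs, rewards, terminateds, truncateds, infos, off_policy_actions = base_env.poll()

    # Compute one action per (env_id, agent_id) pair that reported an observation.
    actions = {
        env_id: {
            agent_id: policy.compute_action(agent_obs)
            for agent_id, agent_obs in agent_obs_dict.items()
        }
        for env_id, agent_obs_dict in obs.items()
    }

    # Send the actions back; the sub-environments step asynchronously.
    base_env.send_actions(actions)
    return rewards, terminateds, truncateds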
The BaseEnv API allows RLlib to support:

- Vectorization of sub-environments (i.e. individual gym.Env instances, stacked to form a vector of envs) in order to batch the action-computing model forward passes (see the config sketch below).
- External simulators requiring async execution (e.g. envs that run on separate machines and independently request actions from a policy server).
- Stepping through the individual sub-environments in parallel by pre-converting them into separate @ray.remote actors.
- Multi-agent RL via dicts mapping agent IDs to observations/rewards/etc.
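For example, vectorization and remote sub-environments can be switched on through the algorithm config. Below is a minimal sketch that assumes the Ray 2.x rollouts() settings num_envs_per_worker and remote_worker_envs; check the AlgorithmConfig reference for your Ray version.

from ray.rllib.algorithms.ppo import PPOConfig

# Sketch only (assumes the Ray 2.x `rollouts()` config API): run 4 sub-environments
# per rollout worker and step each of them in its own @ray.remote actor.
config = (
    PPOConfig()
    .environment("CartPole-v1")
    .rollouts(
        num_envs_per_worker=4,    # vectorize 4 sub-envs per worker
        remote_worker_envs=True,  # step sub-envs as separate @ray.remote actors
    )
)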
For example, if you provide a custom gym.Env class to RLlib, auto-conversion to BaseEnv goes as follows:

User provides a gym.Env -> _VectorizedGymEnv (is-a VectorEnv) -> BaseEnv
Here is a simple example:
# __rllib-custom-gym-env-begin__
import gymnasium as gym
import numpy as np

import ray
from ray.rllib.algorithms.ppo import PPOConfig


class SimpleCorridor(gym.Env):
    """A 1D corridor: start at position 0.0 and walk right to reach `end_pos`."""

    def __init__(self, config):
        self.end_pos = config["corridor_length"]
        self.cur_pos = 0.0
        self.action_space = gym.spaces.Discrete(2)  # 0=left, 1=right
        self.observation_space = gym.spaces.Box(0.0, self.end_pos, shape=(1,))

    def reset(self, *, seed=None, options=None):
        self.cur_pos = 0.0
        return np.array([self.cur_pos], dtype=np.float32), {}

    def step(self, action):
        if action == 0 and self.cur_pos > 0.0:  # move left (towards start)
            self.cur_pos -= 1.0
        elif action == 1:  # move right (towards goal)
            self.cur_pos += 1.0
        if self.cur_pos >= self.end_pos:
            return np.array([0.0], dtype=np.float32), 1.0, True, True, {}
        else:
            return np.array([self.cur_pos], dtype=np.float32), -0.1, False, False, {}


ray.init()
config = PPOConfig().environment(SimpleCorridor, env_config={"corridor_length": 5})
algo = config.build()
for _ in range(3):
    print(algo.train())
algo.stop()
# __rllib-custom-gym-env-end__
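Before calling algo.stop() above, you can also query the trained Algorithm for actions on new observations. A brief sketch, assuming the Algorithm.compute_single_action() helper available in Ray 2.x:

# Roll out one episode with the trained policy (sketch; run before `algo.stop()`).
env = SimpleCorridor({"corridor_length": 5})
obs, info = env.reset()
terminated = truncated = False
total_reward = 0.0
while not (terminated or truncated):
    action = algo.compute_single_action(obs)
    obs, reward, terminated, truncated, info = env.step(action)
    total_reward += reward
print(f"Episode reward: {total_reward}")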
However, you may also conveniently sub-class any of the other supported RLlib-specific environment types. The automated paths from those env types (or callables returning instances of those types) to an RLlib BaseEnv are as follows:

User provides a custom MultiAgentEnv (is-a gym.Env) -> VectorEnv -> BaseEnv (see the sketch below)
User uses a policy client (via an external simulator) -> ExternalEnv | ExternalMultiAgentEnv -> BaseEnv
User provides a custom BaseEnv -> do nothing
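For instance, a custom MultiAgentEnv returns dicts keyed by agent ID from reset() and step(), plus the special "__all__" key that signals whether the whole episode is over. The following two-agent corridor is an illustrative sketch only (the class name, config key, and agent IDs are made up for this example and are not part of the RLlib code base):

import gymnasium as gym
import numpy as np

from ray.rllib.env.multi_agent_env import MultiAgentEnv


class TwoAgentCorridor(MultiAgentEnv):
    """Illustrative sketch: two independent corridor walkers in one env."""

    def __init__(self, config=None):
        super().__init__()
        self._agent_ids = {"agent_0", "agent_1"}
        self.end_pos = (config or {}).get("corridor_length", 5)
        self.pos = {aid: 0.0 for aid in self._agent_ids}
        self.observation_space = gym.spaces.Box(0.0, self.end_pos, shape=(1,))
        self.action_space = gym.spaces.Discrete(2)  # 0=left, 1=right

    def reset(self, *, seed=None, options=None):
        self.pos = {aid: 0.0 for aid in self._agent_ids}
        obs = {aid: np.array([p], dtype=np.float32) for aid, p in self.pos.items()}
        return obs, {}

    def step(self, action_dict):
        obs, rewards, terminateds, truncateds = {}, {}, {}, {}
        # Only agents present in `action_dict` (i.e. still active) are stepped.
        for aid, action in action_dict.items():
            if action == 0 and self.pos[aid] > 0.0:
                self.pos[aid] -= 1.0
            elif action == 1:
                self.pos[aid] += 1.0
            done = self.pos[aid] >= self.end_pos
            obs[aid] = np.array([self.pos[aid]], dtype=np.float32)
            rewards[aid] = 1.0 if done else -0.1
            terminateds[aid] = done
            truncateds[aid] = False
        # "__all__" signals whether the whole episode is over.
        terminateds["__all__"] = all(terminateds.values())
        truncateds["__all__"] = False
        return obs, rewards, terminateds, truncateds, {}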