Note
Ray 2.10.0 introduces the alpha stage of RLlib’s “new API stack”. The Ray Team plans to transition algorithms, example scripts, and documentation to the new code base thereby incrementally replacing the “old API stack” (e.g., ModelV2, Policy, RolloutWorker) throughout the subsequent minor releases leading up to Ray 3.0.
Note, however, that so far only PPO (single- and multi-agent) and SAC (single-agent only) support the “new API stack”, and both still run with the old APIs by default. You can continue to use the existing custom (old stack) classes.
See here for more details on how to use the new API stack.
RLlib: Industry-Grade Reinforcement Learning
RLlib is an open-source library for reinforcement learning (RL), offering support for production-level, highly distributed RL workloads while maintaining unified and simple APIs for a large variety of industry applications. Whether you would like to train your agents in a multi-agent setup, purely from offline (historic) datasets, or using externally connected simulators, RLlib offers a simple solution for each of your decision-making needs.
If you either have your problem coded (in Python) as an RL environment or own lots of pre-recorded, historic behavioral data to learn from, you will be up and running in only a few days.
RLlib is already used in production by industry leaders in many different verticals, such as climate control, industrial control, manufacturing and logistics, finance, gaming, automobile, robotics, boat design, and many others.
RLlib in 60 seconds
It only takes a few steps to get your first RLlib workload up and running on your laptop.
RLlib does not automatically install a deep-learning framework, but supports TensorFlow (both 1.x with static-graph and 2.x with eager mode) as well as PyTorch. Depending on your needs, make sure to install either TensorFlow or PyTorch (or both, as shown here):
pip install "ray[rllib]" tensorflow torch
Note
For installation on computers running Apple Silicon (such as M1), please follow the instructions here.
To be able to run our Atari examples, you should also run:
pip install "gym[atari]" "gym[accept-rom-license]" atari_py
This is all you need to start coding against RLlib.
Here is an example of running a PPO Algorithm on the Taxi domain. We first create a config for the algorithm, which sets the right environment and defines all training parameters we want. Next, we build the algorithm and train it for a total of 5 iterations. A training iteration includes parallel sample collection by the environment workers, as well as loss calculation on the collected batch and a model update. As a last step, we evaluate the trained Algorithm:
from ray.rllib.algorithms.ppo import PPOConfig

config = (  # 1. Configure the algorithm,
    PPOConfig()
    .environment("Taxi-v3")
    .env_runners(num_env_runners=2)
    .framework("torch")
    .training(model={"fcnet_hiddens": [64, 64]})
    .evaluation(evaluation_num_env_runners=1)
)
algo = config.build()  # 2. build the algorithm,
for _ in range(5):
    print(algo.train())  # 3. train it,
algo.evaluate()  # 4. and evaluate it.
Note that you can use any Farama-Foundation Gymnasium environment as env. In env_runners you can, for instance, specify the number of parallel workers that collect samples from the environment. The framework config lets you choose between “tf2”, “tf” and “torch” for execution. You can also tweak RLlib’s default model config, and set up a separate config for evaluation.
To learn more about the RLlib training API, see here. Also, see here for a simple example on how to write an action inference loop after training.
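The shape of such an action inference loop can be sketched without RLlib at all. The block below is a minimal illustration, not RLlib's own example: CorridorEnv is a hypothetical stand-in environment that only mimics the Gymnasium reset()/step() signatures, and compute_action is a placeholder for whatever function queries your trained Algorithm for an action.

```python
class CorridorEnv:
    """Hypothetical toy environment with Gymnasium-style reset()/step() signatures."""

    def __init__(self, length=5, max_steps=20):
        self.length = length
        self.max_steps = max_steps
        self.pos = 0
        self.steps = 0

    def reset(self, *, seed=None, options=None):
        # Gymnasium-style reset: returns (observation, info).
        self.pos = 0
        self.steps = 0
        return self.pos, {}

    def step(self, action):
        # Action 1 moves right; any other action moves left (clamped at 0).
        self.pos = max(0, self.pos + (1 if action == 1 else -1))
        self.steps += 1
        terminated = self.pos >= self.length - 1  # reached the goal cell
        truncated = not terminated and self.steps >= self.max_steps
        reward = 1.0 if terminated else -0.1
        return self.pos, reward, terminated, truncated, {}


def run_episode(env, compute_action):
    """Run one episode, asking compute_action(obs) for every action."""
    obs, info = env.reset()
    total_reward = 0.0
    terminated = truncated = False
    while not (terminated or truncated):
        action = compute_action(obs)
        obs, reward, terminated, truncated, info = env.step(action)
        total_reward += reward
    return total_reward


def always_right(obs):
    # Placeholder policy; with RLlib this would query the trained model instead.
    return 1


episode_return = run_episode(CorridorEnv(), always_right)
print(f"episode return: {episode_return:.1f}")
```

The loop only depends on the environment's reset/step contract and on a callable that maps an observation to an action, so swapping the placeholder policy for a trained one should not change its structure.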
Feature Overview
RLlib Key Concepts
Learn more about the core concepts of RLlib, such as environments, algorithms and policies.
RLlib Algorithms
See the many available RL algorithms of RLlib for model-free and model-based RL, on-policy and off-policy training, multi-agent RL, and more.
RLlib Environments
Get started with environments supported by RLlib, such as Farama Foundation’s Gymnasium, PettingZoo, and many custom formats for vectorized and multi-agent environments.
The following is a summary of RLlib’s most striking features:
Highly distributed learning: Our RLlib algorithms (such as our “PPO” or “IMPALA”) allow you to set the num_env_runners config parameter, such that your workloads can run on 100s of CPUs/nodes, thus parallelizing and speeding up learning.
Multi-agent RL (MARL): Convert your gym.Envs into multi-agent ones via a few simple steps and start training your agents.
External simulators: Don’t have your simulation running as a gym.Env in Python? No problem! RLlib supports an external environment API and comes with a pluggable, off-the-shelf client/server setup that allows you to run 100s of independent simulators on the “outside” (e.g. a Windows cloud) connecting to a central RLlib Policy-Server that learns and serves actions. Alternatively, actions can be computed on the client side to save on network traffic.
Offline RL and imitation learning/behavior cloning: You don’t have a simulator for your particular problem, but tons of historic data recorded by a legacy (maybe non-RL/ML) system? This branch of reinforcement learning is for you! RLlib comes with several offline RL algorithms (CQL, MARWIL, and DQfD), allowing you to either purely behavior-clone your existing system or learn how to further improve over it.
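To make the idea of behavior cloning from historic data concrete, here is a deliberately tiny, framework-free sketch. It is not how RLlib's offline algorithms work internally; it assumes a small discrete state space and simply imitates the most frequently logged action per state. The function name clone_behavior and the climate-control records are made up for illustration.

```python
from collections import Counter, defaultdict


def clone_behavior(transitions):
    """Build a state -> action table from logged (state, action) pairs.

    For each state, the cloned policy picks the action the legacy system
    chose most often (a tabular, majority-vote form of behavior cloning).
    """
    counts = defaultdict(Counter)
    for state, action in transitions:
        counts[state][action] += 1
    return {state: c.most_common(1)[0][0] for state, c in counts.items()}


# Hypothetical log recorded from a legacy climate controller.
logged = [
    ("cold", "heat"),
    ("cold", "heat"),
    ("cold", "idle"),
    ("hot", "cool"),
    ("hot", "cool"),
    ("ok", "idle"),
]

policy = clone_behavior(logged)
print(policy)
```

A cloned policy like this can only reproduce what the legacy system did; improving over it is what the offline RL algorithms named above are for.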
Customizing RLlib
RLlib provides simple APIs to customize all aspects of your training and experimental workflows. For example, you may code your own environments in Python using Farama-Foundation’s gymnasium or DeepMind’s OpenSpiel, provide custom TensorFlow/Keras or Torch models, write your own policy and loss definitions, or define custom exploratory behavior.
By mapping one or more agents in your environments to (one or more) policies, multi-agent RL (MARL) becomes an easy-to-use low-level primitive for our users.
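The agent-to-policy mapping is usually just a small function. The sketch below shows one plausible shape; the agent IDs, policy IDs, and the exact argument list are assumptions for illustration, so check the multi-agent configuration docs of your RLlib version for the signature it actually calls.

```python
def policy_mapping_fn(agent_id, episode=None, **kwargs):
    """Map an agent ID to the ID of the policy that controls it.

    Hypothetical setup: every attacker gets its own policy ID, while all
    defenders share a single policy.
    """
    if agent_id.startswith("attacker"):
        return f"{agent_id}_policy"
    return "shared_defender_policy"


# Hypothetical agent IDs as they might appear in a multi-agent environment.
agent_ids = ["attacker_0", "attacker_1", "defender_0", "defender_1"]
assignment = {agent_id: policy_mapping_fn(agent_id) for agent_id in agent_ids}

for agent_id, policy_id in assignment.items():
    print(f"{agent_id} -> {policy_id}")

# In RLlib, a function like this would be handed to the algorithm config's
# multi-agent settings together with the set of policy IDs it can return.
```

Sharing one policy among several agents lets them pool experience, while separate policies allow agents with different roles to learn different behavior.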