RLlib: Industry-Grade Reinforcement Learning


RLlib is an open-source library for reinforcement learning (RL), offering support for production-level, highly distributed RL workloads while maintaining unified and simple APIs for a large variety of industry applications. Whether you would like to train your agents in a multi-agent setup, purely from offline (historic) datasets, or using externally connected simulators, RLlib offers a simple solution for each of your decision-making needs.

If you have either coded your problem (in Python) as an RL environment or own lots of pre-recorded, historic behavioral data to learn from, you will be up and running in only a few days.

RLlib is already used in production by industry leaders in many different verticals, such as climate control, industrial control, manufacturing and logistics, finance, gaming, automotive, robotics, boat design, and many others.

RLlib in 60 seconds


It only takes a few steps to get your first RLlib workload up and running on your laptop:

TensorFlow or PyTorch:

RLlib does not automatically install a deep-learning framework, but supports TensorFlow (both 1.x with static-graph and 2.x with eager mode) as well as PyTorch. Depending on your needs, make sure to install either TensorFlow or PyTorch (or both as shown below):

$ conda create -n rllib python=3.8
$ conda activate rllib
$ pip install "ray[rllib]" tensorflow torch
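Because RLlib does not choose a framework for you, a small helper can detect which one is installed and fill in the `"framework"` config key. This is a minimal sketch: the helper name `pick_framework` is hypothetical, the preference order (PyTorch first, then TensorFlow 2.x) is an arbitrary choice, and the config values `"torch"` and `"tf2"` are the ones used in the example config on this page.

```python
import importlib.util


def pick_framework(candidates=(("torch", "torch"), ("tensorflow", "tf2"))):
    """Return the RLlib "framework" value for the first installed package.

    Hypothetical helper. `candidates` is a sequence of
    (importable package name, RLlib config value) pairs, checked in order.
    """
    for package, config_value in candidates:
        # find_spec() returns None when a top-level package is not installed.
        if importlib.util.find_spec(package) is not None:
            return config_value
    raise RuntimeError("Neither PyTorch nor TensorFlow is installed.")


try:
    config = {"env": "Taxi-v3", "framework": pick_framework()}
    print(config)
except RuntimeError as err:
    print(err)
```

Checking with `find_spec()` instead of importing avoids paying the (slow) import cost of a deep-learning framework just to decide which one to use.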

Note: for installation on computers running Apple Silicon (such as M1), please follow the Apple Silicon installation instructions in the Ray documentation.

To be able to run our Atari examples, you should also install:

$ pip install "gym[atari]" "gym[accept-rom-license]" atari_py

After these quick pip installs, you can start coding against RLlib.

Here is an example of running a PPO Algorithm on the Taxi domain for a few training iterations, then performing a single evaluation loop (with rendering enabled):

# Import the RL algorithm (Algorithm) we would like to use.
from ray.rllib.algorithms.ppo import PPO

# Configure the algorithm.
config = {
    # Environment (RLlib understands OpenAI Gym registered strings).
    "env": "Taxi-v3",
    # Use 2 environment workers (aka "rollout workers") that collect
    # samples in parallel from their own environment clone(s).
    "num_workers": 2,
    # Change this to "framework": "torch" if you are using PyTorch.
    # Also, use "framework": "tf2" for tf2.x eager execution.
    "framework": "tf",
    # Tweak the default model provided automatically by RLlib,
    # given the environment's observation- and action spaces.
    "model": {
        "fcnet_hiddens": [64, 64],
        "fcnet_activation": "relu",
    },
    # Set up a separate evaluation worker set for the
    # `algo.evaluate()` call after training (see below).
    "evaluation_num_workers": 1,
    # Only for evaluation runs, render the env.
    "evaluation_config": {
        "render_env": True,
    },
}

# Create our RLlib Algorithm.
algo = PPO(config=config)

# Run it for n training iterations. A training iteration includes
# parallel sample collection by the environment workers as well as
# loss calculation on the collected batch and a model update.
for _ in range(3):
    print(algo.train())

# Evaluate the trained Algorithm (and render each timestep to the shell's
# output).
algo.evaluate()

See the RLlib documentation for a simple example of how to write an action inference loop after training.
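An action inference loop after training typically looks like the sketch below. It assumes the classic Gym API used by Ray 2.0-era examples, where `env.reset()` returns an observation and `env.step()` returns `(obs, reward, done, info)`, and it calls `compute_single_action()`, the method RLlib's `Algorithm` exposes for single-observation inference. So that the block runs without Ray or Gym, it is exercised with two hypothetical stand-ins, `CountdownEnv` and `ConstantPolicy`; in a real setup you would pass the trained `algo` and `gym.make("Taxi-v3")` instead.

```python
def run_episode(algo, env, max_steps=1000):
    """Roll out one episode with a trained algorithm and return its reward.

    Assumes the classic Gym API: reset() -> obs and
    step(action) -> (obs, reward, done, info).
    """
    obs = env.reset()
    episode_reward = 0.0
    for _ in range(max_steps):
        # Ask the trained policy for one action given one observation.
        action = algo.compute_single_action(obs)
        obs, reward, done, _info = env.step(action)
        episode_reward += reward
        if done:
            break
    return episode_reward


# Hypothetical stand-ins so the sketch runs without Ray or Gym.
class CountdownEnv:
    """Episode ends after `length` steps; reward 1.0 whenever action == 1."""

    def __init__(self, length=5):
        self.length = length
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        reward = 1.0 if action == 1 else 0.0
        return self.t, reward, self.t >= self.length, {}


class ConstantPolicy:
    """Always returns the same action, mimicking compute_single_action()."""

    def __init__(self, action):
        self.action = action

    def compute_single_action(self, obs):
        return self.action


print(run_episode(ConstantPolicy(1), CountdownEnv(length=5)))  # prints 5.0
```

Newer Gym/Gymnasium releases return `(obs, info)` from `reset()` and a five-tuple from `step()`, so the unpacking lines need adjusting there.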

Feature Overview


The following is a summary of RLlib’s most striking features:


The most popular deep-learning frameworks: PyTorch and TensorFlow (tf1.x/2.x static-graph/eager/traced).


Highly distributed learning: RLlib algorithms (such as PPO or IMPALA) let you set the num_workers config parameter, so that your workloads can run on hundreds of CPUs/nodes, parallelizing and speeding up learning.
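Conceptually, each of the `num_workers` rollout workers steps its own environment copy, and the resulting sample fragments are concatenated into one training batch. The sketch below mimics that data flow with a thread pool standing in for Ray actors. The functions `collect_fragment` and `collect_train_batch` and the random-walk "environment" are hypothetical; they illustrate how the work is split, not how RLlib schedules it.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def collect_fragment(worker_index, fragment_length):
    """Hypothetical rollout worker: collect `fragment_length` transitions
    from its own environment copy (here, a seeded random walk)."""
    rng = random.Random(worker_index)  # each worker owns its env and seed
    state, samples = 0, []
    for _ in range(fragment_length):
        action = rng.choice([-1, 1])
        next_state = state + action
        samples.append({"worker": worker_index, "obs": state,
                        "action": action, "next_obs": next_state})
        state = next_state
    return samples


def collect_train_batch(num_workers, train_batch_size):
    """Split the batch across workers, collect in parallel, concatenate.

    Uses integer division, so any remainder is dropped for simplicity.
    """
    fragment_length = train_batch_size // num_workers
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        fragments = pool.map(collect_fragment, range(num_workers),
                             [fragment_length] * num_workers)
        return [sample for fragment in fragments for sample in fragment]


batch = collect_train_batch(num_workers=4, train_batch_size=400)
print(len(batch), sorted({s["worker"] for s in batch}))  # 400 [0, 1, 2, 3]
```

In RLlib the workers are separate processes (Ray actors) that can live on different nodes, which is what lets sample collection scale past a single machine.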


Vectorized (batched) and remote (parallel) environments: RLlib auto-vectorizes your gym.Envs via the num_envs_per_worker config. Environment workers can then batch and thus significantly speed up the action-computing forward pass. On top of that, RLlib offers the remote_worker_envs config to create single environments (within a vectorized one) as Ray actors, thus parallelizing even the env stepping process.
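The speedup from num_envs_per_worker comes from replacing N single-observation forward passes with one batched pass. The NumPy sketch below shows this with a hypothetical linear policy (`weights`): it scores the stacked observations of N environment copies in one matrix multiplication and checks that the result matches the per-environment loop. Shapes and names are illustrative assumptions, not RLlib internals.

```python
import numpy as np

rng = np.random.default_rng(0)
num_envs, obs_dim, num_actions = 8, 4, 3

# Hypothetical linear policy: action logits = obs @ weights.
weights = rng.normal(size=(obs_dim, num_actions))

# One observation per vectorized sub-environment, stacked into a batch.
obs_batch = rng.normal(size=(num_envs, obs_dim))

# Unbatched: one forward pass per environment.
logits_loop = np.stack([obs @ weights for obs in obs_batch])

# Batched: a single forward pass for all environments at once.
logits_batched = obs_batch @ weights
actions = np.argmax(logits_batched, axis=1)  # one greedy action per env

print(logits_batched.shape)                       # (8, 3)
print(np.allclose(logits_loop, logits_batched))   # True
```

With a neural network on a GPU the gap is much larger than in this toy, because the per-call overhead dominates small forward passes.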

Multi-agent RL (MARL): Convert your (custom) gym.Envs into multi-agent ones via a few simple steps and start training your agents in any of the following fashions:
1) Cooperative with shared or separate policies and/or value functions.
2) Adversarial scenarios using self-play and league-based training.
3) Independent learning of neutral/co-existing agents.
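A multi-agent environment in RLlib exchanges dictionaries keyed by agent ID instead of single values, and signals the end of the whole episode with the special `"__all__"` key in the done dict. The toy class below follows that convention (in the Ray 2.0-era form, where `step()` returns four dicts) without importing Ray; in real code you would subclass `ray.rllib.env.multi_agent_env.MultiAgentEnv` and declare observation and action spaces. The guessing game and all names are hypothetical.

```python
class TwoAgentGuessEnv:
    """Toy two-agent env using RLlib's dict-per-agent convention.

    Hypothetical game: each agent guesses an integer; it gets reward 1.0
    when its guess equals the hidden target. The episode ends after
    `max_steps` rounds. Observation codes: 0 = no feedback yet,
    1 = guess too low, 2 = guess too high, 3 = correct.
    """

    def __init__(self, target=7, max_steps=3):
        self.agents = ["agent_0", "agent_1"]
        self.target = target
        self.max_steps = max_steps
        self.t = 0

    def _feedback(self, guess):
        if guess == self.target:
            return 3
        return 1 if guess < self.target else 2

    def reset(self):
        self.t = 0
        # One observation per agent, keyed by agent ID.
        return {agent_id: 0 for agent_id in self.agents}

    def step(self, action_dict):
        self.t += 1
        obs, rewards, dones, infos = {}, {}, {}, {}
        for agent_id, guess in action_dict.items():
            obs[agent_id] = self._feedback(guess)
            rewards[agent_id] = 1.0 if guess == self.target else 0.0
            dones[agent_id] = self.t >= self.max_steps
            infos[agent_id] = {}
        # "__all__" tells the trainer whether the whole episode is over.
        dones["__all__"] = self.t >= self.max_steps
        return obs, rewards, dones, infos


env = TwoAgentGuessEnv(target=7, max_steps=3)
obs = env.reset()
obs, rewards, dones, _ = env.step({"agent_0": 7, "agent_1": 2})
print(obs, rewards, dones["__all__"])
# {'agent_0': 3, 'agent_1': 1} {'agent_0': 1.0, 'agent_1': 0.0} False
```

Keying everything by agent ID is what allows agents to enter or leave mid-episode: an agent simply stops appearing in the dicts.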

External simulators: Don’t have your simulation running as a gym.Env in Python? No problem! RLlib supports an external environment API and comes with a pluggable, off-the-shelf client/server setup that allows you to run hundreds of independent simulators on the “outside” (e.g. a Windows cloud) connecting to a central RLlib Policy-Server that learns and serves actions. Alternatively, actions can be computed on the client side to save on network traffic.
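The choice between server-side and client-side action computation comes down to how many network messages each simulator step costs. The sketch below counts them for a hypothetical client in two modes: `"remote"` asks the server for every action and logs every reward, while `"local"` acts with a local policy copy and only uploads experience batches and periodically pulls fresh weights. RLlib's `PolicyClient` offers a similar `inference_mode` switch, but the function, its message accounting, and the `upload_every`/`sync_every` parameters here are illustrative assumptions, not its implementation.

```python
import math


def count_messages(num_steps, inference_mode, upload_every=100, sync_every=1_000):
    """Estimate client-to-server messages for `num_steps` simulator steps.

    Hypothetical accounting:
    - "remote": one request for the action plus one to log the reward, per step.
    - "local": experiences are uploaded every `upload_every` steps and fresh
      policy weights are pulled every `sync_every` steps.
    """
    if inference_mode == "remote":
        return 2 * num_steps
    if inference_mode == "local":
        uploads = math.ceil(num_steps / upload_every)
        weight_syncs = math.ceil(num_steps / sync_every)
        return uploads + weight_syncs
    raise ValueError(f"unknown inference_mode: {inference_mode!r}")


print(count_messages(10_000, "remote"))  # 20000
print(count_messages(10_000, "local"))   # 110
```

The saving is not free: in local mode the client acts with weights that can be up to `sync_every` steps stale, and it needs enough compute to run the policy itself.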


Offline RL and imitation learning/behavior cloning: You don’t have a simulator for your particular problem, but tons of historic data recorded by a legacy (maybe non-RL/ML) system? This branch of reinforcement learning is for you! RLlib comes with several offline RL algorithms (CQL, MARWIL, and DQfD), allowing you to either purely behavior-clone your existing system or learn how to further improve over it.
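Behavior cloning, the simplest way to learn from historic data, fits a policy that imitates the logged actions. For discrete observations and actions, the maximum-likelihood solution is the most frequent logged action per observation, which the sketch below computes from a hypothetical log of `(observation, action)` pairs. RLlib's offline algorithms train neural-network policies instead; this tabular version only illustrates the objective.

```python
from collections import Counter, defaultdict


def behavior_clone(dataset):
    """Tabular behavior cloning: pick the most frequent logged action per obs."""
    counts = defaultdict(Counter)
    for obs, action in dataset:
        counts[obs][action] += 1
    # most_common(1) returns [(action, count)] for the top action.
    return {obs: c.most_common(1)[0][0] for obs, c in counts.items()}


# Hypothetical log recorded by a legacy controller: (temperature band, action).
historic_data = [
    ("cold", "heat"), ("cold", "heat"), ("cold", "idle"),
    ("ok", "idle"), ("ok", "idle"),
    ("hot", "cool"), ("hot", "cool"), ("hot", "idle"),
]

policy = behavior_clone(historic_data)
print(policy)  # {'cold': 'heat', 'ok': 'idle', 'hot': 'cool'}
```

Pure cloning can at best match the logging system. MARWIL goes further by up-weighting logged actions with high estimated advantage, and with its beta parameter set to 0 it reduces to behavior cloning.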


Customizing RLlib

RLlib provides simple APIs to customize all aspects of your training and experimental workflows. For example, you may code your own environments in Python using OpenAI’s Gym or DeepMind’s OpenSpiel, provide custom TensorFlow/Keras or Torch models, write your own policy and loss definitions, or define custom exploratory behavior.
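Custom exploratory behavior boils down to deciding, at each step, whether to act on the policy's current best guess or to try something else. A common baseline is epsilon-greedy with a linearly decaying epsilon, sketched below in plain Python. RLlib ships its own exploration classes (including an `EpsilonGreedy` one configured via `exploration_config`); the function names and schedule parameters here are hypothetical.

```python
import random


def linear_epsilon(timestep, initial=1.0, final=0.05, decay_steps=10_000):
    """Linearly anneal epsilon from `initial` to `final` over `decay_steps`."""
    fraction = min(timestep / decay_steps, 1.0)
    return initial + fraction * (final - initial)


def epsilon_greedy(q_values, timestep, rng=random):
    """Random action with probability epsilon, else the greedy (argmax) one."""
    if rng.random() < linear_epsilon(timestep):
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


rng = random.Random(0)
q = [0.1, 0.9, 0.3]
late_actions = [epsilon_greedy(q, timestep=50_000, rng=rng) for _ in range(1000)]
# Fraction of greedy picks; should be close to 1 - epsilon * 2/3 late in training.
print(late_actions.count(1) / len(late_actions))
```

Early on (epsilon near 1.0) almost every action is random; once the schedule bottoms out, random actions are still taken with probability 0.05, which keeps some exploration alive.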

By mapping one or more agents in your environments to (one or more) policies, RLlib makes multi-agent RL (MARL) an easy-to-use low-level primitive for our users.
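The agent-to-policy mapping is a plain function from agent ID to policy ID. The sketch below shows two hypothetical mapping functions, one for a cooperative setup with a single shared policy and one with a separate policy per agent, plus a config fragment shaped like RLlib 2.x's `"multiagent"` section (`"policies"` and `"policy_mapping_fn"`). The mapping-function signature has changed across RLlib versions, so the `(agent_id, episode, worker, **kwargs)` form used here, and passing `"policies"` as a set of IDs, are assumptions to check against your installed version.

```python
def shared_policy_mapping(agent_id, episode=None, worker=None, **kwargs):
    """Cooperative setup: every agent acts with (and trains) one policy."""
    return "shared_policy"


def separate_policy_mapping(agent_id, episode=None, worker=None, **kwargs):
    """One policy per agent, e.g. "agent_0" -> "policy_agent_0"."""
    return f"policy_{agent_id}"


agent_ids = ["agent_0", "agent_1"]

# Config fragment in the shape of RLlib 2.x's multi-agent settings
# (assumption: policies given as a set of IDs, spaces inferred from the env).
config = {
    "multiagent": {
        "policies": {separate_policy_mapping(a) for a in agent_ids},
        "policy_mapping_fn": separate_policy_mapping,
    },
}

print({a: shared_policy_mapping(a) for a in agent_ids})
# {'agent_0': 'shared_policy', 'agent_1': 'shared_policy'}
print({a: separate_policy_mapping(a) for a in agent_ids})
# {'agent_0': 'policy_agent_0', 'agent_1': 'policy_agent_1'}
```

Self-play follows the same pattern: the mapping function can route opponents to a frozen snapshot policy while only the "main" policy is trained.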


RLlib’s API stack: Built on top of Ray, RLlib offers off-the-shelf, highly distributed algorithms, policies, loss functions, and default models (including the option to auto-wrap a neural network with an LSTM or an attention net). Furthermore, our library comes with a built-in Server/Client setup, allowing you to connect hundreds of external simulators (clients) via the network to an RLlib server process, which provides learning functionality and serves action queries. User customizations are realized by sub-classing the existing abstractions and overriding certain methods in those sub-classes to define custom behavior.
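Customizing by sub-classing follows the classic template-method design: the framework calls a fixed sequence of methods, and you override only the hooks you need. The sketch below illustrates the idea with hypothetical classes (`BaseTrainerLoop`, `ClippedLossTrainer`) and a toy loss. These are not RLlib classes; RLlib's actual base classes and method signatures differ by version.

```python
class BaseTrainerLoop:
    """Hypothetical framework base class: train_step() is fixed, hooks are not."""

    def loss(self, predictions, targets):
        # Default loss hook: mean squared error.
        return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)

    def train_step(self, predictions, targets):
        # The framework always calls the (possibly overridden) loss hook.
        return {"loss": self.loss(predictions, targets)}


class ClippedLossTrainer(BaseTrainerLoop):
    """User customization: override only the loss, reuse everything else."""

    def __init__(self, clip=1.0):
        self.clip = clip

    def loss(self, predictions, targets):
        # Clip each squared error so single outliers cannot dominate.
        errors = [min((p - t) ** 2, self.clip) for p, t in zip(predictions, targets)]
        return sum(errors) / len(errors)


preds, targets = [0.0, 0.5, 4.0], [0.0, 0.0, 0.0]
print(BaseTrainerLoop().train_step(preds, targets))
print(ClippedLossTrainer(clip=1.0).train_step(preds, targets))
```

Not every customization needs a subclass: the LSTM or attention auto-wrapping mentioned above is switched on through RLlib's model config, for example `"model": {"use_lstm": True}` or `"model": {"use_attention": True}`.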