RLlib: Industry-Grade, Scalable Reinforcement Learning#
Note
Ray 2.40 uses RLlib’s new API stack by default. The Ray team has mostly completed transitioning algorithms, example scripts, and documentation to the new code base.
If you’re still using the old API stack, see the New API stack migration guide for details on how to migrate.
RLlib is an open source library for reinforcement learning (RL), offering support for production-level, highly scalable, and fault-tolerant RL workloads, while maintaining simple and unified APIs for a large variety of industry applications.
Whether training policies in a multi-agent setup, from historic offline data, or using externally connected simulators, RLlib offers simple solutions for each of these autonomous decision making needs and enables you to start running your experiments within hours.
Industry leaders use RLlib in production in many different verticals, such as gaming, robotics, finance, climate- and industrial control, manufacturing and logistics, automobile, and boat design.
RLlib in 60 seconds#
It only takes a few steps to get your first RLlib workload up and running on your laptop. Install RLlib and PyTorch, as shown below:
pip install "ray[rllib]" torch
Note
For installation on computers running Apple Silicon, such as M1, follow instructions here.
Note
To run the Atari or MuJoCo examples, you also need to install:
pip install "gymnasium[atari,accept-rom-license,mujoco]"
That's it. You can now start coding against RLlib. Here is an example of running the PPO algorithm on the Taxi domain.
First, create a config for the algorithm, which defines the RL environment and any other needed settings and parameters.
from ray.rllib.algorithms.ppo import PPOConfig
from ray.rllib.connectors.env_to_module import FlattenObservations

# Configure the algorithm.
config = (
    PPOConfig()
    .environment("Taxi-v3")
    .env_runners(
        num_env_runners=2,
        # Observations are discrete (ints) -> We need to flatten (one-hot) them.
        env_to_module_connector=lambda env: FlattenObservations(),
    )
    .evaluation(evaluation_num_env_runners=1)
)
Next, build the algorithm and train it for a total of five iterations.
One training iteration includes parallel, distributed sample collection by the EnvRunner actors, followed by loss calculation on the collected data, and a model update step.
from pprint import pprint
# Build the algorithm.
algo = config.build_algo()
# Train it for 5 iterations ...
for _ in range(5):
    pprint(algo.train())
At the end of your script, you evaluate the trained Algorithm and release all its resources:
# ... and evaluate it.
pprint(algo.evaluate())
# Release the algo's resources (remote actors, like EnvRunners and Learners).
algo.stop()
You can use any Farama Foundation Gymnasium-registered environment with the env argument.
In config.env_runners() you can specify, amongst many other things, the number of parallel EnvRunner actors that collect samples from the environment.
You can also tweak the NN architecture through RLlib's DefaultModelConfig, as well as set up a separate config for the evaluation EnvRunner actors through the config.evaluation() method.
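For example, here is a minimal sketch of both tweaks, building on the Taxi config from above. The DefaultModelConfig import path, its fcnet_hiddens field, and the evaluation_interval setting shown here are assumptions about RLlib's current defaults, not a definitive recipe:

from ray.rllib.algorithms.ppo import PPOConfig
from ray.rllib.connectors.env_to_module import FlattenObservations
from ray.rllib.core.rl_module.default_model_config import DefaultModelConfig

config = (
    PPOConfig()
    # Any Gymnasium-registered env ID works here.
    .environment("Taxi-v3")
    .env_runners(
        # Use four parallel EnvRunner actors instead of two.
        num_env_runners=4,
        env_to_module_connector=lambda env: FlattenObservations(),
    )
    # Tweak the default NN architecture (fcnet_hiddens is an assumed field name).
    .rl_module(model_config=DefaultModelConfig(fcnet_hiddens=[128, 128]))
    # Separate settings for the evaluation EnvRunner actors.
    .evaluation(
        evaluation_num_env_runners=1,
        evaluation_interval=1,
    )
)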
See here if you want to learn more about the RLlib training APIs. Also, see here for a simple example of how to write an action inference loop after training.
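As a rough illustration, a minimal sketch of such a loop on the new API stack might look like the following. It assumes the trained module lives under the default module ID "default_policy", that forward_inference returns logits under the "action_dist_inputs" key, and that the discrete observation must be one-hot encoded by hand because the FlattenObservations connector isn't part of the module itself:

import gymnasium as gym
import numpy as np
import torch

env = gym.make("Taxi-v3")
# Assumption: the trained RLModule is stored under the default module ID.
rl_module = algo.get_module("default_policy")

obs, _ = env.reset()
terminated = truncated = False
total_reward = 0.0
while not (terminated or truncated):
    # One-hot encode the discrete observation by hand, mirroring the
    # FlattenObservations connector used during training (assumption).
    one_hot_obs = np.zeros((1, env.observation_space.n), dtype=np.float32)
    one_hot_obs[0, obs] = 1.0
    # Run a forward pass through the module (no exploration).
    out = rl_module.forward_inference({"obs": torch.from_numpy(one_hot_obs)})
    # Assumption: the output dict contains logits under "action_dist_inputs".
    action = int(torch.argmax(out["action_dist_inputs"][0]))
    obs, reward, terminated, truncated, _ = env.step(action)
    total_reward += reward
print("Episode return:", total_reward)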
Why choose RLlib?#
Learn More#
RLlib Key Concepts
Learn more about the core concepts of RLlib, such as Algorithms, environments, models, and learners.
RL Environments
Get started with environments supported by RLlib, such as Farama Foundation's Gymnasium, PettingZoo, and many custom formats for vectorized and multi-agent environments.
Models (RLModule)
Learn how to configure RLlib’s default models and implement your own custom models through the RLModule APIs, which support arbitrary architectures with PyTorch, complex multi-model setups, and multi-agent models with components shared between agents.
Algorithms
See the many available RL algorithms of RLlib for on-policy and off-policy training, offline RL, model-based RL, multi-agent RL, and more.
Customizing RLlib#
RLlib provides powerful yet easy-to-use APIs for customizing all aspects of your experimental and production training workflows. For example, you may code your own environments in Python using the Farama Foundation's gymnasium or DeepMind's OpenSpiel, provide custom PyTorch models, write your own optimizer setups and loss definitions, or define custom exploratory behavior.
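For illustration, a custom environment can be an ordinary gymnasium.Env subclass passed straight to the config. The following is a minimal sketch; the GuessTheTargetEnv class and its reward logic are made up for this example:

import gymnasium as gym
import numpy as np
from ray.rllib.algorithms.ppo import PPOConfig


class GuessTheTargetEnv(gym.Env):
    """Toy env (made up for illustration): guess a fixed target number."""

    def __init__(self, config=None):
        self.observation_space = gym.spaces.Box(0.0, 10.0, (1,), np.float32)
        self.action_space = gym.spaces.Box(0.0, 10.0, (1,), np.float32)
        self.target = 7.0

    def reset(self, *, seed=None, options=None):
        return np.array([0.0], dtype=np.float32), {}

    def step(self, action):
        # Reward is the negative distance to the target; the episode ends immediately.
        reward = -float(abs(action[0] - self.target))
        obs = np.array([action[0]], dtype=np.float32)
        return obs, reward, True, False, {}


# Pass the env class (or a registered env ID) directly to the config.
config = PPOConfig().environment(GuessTheTargetEnv)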
Citing RLlib#
If RLlib helps with your academic research, the Ray RLlib team encourages you to cite these papers:
@inproceedings{liang2021rllib,
    title={{RLlib} Flow: Distributed Reinforcement Learning is a Dataflow Problem},
    author={
        Wu, Zhanghao and
        Liang, Eric and
        Luo, Michael and
        Mika, Sven and
        Gonzalez, Joseph E. and
        Stoica, Ion
    },
    booktitle={Conference on Neural Information Processing Systems ({NeurIPS})},
    year={2021},
    url={https://proceedings.neurips.cc/paper/2021/file/2bce32ed409f5ebcee2a7b417ad9beed-Paper.pdf}
}
@inproceedings{liang2018rllib,
    title={{RLlib}: Abstractions for Distributed Reinforcement Learning},
    author={
        Eric Liang and
        Richard Liaw and
        Robert Nishihara and
        Philipp Moritz and
        Roy Fox and
        Ken Goldberg and
        Joseph E. Gonzalez and
        Michael I. Jordan and
        Ion Stoica
    },
    booktitle={International Conference on Machine Learning ({ICML})},
    year={2018},
    url={https://arxiv.org/pdf/1712.09381}
}