Note

Ray 2.10.0 introduces the alpha stage of RLlib’s “new API stack”. The team is currently transitioning algorithms, example scripts, and documentation to the new code base throughout the subsequent minor releases leading up to Ray 3.0.

See here for more details on how to activate and use the new API stack.

Examples#

This page contains an index of all the Python scripts in the examples folder of RLlib, demonstrating the different use cases and features of the library.

Note

RLlib is currently in a transition from the “old API stack” to the “new API stack”. Some of the examples here haven’t been translated to the new stack yet and are tagged with the following comment line at the top: # @OldAPIStack. Moving all example scripts over to the “new API stack” is work in progress and expected to be completed by the end of 2024.

Note

If any new-API-stack example is broken, or if you’d like to add an example to this page, feel free to open an issue in RLlib’s GitHub repository.

Folder Structure#

The examples folder is organized into several sub-directories, all of which are described in detail below.

How to run an example script#

Most of the example scripts are self-executable, meaning you can just cd into the respective directory and run the script as-is with python:

$ cd ray/rllib/examples/multi_agent
$ python multi_agent_pendulum.py --enable-new-api-stack --num-agents=2

Use the --help command line argument to have each script print out its supported command line options.

Most of the scripts share a common subset of generally applicable command line arguments, for example --num-env-runners, --no-tune, or --wandb-key.

All example sub-folders#

Algorithms#

Checkpoints#

Connectors#

Note

RLlib’s Connector API has been re-written from scratch for the new API stack. Connector pieces and pipelines are now referred to as ConnectorV2 (as opposed to Connector, which continues to work only on the old API stack).
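
Conceptually, a connector pipeline is an ordered chain of pieces, each transforming the data that flows between the environment, the RLModule, and the Learner. The following purely conceptual Python sketch only illustrates that idea of chaining transformation pieces; it doesn’t use RLlib’s actual ConnectorV2 interface, and all names in it (scale_observations, add_timestep_feature, run_pipeline) are made up for illustration:

# Conceptual sketch only: chaining simple "connector pieces" into a pipeline.
# This does NOT use RLlib's real ConnectorV2 classes; all names below are
# made up for illustration.
from typing import Callable, Dict, List

import numpy as np

# A "piece" is a callable that transforms a batch dict and returns it.
Piece = Callable[[Dict[str, np.ndarray]], Dict[str, np.ndarray]]


def scale_observations(batch: Dict[str, np.ndarray]) -> Dict[str, np.ndarray]:
    """Example piece: scale uint8 image observations into [0, 1]."""
    batch["obs"] = batch["obs"].astype(np.float32) / 255.0
    return batch


def add_timestep_feature(batch: Dict[str, np.ndarray]) -> Dict[str, np.ndarray]:
    """Example piece: append a per-row timestep counter as an extra input."""
    batch["timestep"] = np.arange(len(batch["obs"]), dtype=np.float32).reshape(-1, 1)
    return batch


def run_pipeline(pieces: List[Piece], batch: Dict[str, np.ndarray]) -> Dict[str, np.ndarray]:
    """Apply all pieces in order, analogous to an env-to-module or learner pipeline."""
    for piece in pieces:
        batch = piece(batch)
    return batch


if __name__ == "__main__":
    pipeline = [scale_observations, add_timestep_feature]
    batch = {"obs": np.random.randint(0, 256, size=(4, 84, 84, 3), dtype=np.uint8)}
    out = run_pipeline(pipeline, batch)
    print(out["obs"].dtype, out["timestep"].shape)  # -> float32 (4, 1)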

  • How to frame-stack Atari image observations:

    An example of efficient Atari frame stacking: instead of stacking frames inside the environment itself (as a gym.Wrapper), the observations are stacked on-the-fly by the EnvToModule and LearnerConnector pipelines. This approach is more efficient because it avoids sending large, already-stacked observation tensors through the network (Ray).

  • How to mean/std-filter observations:

    An example of a ConnectorV2 piece that filters all observations from the environment with a plain mean/std-filter (shift by the mean, then divide by the standard deviation). The example demonstrates how the state of a stateful ConnectorV2 class (here, the means and standard deviations of the individual observation items) coming from the different EnvRunner instances is a) merged into one common state and then b) broadcast back to the remote EnvRunner workers. A self-contained sketch of the filter math follows after this list.

  • How to include previous-actions and/or previous rewards in RLModule inputs:

    An example of a ConnectorV2 that adds the n previous actions and/or the m previous rewards to the RLModule’s input dict, which the RLModule uses for its forward passes, both for inference and training.

  • How to train with nested action spaces:

    Learning in arbitrarily nested action spaces, using an env whose action space equals its observation space (both are complex, nested Dicts) and in which the policy has to pick actions that closely match (or are identical to) the previously seen observations.

  • How to train with nested observation spaces:

    Learning in arbitrarily nested observation spaces (using a CartPole-v1 variant with a nested Dict observation space). A short sketch of how such nested Dict spaces can be constructed with gymnasium follows after this list.
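
For reference, the mean/std filtering mentioned above boils down to subtracting a running mean and dividing by a running standard deviation; the same update rule that merges a new batch into the running statistics can also be used to merge statistics coming from different workers. Below is a minimal, self-contained NumPy sketch of such a running filter. It is for illustration only and is not RLlib’s actual ConnectorV2 implementation; the RunningMeanStd class is made up:

# Minimal running mean/std observation filter, for illustration only.
# This is NOT RLlib's actual mean/std ConnectorV2 implementation.
import numpy as np


class RunningMeanStd:
    """Tracks a running mean and variance of observations."""

    def __init__(self, shape, epsilon=1e-8):
        self.mean = np.zeros(shape, dtype=np.float64)
        self.var = np.ones(shape, dtype=np.float64)
        self.count = 0
        self.epsilon = epsilon

    def update(self, obs_batch: np.ndarray) -> None:
        """Merge the statistics of a new batch (or another worker) into the running state."""
        batch_mean = obs_batch.mean(axis=0)
        batch_var = obs_batch.var(axis=0)
        batch_count = obs_batch.shape[0]

        delta = batch_mean - self.mean
        total = self.count + batch_count
        new_mean = self.mean + delta * batch_count / total
        # Combine the two sums of squared deviations (parallel variance formula).
        m2 = self.var * self.count + batch_var * batch_count + delta ** 2 * self.count * batch_count / total

        self.mean, self.var, self.count = new_mean, m2 / total, total

    def __call__(self, obs: np.ndarray) -> np.ndarray:
        """Shift by the mean and divide by the standard deviation."""
        return (obs - self.mean) / np.sqrt(self.var + self.epsilon)


if __name__ == "__main__":
    filt = RunningMeanStd(shape=(4,))
    filt.update(np.random.randn(1000, 4) * 3.0 + 5.0)
    print(filt(np.array([5.0, 5.0, 5.0, 5.0])))  # roughly zero-centered output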
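
The nested-spaces examples above build on gymnasium’s composite spaces. Here is a small, self-contained sketch of how an arbitrarily nested Dict space can be defined and sampled; the concrete structure below (position, mode, sensors) is made up for illustration and not taken from the linked example scripts:

# Small sketch of a nested (Dict) gymnasium space, for illustration only.
import gymnasium as gym
import numpy as np

# A nested Dict space: a Box, a Discrete, and another Dict inside the top-level Dict.
nested_space = gym.spaces.Dict(
    {
        "position": gym.spaces.Box(low=-1.0, high=1.0, shape=(3,), dtype=np.float32),
        "mode": gym.spaces.Discrete(4),
        "sensors": gym.spaces.Dict(
            {
                "camera": gym.spaces.Box(low=0, high=255, shape=(16, 16, 3), dtype=np.uint8),
                "lidar": gym.spaces.Box(low=0.0, high=10.0, shape=(8,), dtype=np.float32),
            }
        ),
    }
)

# In the nested-action-space example, the env's action space equals its observation
# space, so the policy has to produce similarly structured (nested) actions.
sample = nested_space.sample()
print(sample["mode"], sample["sensors"]["lidar"].shape)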

Curriculum Learning#

Environments#

Evaluation#

GPU (for Training and Sampling)#

  • How to use fractional GPUs for training an RLModule:

    If your model is small and easily fits on a single GPU, and you therefore want to train other models alongside it to save time and cost, this script shows how to set up your RLlib config with a fractional number of GPUs on the learner (model training) side. A minimal Ray-core sketch of how fractional GPUs are requested follows below.
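
Under the hood, fractional GPU allocation relies on Ray’s resource scheduler: an actor or task can request, for example, a quarter of a GPU, and Ray packs several such workers onto the same physical device. The minimal Ray-core sketch below only illustrates that mechanism; it is independent of the RLlib learner config shown in the linked script, and the SmallModelTrainer actor is made up for illustration. Note that it needs at least one GPU in the Ray cluster, otherwise the actors can’t be scheduled:

# Minimal sketch of Ray's fractional GPU scheduling (independent of RLlib).
# Four of these actors can share one physical GPU, because each requests
# only 0.25 GPUs. Requires a machine or cluster with at least one GPU.
import os

import ray

ray.init()


@ray.remote(num_gpus=0.25)
class SmallModelTrainer:
    """Stand-in for a small model that easily fits on a fraction of a GPU."""

    def ping(self) -> str:
        # Ray sets CUDA_VISIBLE_DEVICES for this actor; a real trainer would
        # place its (small) model on the assigned GPU here.
        return f"GPU(s) visible to this actor: {os.environ.get('CUDA_VISIBLE_DEVICES')}"


trainers = [SmallModelTrainer.remote() for _ in range(4)]
print(ray.get([t.ping.remote() for t in trainers]))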

Inference (of Models/Policies)#

Multi-Agent RL#

Ray Serve and RLlib#

Ray Tune and RLlib#

RLModules#

Tuned Examples#

The tuned examples folder contains Python config files (YAML for the old API stack) that you can execute in the same way as the other example scripts described here, in order to run tuned learning experiments for the different algorithms and environment types.

For example, see this tuned Atari example for PPO, which learns to solve the Pong environment in roughly 5 minutes. You can run it as follows on a single g5.24xlarge (or g6.24xlarge) machine with 4 GPUs and 96 CPUs:

$ cd ray/rllib/tuned_examples/ppo
$ python atari_ppo.py --env=ale_py:ALE/Pong-v5 --num-gpus=4 --num-env-runners=95

Note that some of the files in this folder are used for RLlib’s daily or weekly release tests as well.

Community Examples#

  • Arena AI:

    A General Evaluation Platform and Building Toolkit for Single/Multi-Agent Intelligence with RLlib-generated baselines.

  • CARLA:

    Example of training autonomous vehicles with RLlib and the CARLA simulator.

  • The Emergence of Adversarial Communication in Multi-Agent Reinforcement Learning:

    Using Graph Neural Networks and RLlib to train multiple cooperative and adversarial agents to solve the “cover the area” problem, thereby learning how to best communicate (or, in the adversarial case, how to disrupt communication) (code).

  • Flatland:

    A dense traffic simulation environment with RLlib-generated baselines.

  • GFootball:

    Example of setting up a multi-agent version of GFootball with RLlib.

  • mobile-env:

    An open, minimalist Gymnasium environment for autonomous coordination in wireless mobile networks. Includes an example notebook using Ray RLlib for multi-agent RL with mobile-env.

  • Neural MMO:

    A multiagent AI research environment inspired by Massively Multiplayer Online (MMO) role playing games – self-contained worlds featuring thousands of agents per persistent macrocosm, diverse skilling systems, local and global economies, complex emergent social structures, and ad-hoc high-stakes single and team based conflict.

  • NeuroCuts:

    Example of building packet classification trees using RLlib / multi-agent in a bandit-like setting.

  • NeuroVectorizer:

    Example of learning optimal LLVM vectorization compiler pragmas for loops in C and C++ code using RLlib.

  • Roboschool / SageMaker:

    Example of training robotic control policies in SageMaker with RLlib.

  • Sequential Social Dilemma Games:

    Example of using the multi-agent API to model several social dilemma games.

  • Simple custom environment for single-agent RL with Ray and RLlib:

    Create a custom environment and train a single-agent RL policy using Ray 2.0 with Tune.

  • StarCraft2:

    Example of training in StarCraft2 maps with RLlib / multi-agent.

  • Traffic Flow:

    Example of optimizing mixed-autonomy traffic simulations with RLlib / multi-agent.

Blog Posts#