Note

Ray 2.10.0 introduces the alpha stage of RLlib’s “new API stack”. The Ray Team plans to transition algorithms, example scripts, and documentation to the new code base, thereby incrementally replacing the “old API stack” (e.g., ModelV2, Policy, RolloutWorker) throughout the subsequent minor releases leading up to Ray 3.0.

Note, however, that so far only PPO (single- and multi-agent) and SAC (single-agent only) support the “new API stack”, and both still run with the old APIs by default. You can continue to use your existing custom (old stack) classes.

See here for more details on how to use the new API stack.

Examples#

This page contains an index of all the Python scripts in the examples folder of RLlib, demonstrating the different use cases and features of the library.

Note

RLlib is currently in a transition state from “old API stack” to “new API stack”. Some of the examples here haven’t yet been translated to the new stack and are tagged with the following comment line on top: # @OldAPIStack. Moving all example scripts over to the “new API stack” is work in progress and expected to be completed by the end of 2024.

Note

If any new-API-stack example is broken, or if you’d like to add an example to this page, feel free to raise an issue on RLlib’s GitHub repository.

Folder Structure#

The examples folder is structured into several sub-directories, all of which are described in detail below.

How to run an example script#

Most of the example scripts are self-executable, meaning you can just cd into the respective directory and run the script as-is with python:

$ cd ray/rllib/examples/multi_agent
$ python multi_agent_pendulum.py --enable-new-api-stack --num-agents=2

Use the --help command line argument to have each script print out its supported command line options.

Most of the scripts share a common subset of generally applicable command line arguments, for example --num-env-runners, --no-tune, or --wandb-key.

All sub-folders#

Algorithms#

Checkpoints#

Connectors#

Note

RLlib’s Connector API has been rewritten from scratch for the new API stack (new_stack). Connector pieces and pipelines are now referred to as ConnectorV2 (as opposed to Connector, which continues to work only on the old API stack old_stack).

  • new_stack How to frame-stack Atari image observations:

    An example of Atari frame-stacking done in a very efficient manner: not in the environment itself (as a gym.Wrapper), but by stacking the observations on the fly in the EnvToModule and LearnerConnector pipelines. This method of frame-stacking is more efficient because it avoids sending large observation tensors through the network (Ray). See the sketch after this list for how such connector pieces plug into an AlgorithmConfig.

  • new_stack How to mean/std-filter observations:

    An example of a ConnectorV2 that filters all observations from the environment through a plain mean/std-filter (shift by the mean and divide by the standard deviation). The example demonstrates how the states of a stateful ConnectorV2 class (here, the means and standard deviations of the individual observation items), gathered from the different EnvRunner instances, are a) merged into one common state and then b) broadcast back to the remote EnvRunner workers.

  • new_stack How to include previous-actions and/or previous rewards in RLModule inputs:

    An example of a ConnectorV2 that adds the n previous actions and/or the m previous rewards to the RLModule’s input dict (to perform its forward passes, both for inference and training).

  • new_stack How to train with nested action spaces:

    Learning in arbitrarily nested action spaces, using an env in which the action space equals the observation space (both are complex, nested Dicts) and the policy has to pick actions that closely match (or are identical to) the previously seen observations.

  • new_stack How to train with nested observation spaces:

    Learning in arbitrarily nested observation spaces (using a CartPole-v1 variant with a nested Dict observation space).
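
The following is a minimal sketch of how such ConnectorV2 pieces typically plug into an AlgorithmConfig on the new API stack, using built-in frame-stacking connectors as the example (see the first item in this list). The import paths, constructor arguments, and callable signatures are assumptions that can differ between Ray versions; the linked example scripts contain the exact, version-matching code.

# A minimal sketch, not taken verbatim from the linked examples. Import paths,
# constructor arguments, and callable signatures are assumptions and may differ
# between Ray versions.
from ray.rllib.algorithms.ppo import PPOConfig
from ray.rllib.connectors.env_to_module.frame_stacking import FrameStackingEnvToModule
from ray.rllib.connectors.learner.frame_stacking import FrameStackingLearner


def _make_env_to_module_connector(*args, **kwargs):
    # Stack the last 4 observations on the fly before they reach the RLModule
    # (instead of stacking inside the env through a gym.Wrapper).
    return FrameStackingEnvToModule(num_frames=4)


def _make_learner_connector(*args, **kwargs):
    # Apply the same stacking when the Learner builds its train batches.
    return FrameStackingLearner(num_frames=4)


config = (
    PPOConfig()
    .environment("ALE/Pong-v5")  # requires ale-py to be installed
    .env_runners(env_to_module_connector=_make_env_to_module_connector)
    .training(learner_connector=_make_learner_connector)
)
algo = config.build()
print(algo.train())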

Curriculum Learning#

Debugging#

Environments#

  • new_stack How to register a custom gymnasium environment:

    Example showing how to write your own RL environment using gymnasium and register it so you can train your RLlib algorithm against it. See the sketch after this list for a minimal registration example.

  • new_stack How to set up rendering (and recording) of the environment trajectories during training with WandB:

    Example showing how you can render and record episode trajectories of your gymnasium envs and log the videos to WandB.

  • old_stack How to run a Unity3D multi-agent environment locally:

    Example of how to set up an RLlib Algorithm against a locally running Unity3D editor instance to learn any Unity3D game (including multi-agent support). Use this example to try things out and watch the game and the learning progress live in the editor. If you provide a compiled game, this example can also run in a distributed fashion with num_env_runners > 0. For a more heavy-weight, distributed, cloud-based example, see the Unity3D client/server setup below.

  • old_stack How to run with a Unity3D client/server setup:

    Example of how to set up n distributed Unity3D (compiled) games in the cloud that function as data-collecting clients against a central RLlib Policy server that learns how to play the game. The n distributed clients could themselves be servers for external/human players, leaving control fully in the hands of the Unity entities instead of RLlib. Note: Uses Unity’s MLAgents SDK (>=1.0) and supports all provided MLAgents example games and multi-agent setups.

  • old_stack How to run with a CartPole client/server setup:

    Example of online serving of predictions for a simple CartPole policy.
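
As a complement to the first item in this list, here is a self-contained sketch of the registration pattern: define a gymnasium env, register it under a name via ray.tune.registry.register_env, and refer to it by that name in the config. The toy SimpleCorridorEnv below is an illustration, not the env from the linked example script.

# A self-contained sketch of registering a custom gymnasium env with RLlib.
# SimpleCorridorEnv is a toy stand-in; the registration pattern is the point.
import gymnasium as gym
import numpy as np
from ray.rllib.algorithms.ppo import PPOConfig
from ray.tune.registry import register_env


class SimpleCorridorEnv(gym.Env):
    """Walk right along a 1D corridor until reaching the goal position."""

    def __init__(self, config=None):
        config = config or {}
        self.corridor_length = config.get("corridor_length", 10)
        self.observation_space = gym.spaces.Box(0.0, self.corridor_length, (1,), np.float32)
        self.action_space = gym.spaces.Discrete(2)  # 0 = left, 1 = right
        self.pos = 0.0

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        self.pos = 0.0
        return np.array([self.pos], np.float32), {}

    def step(self, action):
        self.pos += 1.0 if action == 1 else -1.0
        self.pos = max(self.pos, 0.0)
        terminated = self.pos >= self.corridor_length
        reward = 1.0 if terminated else -0.1
        return np.array([self.pos], np.float32), reward, terminated, False, {}


# Register the env under a name RLlib can resolve, then refer to it by that name.
register_env("simple_corridor", lambda cfg: SimpleCorridorEnv(cfg))

config = PPOConfig().environment("simple_corridor", env_config={"corridor_length": 20})
algo = config.build()
print(algo.train())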

Evaluation#

GPU (for Training and Sampling)#

  • new_stack How to use fractional GPUs for training an RLModule:

    If your model is small and easily fits on a single GPU, and you therefore want to train other models alongside it to save time and cost, this script shows you how to set up your RLlib config with a fractional number of GPUs on the learner (model training) side. A minimal config sketch follows below.
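
The sketch below shows one way such a fractional-GPU setup might look. The .learners() config method and its argument names are assumptions that depend on your Ray version (older versions expose similar settings through config.resources() instead); the linked example script contains the exact usage.

# A minimal sketch, not the linked script. The `.learners()` method and its
# argument names are assumptions and may differ between Ray versions.
from ray.rllib.algorithms.ppo import PPOConfig

config = (
    PPOConfig()
    .environment("CartPole-v1")
    .learners(
        num_learners=1,            # a single learner (model training) process ...
        num_gpus_per_learner=0.5,  # ... that only claims half a GPU, leaving the
    )                              # other half free for another model or job.
)
algo = config.build()
print(algo.train())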

Hierarchical Training#

Inference (of Models/Policies)#

Metrics#

Multi-Agent RL#

Offline RL#

Ray Serve and RLlib#

Ray Tune and RLlib#

RLModules#

Tuned Examples#

The tuned examples folder contains Python config files (YAML for the old API stack) that you can execute analogously to all other example scripts described here to run tuned learning experiments for the different algorithms and environment types.

For example, see this tuned Atari example for PPO, which learns to solve the Pong environment in roughly 5 minutes. You can run it as follows on a single g5.24xlarge (or g6.24xlarge) machine with 4 GPUs and 96 CPUs:

$ cd ray/rllib/tuned_examples/ppo
$ python atari_ppo.py --env ALE/Pong-v5 --num-gpus=4 --num-env-runners=95

Note that some of the files in this folder are used for RLlib’s daily or weekly release tests as well.

Community Examples#

  • old_stack Arena AI:

    A General Evaluation Platform and Building Toolkit for Single/Multi-Agent Intelligence with RLlib-generated baselines.

  • old_stack CARLA:

    Example of training autonomous vehicles with RLlib and CARLA simulator.

  • old_stack The Emergence of Adversarial Communication in Multi-Agent Reinforcement Learning:

    Using Graph Neural Networks and RLlib to train multiple cooperative and adversarial agents to solve the “cover the area” problem, thereby learning how to best communicate (or, in the adversarial case, how to disturb communication) (code).

  • old_stack Flatland:

    A dense traffic-simulating environment with RLlib-generated baselines.

  • old_stack GFootball:

    Example of setting up a multi-agent version of GFootball with RLlib.

  • old_stack mobile-env:

    An open, minimalist Gymnasium environment for autonomous coordination in wireless mobile networks. Includes an example notebook using Ray RLlib for multi-agent RL with mobile-env.

  • old_stack Neural MMO:

    A multi-agent AI research environment inspired by Massively Multiplayer Online (MMO) role-playing games: self-contained worlds featuring thousands of agents per persistent macrocosm, diverse skilling systems, local and global economies, complex emergent social structures, and ad-hoc, high-stakes single- and team-based conflict.

  • old_stack NeuroCuts:

    Example of building packet classification trees using RLlib / multi-agent in a bandit-like setting.

  • old_stack NeuroVectorizer:

    Example of learning optimal LLVM vectorization compiler pragmas for loops in C and C++ code using RLlib.

  • old_stack Roboschool / SageMaker:

    Example of training robotic control policies in SageMaker with RLlib.

  • old_stack Sequential Social Dilemma Games:

    Example of using the multi-agent API to model several social dilemma games.

  • old_stack Simple custom environment for single RL with Ray and RLlib:

    Create a custom environment and train a single RL agent using Ray 2.0 with Tune.

  • old_stack StarCraft2:

    Example of training in StarCraft2 maps with RLlib / multi-agent.

  • old_stack Traffic Flow:

    Example of optimizing mixed-autonomy traffic simulations with RLlib / multi-agent.

Blog Posts#