Ray 2.10.0 introduces the alpha stage of RLlib’s “new API stack”. The Ray Team plans to transition algorithms, example scripts, and documentation to the new code base, thereby incrementally replacing the “old API stack” (e.g., ModelV2, Policy, RolloutWorker) throughout the subsequent minor releases leading up to Ray 3.0.

Note, however, that so far only PPO (single- and multi-agent) and SAC (single-agent only) support the “new API stack”, and all algorithms still run by default with the old APIs. You can continue to use your existing custom (old-stack) classes.

See here for more details on how to use the new API stack.


This page is an index of examples for the various use cases and features of RLlib.

If any example is broken, or if you’d like to add an example to this page, feel free to raise an issue on our GitHub repository.

Tuned Examples#

  • Tuned examples:

    Collection of tuned hyperparameters sorted by algorithm.

Environments and Adapters#

  • Registering a custom env and model:

    Example of defining and registering a gym env and model for use with RLlib.
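
    As a minimal sketch of the registration pattern (all names here, such as SimpleCorridor, are illustrative): the toy env below implements the gymnasium reset/step protocol in plain Python, and the trailing comment shows how register_env would tie a name string to the constructor once Ray is installed.

    ```python
    # Toy corridor env: walk right from position 0 to `corridor_length`.
    # In real use it would subclass gymnasium.Env and declare
    # observation_space / action_space; this plain-Python version only
    # demonstrates the reset/step protocol RLlib expects.
    class SimpleCorridor:
        def __init__(self, config=None):
            config = config or {}
            self.length = config.get("corridor_length", 5)
            self.pos = 0

        def reset(self, *, seed=None, options=None):
            self.pos = 0
            return self.pos, {}  # (observation, info)

        def step(self, action):
            # action 0 = move left, 1 = move right
            self.pos = max(0, self.pos + (1 if action == 1 else -1))
            terminated = self.pos >= self.length
            reward = 1.0 if terminated else -0.1
            return self.pos, reward, terminated, False, {}

    # With Ray installed, the env is registered under a name string and
    # referenced in the algorithm config by that string, e.g.:
    #
    #   from ray.tune.registry import register_env
    #   register_env("corridor", lambda cfg: SimpleCorridor(cfg))
    #   config = PPOConfig().environment("corridor",
    #                                    env_config={"corridor_length": 5})
    ```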

  • Local Unity3D multi-agent environment example:

    Example of how to set up an RLlib Algorithm against a locally running Unity3D editor instance to learn any Unity3D game (including support for multi-agent setups). Use this example to try things out and watch the game and the learning progress live in the editor. Given a compiled game, this example can also run in a distributed fashion with num_env_runners > 0. For a more heavyweight, distributed, cloud-based example, see Unity3D client/server below.

Custom- and Complex Models#

Training Workflows#


  • Custom evaluation function:

    Example of how to write a custom evaluation function that is called instead of the default behavior, which is running n episodes with the evaluation worker set.
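
    The custom function receives the Algorithm instance and its evaluation worker set and returns a metrics dict. A minimal sketch of the aggregation logic is below; note that the worker interface used here (sample_episode_returns) is a hypothetical stand-in so the function can be exercised without Ray — a real WorkerSet is queried through its own APIs.

    ```python
    def custom_eval_fn(algorithm, eval_workers):
        """Gather episode returns from each evaluation worker and aggregate.

        RLlib calls the configured custom evaluation function with the
        Algorithm and its evaluation workers; the returned dict becomes
        the reported evaluation metrics.
        """
        episode_returns = []
        for worker in eval_workers:
            # Stand-in call: sample_episode_returns() is hypothetical,
            # used here only to keep the sketch runnable without Ray.
            episode_returns.extend(worker.sample_episode_returns())
        n = len(episode_returns)
        return {
            "episode_reward_mean": (sum(episode_returns) / n) if n else float("nan"),
            "episodes_this_iter": n,
        }
    ```

    With RLlib, such a function would be wired in via the config, e.g. config.evaluation(custom_evaluation_function=custom_eval_fn).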

  • Parallel evaluation and training:

    Example showing how the evaluation workers and the “normal” rollout workers can run (to some extent) in parallel to speed up training.
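
    Assuming Ray is installed, the overlap is controlled through AlgorithmConfig.evaluation() settings along these lines (a sketch; option names and defaults may differ across Ray versions):

    ```python
    from ray.rllib.algorithms.ppo import PPOConfig

    config = (
        PPOConfig()
        .environment("CartPole-v1")
        .evaluation(
            evaluation_interval=1,                 # evaluate every training iteration
            evaluation_num_workers=2,              # dedicated evaluation workers
            evaluation_parallel_to_training=True,  # overlap eval with the train step
            evaluation_duration="auto",            # evaluate for as long as training takes
        )
    )
    ```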

Serving and Offline#

  • Offline RL with CQL:

    Example showing how to run an offline RL training job using a historic-data JSON file.
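
    A hedged sketch of such a job, assuming Ray is installed; the environment, file path, and training loop shown are illustrative, not a definitive recipe:

    ```python
    from ray.rllib.algorithms.cql import CQLConfig

    config = (
        CQLConfig()
        .environment("Pendulum-v1")  # env only supplies spaces; no live sampling
        # Point RLlib's offline input at the historic-data file (path is
        # illustrative).
        .offline_data(input_="/path/to/historic_data.json")
    )

    # algo = config.build()
    # for _ in range(100):
    #     print(algo.train()["info"])
    ```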

  • Another example for using RLlib with Ray Serve:

    This script offers a simple workflow for 1) training a policy with RLlib, 2) creating a new policy, and 3) restoring its weights from the trained one and serving the new policy via Ray Serve.

  • Unity3D client/server:

    Example of how to set up n distributed Unity3D (compiled) games in the cloud that function as data-collecting clients against a central RLlib Policy server learning how to play the game. The n distributed clients could themselves be servers for external/human players and allow for control being fully in the hands of the Unity entities instead of RLlib. Note: Uses Unity’s ML-Agents SDK (>=1.0) and supports all provided ML-Agents example games and multi-agent setups.

  • CartPole client/server:

    Example of online serving of predictions for a simple CartPole policy.

  • Saving experiences:

    Example of how to externally generate experience batches in RLlib-compatible format.
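
    RLlib’s offline JSON format stores one serialized batch per line; the actual example builds batches with RLlib’s SampleBatchBuilder and writes them with JsonWriter. The stdlib-only stand-in below just illustrates the one-record-per-line shape — the field names are illustrative and not authoritative (newer RLlib versions, for example, split dones into terminateds/truncateds):

    ```python
    import json

    def write_episode(path, observations, actions, rewards):
        """Append one episode as a single JSON-lines record.

        `observations` has one more entry than `actions`/`rewards`, so
        obs/new_obs pairs line up per timestep.
        """
        record = {
            "type": "SampleBatch",
            "obs": observations[:-1],
            "new_obs": observations[1:],
            "actions": actions,
            "rewards": rewards,
            "dones": [False] * (len(actions) - 1) + [True],
            "t": list(range(len(actions))),
        }
        with open(path, "a") as f:
            f.write(json.dumps(record) + "\n")
    ```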

  • Finding a checkpoint using custom criteria:

    Example of how to find a checkpoint after a training run via some custom-defined criteria.
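
    For simple criteria, Tune’s ResultGrid.get_best_result(metric=..., mode=...) already covers checkpoint selection; for fully custom criteria you can score each trial’s metrics yourself. A stdlib-only sketch (the record layout and scoring function are illustrative, standing in for what you would pull from a ResultGrid):

    ```python
    def best_checkpoint(records, score_fn):
        """Return the checkpoint whose metrics maximize score_fn.

        `records` is a list of {"metrics": dict, "checkpoint": path}
        entries, one per trial.
        """
        if not records:
            raise ValueError("no results to choose from")
        best = max(records, key=lambda rec: score_fn(rec["metrics"]))
        return best["checkpoint"]
    ```

    For example, a custom criterion could favor high mean reward while penalizing long episodes: score_fn = lambda m: m["episode_reward_mean"] - 0.01 * m["episode_len_mean"].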

Multi-Agent and Hierarchical#

Special Action- and Observation Spaces#

Community Examples#

  • Arena AI:

    A General Evaluation Platform and Building Toolkit for Single/Multi-Agent Intelligence with RLlib-generated baselines.

  • CARLA:

    Example of training autonomous vehicles with RLlib and CARLA simulator.

  • The Emergence of Adversarial Communication in Multi-Agent Reinforcement Learning:

    Using Graph Neural Networks and RLlib to train multiple cooperative and adversarial agents to solve the “cover the area”-problem, thereby learning how to best communicate (or - in the adversarial case - how to disturb communication) (code).

  • Flatland:

    A dense traffic simulating environment with RLlib-generated baselines.

  • GFootball:

    Example of setting up a multi-agent version of GFootball with RLlib.

  • mobile-env:

    An open, minimalist Gymnasium environment for autonomous coordination in wireless mobile networks. Includes an example notebook using Ray RLlib for multi-agent RL with mobile-env.

  • Neural MMO:

    A multi-agent AI research environment inspired by Massively Multiplayer Online (MMO) role-playing games – self-contained worlds featuring thousands of agents per persistent macrocosm, diverse skilling systems, local and global economies, complex emergent social structures, and ad-hoc, high-stakes single- and team-based conflict.

  • NeuroCuts:

    Example of building packet classification trees using RLlib / multi-agent in a bandit-like setting.

  • NeuroVectorizer:

    Example of learning optimal LLVM vectorization compiler pragmas for loops in C and C++ codes using RLlib.

  • Roboschool / SageMaker:

    Example of training robotic control policies in SageMaker with RLlib.

  • Sequential Social Dilemma Games:

    Example of using the multi-agent API to model several social dilemma games.

  • Simple custom environment for single-agent RL with Ray and RLlib:

    Create a custom environment and train a single-agent RL policy using Ray 2.0 with Tune.

  • StarCraft2:

    Example of training in StarCraft2 maps with RLlib / multi-agent.

  • Traffic Flow:

    Example of optimizing mixed-autonomy traffic simulations with RLlib / multi-agent.

Blog Posts#