How To Contribute to RLlib

Development Install

You can develop RLlib locally without compiling Ray by using the setup-dev.py script. This sets up symlinks between the ray/rllib directory in your local git clone and the corresponding directory in the pip-installed ray package, so that every change you make to the source files in your local clone is immediately reflected in your installed ray as well. However, if you have installed Ray from source following these instructions, do not use this script, as those steps should already have created the symlinks. When using the script, make sure that your git branch is in sync with the installed Ray binaries (i.e., you are up-to-date on master and have the latest wheel installed).

# Clone your fork onto your local machine, e.g.:
git clone https://github.com/[your username]/ray.git
cd ray
# Only enter 'Y' at the first question on linking RLlib.
# This leads to the most stable behavior and you won't have to re-install ray as often.
# If you anticipate making changes to e.g. tune quite often, consider also symlinking ray tune here
# (say 'Y' when asked by the script about creating the tune symlink).
python python/ray/setup-dev.py
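
To verify that the symlink is in place, you can check where the installed package's rllib directory actually resolves to (a quick sanity check, not part of the official setup):

# Should print a path inside your local git clone if the symlink exists:
python -c "import ray.rllib, os; print(os.path.realpath(os.path.dirname(ray.rllib.__file__)))"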

API Stability

Objects and methods annotated with @PublicAPI, @DeveloperAPI, or @ExperimentalAPI have the following API compatibility guarantees:

ray.rllib.utils.annotations.PublicAPI(obj)

Decorator for documenting public APIs.

Public APIs are classes and methods exposed to end users of RLlib. You can expect these APIs to remain stable across RLlib releases.

Subclasses that inherit from a @PublicAPI base class can be assumed to be part of the RLlib public API as well (e.g., all trainer classes are in the public API because Trainer is @PublicAPI).

In addition, you can assume that all trainer configurations are part of the public API as well.

Examples

>>> # Indicates that the `Trainer` class is exposed to end users
>>> # of RLlib and will remain stable across RLlib releases.
>>> from ray import tune
>>> from ray.rllib.utils.annotations import PublicAPI
>>> @PublicAPI
... class Trainer(tune.Trainable):
...     ...
ray.rllib.utils.annotations.DeveloperAPI(obj)

Decorator for documenting developer APIs.

Developer APIs are classes and methods explicitly exposed to developers for building custom algorithms or advanced training strategies on top of RLlib internals. You can generally expect these APIs to remain stable apart from minor changes (though they are less stable than public APIs).

Subclasses that inherit from a @DeveloperAPI base class can be assumed part of the RLlib developer API as well.

Examples

>>> # Indicates that the `TorchPolicy` class is exposed to developers
>>> # of RLlib and will remain (relatively) stable across RLlib
>>> # releases.
>>> from ray.rllib.policy import Policy
>>> from ray.rllib.utils.annotations import DeveloperAPI
>>> @DeveloperAPI
... class TorchPolicy(Policy):
...     ...
ray.rllib.utils.annotations.ExperimentalAPI(obj)

Decorator for documenting experimental APIs.

Experimental APIs are classes and methods that are still in development and may change at any time. You should not expect these APIs to be stable until their tag is changed to DeveloperAPI or PublicAPI.

Subclasses that inherit from an @ExperimentalAPI base class can be assumed experimental as well.

Examples

>>> from ray.rllib.policy import Policy
>>> from ray.rllib.utils.annotations import ExperimentalAPI
>>> class TorchPolicy(Policy): 
...     ... 
...     # Indicates that the `TorchPolicy.loss` method is a new and
...     # experimental API and may change frequently in future
...     # releases.
...     @ExperimentalAPI 
...     def loss(self, model, action_dist, train_batch): 
...         ... 

Features

Feature development, discussion, and upcoming priorities are tracked on the GitHub issues page (note that this may not include all development efforts).

Benchmarks

A number of training run results are available in the rl-experiments repo, and there is also a list of working hyperparameter configurations in tuned_examples, sorted by algorithm. Benchmark results are extremely valuable to the community, so if you happen to have results that may be of interest, consider making a pull request to either repo.
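
For example, you can launch one of the tuned examples directly with the rllib CLI. The exact file path below is illustrative; check the tuned_examples folder for the available configs:

rllib train -f rllib/tuned_examples/ppo/cartpole-ppo.yaml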

Contributing Algorithms

These are the guidelines for merging new algorithms into RLlib:

  • Contributed algorithms (rllib/contrib):
    • must subclass Trainer and implement the step() method

    • must include a lightweight test (example) to ensure the algorithm runs

    • should include tuned hyperparameter examples and documentation

    • should offer functionality not present in existing algorithms

  • Fully integrated algorithms (rllib/agents) have the following additional requirements:
    • must fully implement the Trainer API

    • must offer substantial new functionality not possible to add to other algorithms

    • should support custom models and preprocessors

    • should use RLlib abstractions and support distributed execution

Both integrated and contributed algorithms ship with the ray PyPI package, and are tested as part of Ray’s automated tests. The main difference between contributed and fully integrated algorithms is that the latter will be maintained by the Ray team to a much greater extent with respect to bugs and integration with RLlib features.

How to add an algorithm to contrib

It takes just two changes to add an algorithm to contrib. A minimal example can be found here. First, subclass Trainer and implement the _init and step methods (the example below also overrides get_default_config to declare the algorithm's default configuration):

import numpy as np

from ray.rllib.agents.trainer import Trainer, with_common_config
from ray.rllib.utils.annotations import override
from ray.rllib.utils.typing import TrainerConfigDict


class RandomAgent(Trainer):
    """Trainer that produces random actions and never learns."""

    @classmethod
    @override(Trainer)
    def get_default_config(cls) -> TrainerConfigDict:
        return with_common_config({
            "rollouts_per_iteration": 10,
            "framework": "tf",  # not used
        })

    @override(Trainer)
    def _init(self, config, env_creator):
        self.env = env_creator(config["env_config"])

    @override(Trainer)
    def step(self):
        rewards = []
        steps = 0
        for _ in range(self.config["rollouts_per_iteration"]):
            obs = self.env.reset()
            done = False
            reward = 0.0
            while not done:
                action = self.env.action_space.sample()
                obs, r, done, info = self.env.step(action)
                reward += r
                steps += 1
            rewards.append(reward)
        return {
            "episode_reward_mean": np.mean(rewards),
            "timesteps_this_iter": steps,
        }
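
As a quick sanity check, you can instantiate and train the agent directly (a minimal sketch assuming the classic Trainer API, in which train() runs one step() iteration and returns its result dict):

trainer = RandomAgent(
    env="CartPole-v0", config={"rollouts_per_iteration": 10})
result = trainer.train()
# Random actions on CartPole should still collect some reward:
print(result["episode_reward_mean"])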

Second, register the trainer with a name in contrib/registry.py.

def _import_random_agent():
    from ray.rllib.contrib.random_agent.random_agent import RandomAgent
    return RandomAgent

def _import_random_agent_2():
    from ray.rllib.contrib.random_agent_2.random_agent_2 import RandomAgent2
    return RandomAgent2

CONTRIBUTED_ALGORITHMS = {
    "contrib/RandomAgent": _import_random_trainer,
    "contrib/RandomAgent2": _import_random_trainer_2,
    # ...
}

After registration, you can run and visualize training progress using rllib train:

rllib train --run=contrib/RandomAgent --env=CartPole-v0
tensorboard --logdir=~/ray_results
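
The registered name can also be used from Python via Ray Tune. A minimal sketch using the tune.run API of this RLlib generation:

from ray import tune

tune.run(
    "contrib/RandomAgent",
    config={"env": "CartPole-v0"},
    stop={"training_iteration": 10},
)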

Debugging your Algorithms

Finding Memory Leaks In Workers

Keeping the memory usage of long-running workers stable can be challenging. The MemoryTrackingCallbacks class can be used to track the memory usage of workers.

class ray.rllib.agents.callbacks.MemoryTrackingCallbacks

MemoryTrackingCallbacks can be used to trace and track memory usage in rollout workers.

MemoryTrackingCallbacks uses tracemalloc and psutil to track Python allocations during rollouts, both in training and in evaluation.

The tracking data is logged to the custom_metrics of an episode and can therefore be viewed in TensorBoard (or in WandB, etc.).

Add the MemoryTrackingCallbacks class to the Tune config, e.g. {..., "callbacks": MemoryTrackingCallbacks, ...}
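
For example (a minimal sketch; PPO and CartPole-v0 stand in for your own algorithm and environment):

from ray import tune
from ray.rllib.agents.callbacks import MemoryTrackingCallbacks

tune.run(
    "PPO",
    config={
        "env": "CartPole-v0",
        # Pass the callbacks class itself (not an instance):
        "callbacks": MemoryTrackingCallbacks,
    },
)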

Note

This class is meant for debugging and should not be used in production code as tracemalloc incurs a significant slowdown in execution speed.

The top 20 objects by memory usage in the workers will be added as custom metrics. These can then be monitored using TensorBoard or other metrics integrations like Weights & Biases:

[Figure: MemoryTrackingCallbacks memory metrics displayed in TensorBoard]

Troubleshooting

If you encounter errors like blas_thread_init: pthread_create: Resource temporarily unavailable when using many workers, try setting OMP_NUM_THREADS=1. Similarly, check configured system limits with ulimit -a for other resource limit errors.
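
For instance, assuming a bash-like shell:

# Limit the number of BLAS/OpenMP threads spawned per worker process:
export OMP_NUM_THREADS=1
# Inspect the currently configured resource limits:
ulimit -a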

For debugging unexpected hangs or performance problems, you can run ray stack to dump the stack traces of all Ray workers on the current node, ray timeline to dump a timeline visualization of tasks to a file, and ray memory to list all object references in the cluster.

TensorFlow 2.x

If you are developing with TensorFlow, it is now recommended to use framework=tf2 with eager_tracing=True for maximum performance and support. We will, however, continue to support framework=tf (static-graph mode) for the foreseeable future.

For debugging purposes, use framework=tf2 with eager_tracing=False. All tf.Tensor values will then be visible and printable when executing your code. However, some slowdown is to be expected in this mode.
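
The two modes side by side (a sketch; both are plain trainer config fragments):

# Debugging: eager execution without tracing. All tf.Tensor values are
# concrete and printable, at the cost of some execution speed.
debug_config = {
    "framework": "tf2",
    "eager_tracing": False,
}

# Performance: eager execution with tracing enabled.
perf_config = {
    "framework": "tf2",
    "eager_tracing": True,
}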

Older TensorFlow versions

RLlib supports both TensorFlow 2.x and tf.compat.v1 modes. Always use the ray.rllib.utils.framework.try_import_tf() utility function to import tensorflow. It returns three values:

  • tf1: The tf.compat.v1 module or the installed tf1.x package (if the version is < 2.0).

  • tf: The installed tensorflow module as-is.

  • tfv: A version integer, whose value is either 1 or 2.

See here for a detailed example script.
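
A typical usage pattern (a minimal sketch):

from ray.rllib.utils.framework import try_import_tf

tf1, tf, tfv = try_import_tf()

# Write version-agnostic code against the returned modules:
if tfv == 1:
    # Under TensorFlow 1.x, use the tf1 (tf.compat.v1) module, e.g.:
    placeholder = tf1.placeholder(tf.float32, shape=(None, 4))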
