A Gentle Introduction to Ray

Fork me on GitHub https://github.com/ray-project/ray/raw/master/doc/source/images/ray_header_logo.png

Ray is a fast and simple framework for building and running distributed applications.

Ray accomplishes this mission by:

  1. Providing simple primitives for building and running distributed applications.

  2. Enabling end users to parallelize single machine code, with little to zero code changes.

  3. Including a large ecosystem of applications, libraries, and tools on top of the core Ray to enable complex applications.

Ray Core provides the simple primitives for application building.

On top of Ray Core are several libraries for solving problems in machine learning:

Ray also has a number of other community contributed libraries:

This tutorial will provide a tour of the core features of Ray.

First, install Ray with: pip install ray, and now we can execute some Python in parallel.

Parallelizing Python Functions with Ray Tasks

First, import ray and init the Ray service. Then decorate your function with @ray.remote to declare that you want to run this function remotely. Lastly, call that function with .remote() instead of calling it normally. This remote call yields a future, or ObjectRef that you can then fetch with ray.get.

import ray

def f(x):
    return x * x

futures = [f.remote(i) for i in range(4)]
print(ray.get(futures)) # [0, 1, 4, 9]

In the above code block we defined some Ray Tasks. While these are great for stateless operations, sometimes you must maintain the state of your application. You can do that with Ray Actors.

Parallelizing Python Classes with Ray Actors

Ray provides actors to allow you to parallelize an instance of a class in Python. When you instantiate a class that is a Ray actor, Ray will start a remote instance of that class in the cluster. This actor can then execute remote method calls and maintain its own internal state.

import ray
ray.init() # Only call this once.

class Counter(object):
    def __init__(self):
        self.n = 0

    def increment(self):
        self.n += 1

    def read(self):
        return self.n

counters = [Counter.remote() for i in range(4)]
[c.increment.remote() for c in counters]
futures = [c.read.remote() for c in counters]
print(ray.get(futures)) # [1, 1, 1, 1]

An Overview of the Ray Libraries

Ray has a rich ecosystem of libraries and frameworks built on top of it. The main ones being:

Tune Quick Start

Tune is a library for hyperparameter tuning at any scale. With Tune, you can launch a multi-node distributed hyperparameter sweep in less than 10 lines of code. Tune supports any deep learning framework, including PyTorch, TensorFlow, and Keras.


To run this example, you will need to install the following:

$ pip install 'ray[tune]'

This example runs a small grid search with an iterative training function.

from ray import tune

def objective(step, alpha, beta):
    return (0.1 + alpha * step / 100)**(-1) + beta * 0.1

def training_function(config):
    # Hyperparameters
    alpha, beta = config["alpha"], config["beta"]
    for step in range(10):
        # Iterative training function - can be any arbitrary training procedure.
        intermediate_score = objective(step, alpha, beta)
        # Feed the score back back to Tune.

analysis = tune.run(
        "alpha": tune.grid_search([0.001, 0.01, 0.1]),
        "beta": tune.choice([1, 2, 3])

print("Best config: ", analysis.get_best_config(metric="mean_loss"))

# Get a dataframe for analyzing trial results.
df = analysis.dataframe()

If TensorBoard is installed, automatically visualize all trial results:

tensorboard --logdir ~/ray_results

RLlib Quick Start

RLlib is an open-source library for reinforcement learning built on top of Ray that offers both high scalability and a unified API for a variety of applications.

pip install tensorflow  # or tensorflow-gpu
pip install ray[rllib]  # also recommended: ray[debug]
import gym
from gym.spaces import Discrete, Box
from ray import tune

class SimpleCorridor(gym.Env):
    def __init__(self, config):
        self.end_pos = config["corridor_length"]
        self.cur_pos = 0
        self.action_space = Discrete(2)
        self.observation_space = Box(0.0, self.end_pos, shape=(1, ))

    def reset(self):
        self.cur_pos = 0
        return [self.cur_pos]

    def step(self, action):
        if action == 0 and self.cur_pos > 0:
            self.cur_pos -= 1
        elif action == 1:
            self.cur_pos += 1
        done = self.cur_pos >= self.end_pos
        return [self.cur_pos], 1 if done else 0, done, {}

        "env": SimpleCorridor,
        "num_workers": 4,
        "env_config": {"corridor_length": 5}})

Where to go next?

Visit the Walkthrough page a more comprehensive overview of Ray features.

Ray programs can run on a single machine, and can also seamlessly scale to large clusters. To execute the above Ray script in the cloud, just download this configuration file, and run:

ray submit [CLUSTER.YAML] example.py --start

Read more about :ref:launching clusters <cluster-index>.