A Gentle Introduction to Ray


Ray provides a simple, universal API for building distributed applications.

Ray accomplishes this mission by:

  1. Providing simple primitives for building and running distributed applications.

  2. Enabling end users to parallelize single machine code, with little to zero code changes.

  3. Including a large ecosystem of applications, libraries, and tools on top of the core Ray to enable complex applications.

Ray Core provides the simple primitives for application building.

On top of Ray Core are several libraries for solving problems in machine learning:

There are also many community integrations with Ray, including Dask, MARS, Modin, Horovod, Hugging Face, Scikit-learn, and others. Check out the full list of Ray distributed libraries here.

This tutorial will provide a tour of the core features of Ray.

Ray provides a Python and Java API. To use Ray in Python, first install Ray with: pip install ray. To use Ray in Java, first add the ray-api and ray-runtime dependencies in your project. Then we can use Ray to parallelize your program.

Parallelizing Python/Java Functions with Ray Tasks

First, import ray and init the Ray service. Then decorate your function with @ray.remote to declare that you want to run this function remotely. Lastly, call that function with .remote() instead of calling it normally. This remote call yields a future, or ObjectRef that you can then fetch with ray.get.

import ray

def f(x):
    return x * x

futures = [f.remote(i) for i in range(4)]
print(ray.get(futures)) # [0, 1, 4, 9]

Parallelizing Python/Java Classes with Ray Actors

Ray provides actors to allow you to parallelize an instance of a class in Python/Java. When you instantiate a class that is a Ray actor, Ray will start a remote instance of that class in the cluster. This actor can then execute remote method calls and maintain its own internal state.

import ray
ray.init() # Only call this once.

class Counter(object):
    def __init__(self):
        self.n = 0

    def increment(self):
        self.n += 1

    def read(self):
        return self.n

counters = [Counter.remote() for i in range(4)]
[c.increment.remote() for c in counters]
futures = [c.read.remote() for c in counters]
print(ray.get(futures)) # [1, 1, 1, 1]

An Overview of the Ray Libraries

Ray has a rich ecosystem of libraries and frameworks built on top of it. The main ones being:

Tune Quick Start

Tune is a library for hyperparameter tuning at any scale. With Tune, you can launch a multi-node distributed hyperparameter sweep in less than 10 lines of code. Tune supports any deep learning framework, including PyTorch, TensorFlow, and Keras.


To run this example, you will need to install the following:

$ pip install 'ray[tune]'

This example runs a small grid search with an iterative training function.

from ray import tune

def objective(step, alpha, beta):
    return (0.1 + alpha * step / 100)**(-1) + beta * 0.1

def training_function(config):
    # Hyperparameters
    alpha, beta = config["alpha"], config["beta"]
    for step in range(10):
        # Iterative training function - can be any arbitrary training procedure.
        intermediate_score = objective(step, alpha, beta)
        # Feed the score back back to Tune.

analysis = tune.run(
        "alpha": tune.grid_search([0.001, 0.01, 0.1]),
        "beta": tune.choice([1, 2, 3])

print("Best config: ", analysis.get_best_config(
    metric="mean_loss", mode="min"))

# Get a dataframe for analyzing trial results.
df = analysis.results_df

If TensorBoard is installed, automatically visualize all trial results:

tensorboard --logdir ~/ray_results

RLlib Quick Start

RLlib is an open-source library for reinforcement learning built on top of Ray that offers both high scalability and a unified API for a variety of applications.

pip install tensorflow  # or tensorflow-gpu
pip install ray[rllib]
import gym
from gym.spaces import Discrete, Box
from ray import tune

class SimpleCorridor(gym.Env):
    def __init__(self, config):
        self.end_pos = config["corridor_length"]
        self.cur_pos = 0
        self.action_space = Discrete(2)
        self.observation_space = Box(0.0, self.end_pos, shape=(1, ))

    def reset(self):
        self.cur_pos = 0
        return [self.cur_pos]

    def step(self, action):
        if action == 0 and self.cur_pos > 0:
            self.cur_pos -= 1
        elif action == 1:
            self.cur_pos += 1
        done = self.cur_pos >= self.end_pos
        return [self.cur_pos], 1 if done else 0, done, {}

        "env": SimpleCorridor,
        "num_workers": 4,
        "env_config": {"corridor_length": 5}})

Where to go next?

Visit the Walkthrough page a more comprehensive overview of Ray features.

Ray programs can run on a single machine, and can also seamlessly scale to large clusters. To execute the above Ray script in the cloud, just download this configuration file, and run:

ray submit [CLUSTER.YAML] example.py --start

Read more about launching clusters.