Ray Projects (Experimental)

Ray projects make it easy to package a Ray application so it can be rerun later in the same environment. They allow for the sharing and reliable reuse of existing code.

Quick start (CLI)

# Creates a project in the current directory. It will create a
# project.yaml defining the code and environment and a cluster.yaml
# describing the cluster configuration. Both will be created in the
# ray-project subdirectory of the current directory.
$ ray project create <project-name>

# Creates a new session from the project in the current directory. Launches
# a cluster and runs the named command, which must be defined in
# ray-project/project.yaml. If no command name is given, the "default"
# command is used. Alternatively, use --shell to run a raw shell command.
$ ray session start <command-name> [arguments] [--shell]

# Open a console for the given session.
$ ray session attach

# Stop the given session and terminate all of its worker nodes.
$ ray session stop

Examples

See the readme for instructions on how to run these examples:

  • Open Tacotron: A TensorFlow implementation of Google’s Tacotron speech synthesis with pre-trained model (unofficial)

  • PyTorch Transformers: A library of state-of-the-art pretrained models for Natural Language Processing (NLP)

Tutorial

We will walk through how to use projects by executing the streaming MapReduce example. Commands always apply to the project in the current directory. Let us switch into the project directory with

cd ray/doc/examples/streaming

A session represents a running instance of a project. Let’s start one with

ray session start

The ray session start command will bring up a new cluster and initialize the environment of the cluster according to the environment section of the project.yaml, installing all dependencies of the project.
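To make the environment step concrete, here is a minimal sketch of how an environment section could be turned into setup commands. It is an illustration only, not Ray's implementation: the function name setup_commands is hypothetical, the use of pip install -r for the requirements key is an assumption, and the dockerimage and dockerfile keys are deliberately left out.

```python
import shlex


def setup_commands(environment):
    """Translate an environment section into an ordered list of shell commands.

    Assumption: "requirements" maps to a pip install and "shell" entries are
    run verbatim afterwards. Docker-based keys are not handled in this sketch.
    """
    commands = []
    if "requirements" in environment:
        # Quote the path so that unusual file names stay a single argument.
        path = shlex.quote(environment["requirements"])
        commands.append(f"pip install -r {path}")
    commands.extend(environment.get("shell", []))
    return commands


example = setup_commands(
    {"requirements": "requirements.txt", "shell": ["echo done"]}
)
print(example)
```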

Now we can execute a command in the session. To see a list of all available commands of the project, run

ray session commands

which produces the following output:

Active project: ray-example-streaming

Command "run":
  usage: run [--num-mappers NUM_MAPPERS] [--num-reducers NUM_REDUCERS]

  Start the streaming example.

  optional arguments:
    --num-mappers NUM_MAPPERS
                        Number of mapper actors used
    --num-reducers NUM_REDUCERS
                        Number of reducer actors used

As you can see, this project has a single command, run, which takes the arguments --num-mappers and --num-reducers. We can execute the streaming wordcount with the default parameters by running

ray session execute run
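The usage text printed by ray session commands has the shape of standard argparse help output. The sketch below assumes, without confirmation from this page, that each declared parameter becomes one optional flag. The helper build_parser and the RUN_PARAMS list are hypothetical names used only for illustration.

```python
import argparse

# Hypothetical parameter list mirroring the "run" command shown above.
RUN_PARAMS = [
    {"name": "num-mappers", "help": "Number of mapper actors used"},
    {"name": "num-reducers", "help": "Number of reducer actors used"},
]


def build_parser(command_name, description, params):
    """Build an argparse parser with one optional flag per parameter."""
    parser = argparse.ArgumentParser(
        prog=command_name, description=description, add_help=False
    )
    for param in params:
        parser.add_argument(
            "--" + param["name"],
            help=param.get("help"),
            choices=param.get("choices"),
            default=param.get("default"),
        )
    return parser


parser = build_parser("run", "Start the streaming example.", RUN_PARAMS)
args = parser.parse_args(["--num-mappers", "4"])
# argparse stores "--num-mappers" under the attribute name num_mappers.
print(args.num_mappers, args.num_reducers)
```

Values arrive as strings unless a type is supplied, which is where the type field of a parameter (int, float or str) would come in.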

You can interrupt the command with <Control>-c and attach to the running session by executing

ray session attach --tmux

Inside the session you can, for example, edit the streaming application with

cd ray-example-streaming
emacs streaming.py

For example, try adding the following lines after the for count in counts: loop:

if "million" in wordcounts:
  print("Found the word!")

and re-run the application from outside the session with

ray session execute run

The session can be terminated from outside the session with

ray session stop

Project file format (project.yaml)

A project file contains everything required to run a project. This includes a cluster configuration, the environment and dependencies for the application, and the specific inputs used to run the project.

Here is an example of a minimal project file:

name: test-project
description: "This is a simple test project"
repo: https://github.com/ray-project/ray

# Cluster to be instantiated by default when starting the project.
cluster:
  config: ray-project/cluster.yaml

# Commands/information to build the environment once the cluster is
# instantiated. This can include the versions of Python libraries, etc.
# It can be specified as a Python requirements.txt, a Docker image,
# a Dockerfile, or a sequence of shell commands that sets up the libraries.
environment:
  requirements: requirements.txt

# List of commands that can be executed once the cluster is instantiated
# and the environment is set up.
# A command can also specify a cluster that overrides the default cluster.
commands:
  - name: default
    command: python default.py
    help: "The command that will be executed if no command name is specified"
  - name: test
    command: python test.py --param1={{param1}} --param2={{param2}}
    help: "A test command"
    params:
      - name: "param1"
        help: "The first parameter"
        # The following line indicates possible values this parameter can take.
        choices: ["1", "2"]
      - name: "param2"
        help: "The second parameter"

Project files have to adhere to the following schema:

type: object

properties:

  • name (string): The name of the project

  • description (string): A short description of the project

  • repo (string): The URL of the repo this project is part of

  • documentation (string): Link to the documentation of this project

  • tags (array of string): Relevant tags for this project

  • cluster (object)

    properties:
      • config (string): Path to a .yaml cluster configuration file (relative to the project root)
      • params (array of object)
        item properties:
          • name (string)
          • help (string)
          • choices (array)
          • default
          • type (string; enum: int, float, str)
        item additionalProperties: False
    additionalProperties: False

  • environment (object): The environment that needs to be set up to run the project

    properties:
      • dockerimage (string): URL to a docker image that can be pulled to run the project in
      • dockerfile (string): Path to a Dockerfile to set up an image the project can run in (relative to the project root)
      • requirements (string): Path to a Python requirements.txt file to set up project dependencies (relative to the project root)
      • shell (array of string): A sequence of shell commands to run to set up the project environment
    additionalProperties: False

  • commands (array of object): Possible commands to run to start a session

    item properties:
      • name (string): Name of the command
      • help (string): Help string for the command
      • command (string): Shell command to run on the cluster
      • params (array of object): Possible parameters in the command
        item properties:
          • name (string): Name of the parameter
          • help (string): Help string for the parameter
          • choices (array): Possible values the parameter can take
          • default
          • type (string; enum: int, float, str): Required type for the parameter
        item additionalProperties: False
      • config (object): Configuration options for the command
        properties:
          • tmux (boolean): If true, the command will be run inside of tmux
        additionalProperties: False
    item additionalProperties: False

  • output_files (array of string)

additionalProperties: False
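Because every object in the schema sets additionalProperties to False, a misspelled or unsupported key makes a project file invalid. The sketch below checks only that rule and the type enum for parameters, using key sets copied from the schema above. The names check_project and unknown_keys are hypothetical. A real validator would apply the full JSON schema rather than this hand-written subset.

```python
# Allowed keys, copied from the schema above.
TOP_LEVEL_KEYS = {
    "name", "description", "repo", "documentation", "tags",
    "cluster", "environment", "commands", "output_files",
}
COMMAND_KEYS = {"name", "help", "command", "params", "config"}
PARAM_KEYS = {"name", "help", "choices", "default", "type"}
PARAM_TYPES = {"int", "float", "str"}


def unknown_keys(mapping, allowed, where):
    """Report keys that the schema's additionalProperties: False would reject."""
    return [
        f"{where}: unexpected key {key!r}"
        for key in sorted(set(mapping) - allowed)
    ]


def check_project(project):
    """Return a list of problems found; an empty list means none were found."""
    problems = unknown_keys(project, TOP_LEVEL_KEYS, "project")
    for i, command in enumerate(project.get("commands", [])):
        where = f"commands[{i}]"
        problems += unknown_keys(command, COMMAND_KEYS, where)
        for j, param in enumerate(command.get("params", [])):
            param_where = f"{where}.params[{j}]"
            problems += unknown_keys(param, PARAM_KEYS, param_where)
            if "type" in param and param["type"] not in PARAM_TYPES:
                problems.append(
                    f"{param_where}: type must be one of int, float, str"
                )
    return problems


# A project with one unsupported command key and one invalid parameter type.
project = {
    "name": "test-project",
    "cluster": {"config": "ray-project/cluster.yaml"},
    "commands": [
        {
            "name": "test",
            "command": "python test.py",
            "timeout": 10,
            "params": [{"name": "param1", "type": "bool"}],
        },
    ],
}
problems = check_project(project)
for problem in problems:
    print(problem)
```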

Cluster file format (cluster.yaml)

The format is the same as for the autoscaler; see the Cluster Launch page.