Launching Cloud Clusters

This section provides instructions for configuring the Ray Cluster Launcher to launch clusters on AWS/Azure/GCP, on an existing Kubernetes cluster, or on a private cluster of host machines.

See this blog post for a step-by-step guide to using the Ray Cluster Launcher.

AWS/GCP/Azure

First, install boto3 (pip install boto3) and configure your AWS credentials in ~/.aws/credentials, as described in the boto3 docs. (The steps below use AWS; Ray ships analogous example configs for Azure and GCP.)
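
The credentials file follows the standard AWS shared-credentials format; a minimal example with placeholder values looks like this:

[default]
aws_access_key_id=YOUR_ACCESS_KEY_ID
aws_secret_access_key=YOUR_SECRET_ACCESS_KEY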

Once boto3 is configured to manage resources on your AWS account, you should be ready to launch your cluster. The provided ray/python/ray/autoscaler/aws/example-full.yaml cluster config file will create a small cluster with an m5.large head node (on-demand) configured to autoscale up to two m5.large spot workers.
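
Abridged, the key fields of that config look roughly like the following (a sketch, not a copy-paste config; exact field names can vary across Ray versions, so consult the shipped file):

# Abridged sketch of ray/python/ray/autoscaler/aws/example-full.yaml.
cluster_name: default

# Autoscale between zero and two workers.
min_workers: 0
max_workers: 2

provider:
    type: aws
    region: us-west-2

auth:
    ssh_user: ubuntu

# On-demand head node.
head_node:
    InstanceType: m5.large

# Spot workers.
worker_nodes:
    InstanceType: m5.large
    InstanceMarketOptions:
        MarketType: spot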

Test that it works by running the following commands from your local machine:

# Create or update the cluster. When the command finishes, it will print
# out the command that can be used to SSH into the cluster head node.
$ ray up ray/python/ray/autoscaler/aws/example-full.yaml

# Get a remote screen on the head node.
$ ray attach ray/python/ray/autoscaler/aws/example-full.yaml
$ # Try running a Ray program with 'ray.init(address="auto")'.

# Tear down the cluster.
$ ray down ray/python/ray/autoscaler/aws/example-full.yaml
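
Once attached, you can sanity-check the cluster with a small program. A minimal sketch (run it on the head node; address="auto" makes ray.init connect to the running cluster rather than starting a new local one):

import ray

# Connect to the existing cluster started by `ray up`.
ray.init(address="auto")

@ray.remote
def square(x):
    return x * x

# Submitting many tasks should cause the autoscaler to add spot workers.
futures = [square.remote(i) for i in range(100)]
print(sum(ray.get(futures)))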

See AWS Configurations for recipes on customizing AWS clusters.

Kubernetes

The cluster launcher can also be used to start Ray clusters on an existing Kubernetes cluster.

Once you have kubectl configured locally to access the remote cluster, you should be ready to launch your cluster. The provided ray/python/ray/autoscaler/kubernetes/example-full.yaml cluster config file will create a small cluster of one pod for the head node configured to autoscale up to two worker node pods, with all pods requiring 1 CPU and 0.5GiB of memory. It’s also possible to deploy service and ingress resources for each scaled worker pod. An example is provided in ray/python/ray/autoscaler/kubernetes/example-ingress.yaml.
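
Abridged, the Kubernetes-specific parts of that config look roughly like this (a sketch; the pod specs in the shipped file are more complete and may differ by Ray version):

# Abridged sketch of ray/python/ray/autoscaler/kubernetes/example-full.yaml.
cluster_name: default
min_workers: 0
max_workers: 2

provider:
    type: kubernetes
    namespace: ray

# The head node is a pod; workers use an analogous worker_nodes spec.
# Each pod requests 1 CPU and 0.5GiB of memory.
head_node:
    apiVersion: v1
    kind: Pod
    spec:
        containers:
          - name: ray-node
            resources:
                requests:
                    cpu: 1000m
                    memory: 512Mi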

Test that it works by running the following commands from your local machine:

# Create or update the cluster. When the command finishes, it will print
# out the command that can be used to get a remote shell into the head node.
$ ray up ray/python/ray/autoscaler/kubernetes/example-full.yaml

# List the pods running in the cluster. You should only see one head node
# until you start running an application, at which point worker nodes
# should be started. Don't forget to include the Ray namespace in your
# 'kubectl' commands ('ray' by default).
$ kubectl -n ray get pods

# Get a remote screen on the head node.
$ ray attach ray/python/ray/autoscaler/kubernetes/example-full.yaml
$ # Try running a Ray program with 'ray.init(address="auto")'.

# Tear down the cluster
$ ray down ray/python/ray/autoscaler/kubernetes/example-full.yaml

Tip

This section describes the easiest way to launch a Ray cluster on Kubernetes. See this document for advanced usage of Kubernetes with Ray.

Tip

If you would like to use Ray Tune on your Kubernetes cluster, see this short guide to get it working.

Local On Premise Cluster (List of nodes)

Use this mode if you want to run distributed Ray applications on a set of local nodes available on premises.

The preferred way to run a Ray cluster on a private cluster of hosts is via the Ray Cluster Launcher.

There are two ways of running private clusters:

  • Manually managed, i.e., the user explicitly specifies the head and worker IPs.

  • Automatically managed, i.e., the user only specifies the address of a coordinating server, which automatically assigns the head and worker IPs.

Tip

To avoid password prompts when running private clusters, make sure to set up your SSH keys on the private cluster as follows:

$ ssh-keygen
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

You can get started by filling out the fields in the provided ray/python/ray/autoscaler/local/example-full.yaml. Be sure to specify the proper head_ip, the list of worker_ips, and the ssh_user field.
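
For a manually managed cluster, the filled-in fields look roughly like this (a sketch with placeholder addresses; for the automatically managed mode, the provider section instead takes the coordinator server's address, so check the shipped file for the exact field name in your Ray version):

# Abridged sketch of ray/python/ray/autoscaler/local/example-full.yaml.
cluster_name: default

provider:
    type: local
    head_ip: 192.168.1.10        # address of the head machine
    worker_ips:                  # addresses of the worker machines
      - 192.168.1.11
      - 192.168.1.12

auth:
    ssh_user: YOUR_SSH_USER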

Test that it works by running the following commands from your local machine:

# Create or update the cluster. When the command finishes, it will print
# out the command that can be used to get a remote shell into the head node.
$ ray up ray/python/ray/autoscaler/local/example-full.yaml

# Get a remote screen on the head node.
$ ray attach ray/python/ray/autoscaler/local/example-full.yaml
$ # Try running a Ray program with 'ray.init(address="auto")'.

# Tear down the cluster
$ ray down ray/python/ray/autoscaler/local/example-full.yaml

Additional Cloud Providers

To use Ray autoscaling on other cloud providers or cluster management systems, you can implement the NodeProvider interface (about 100 lines of code) and register it in node_provider.py. Contributions are welcome!
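
As a rough sketch, a custom provider subclasses NodeProvider and fills in the node lifecycle hooks. The method names below reflect the interface in ray/python/ray/autoscaler/node_provider.py at the time of writing and may differ in your Ray version; the `mycloud` backend they would delegate to is hypothetical:

from ray.autoscaler.node_provider import NodeProvider


class MyCloudNodeProvider(NodeProvider):
    """Skeleton provider delegating to a hypothetical `mycloud` SDK."""

    def non_terminated_nodes(self, tag_filters):
        # Return ids of all live nodes whose tags match tag_filters.
        raise NotImplementedError

    def is_running(self, node_id):
        raise NotImplementedError

    def node_tags(self, node_id):
        raise NotImplementedError

    def external_ip(self, node_id):
        raise NotImplementedError

    def internal_ip(self, node_id):
        raise NotImplementedError

    def create_node(self, node_config, tags, count):
        # Launch `count` nodes with the given config, labeled with `tags`.
        raise NotImplementedError

    def set_node_tags(self, node_id, tags):
        raise NotImplementedError

    def terminate_node(self, node_id):
        raise NotImplementedError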

Security

On cloud providers, nodes will be launched into their own security group by default, with traffic allowed only between nodes in the same group. A new SSH key will also be created and saved to your local machine for access to the cluster.

What’s Next?

Now that you have a working understanding of the cluster launcher, check out the rest of the cluster documentation.

Questions or Issues?

You can post questions, issues, or feedback through the following channels:

  1. Discussion Board: For questions about Ray usage or feature requests.

  2. GitHub Issues: For bug reports.

  3. Ray Slack: For getting in touch with Ray maintainers.

  4. StackOverflow: Use the [ray] tag for questions about Ray.