Key Concepts

Cluster

A Ray cluster is a set of one or more nodes that are running Ray and share the same head node.

Node types

A Ray cluster consists of a head node and a set of worker nodes.

../_images/ray-cluster.jpg

Head node

The head node is the first node started by the Ray cluster launcher when trying to launch a Ray cluster. Among other things, the head node holds the Global Control Store (GCS) and runs the autoscaler. Once the head node is started, it will be responsible for launching any additional worker nodes. The head node itself will also execute tasks and actors to utilize its capacity.

Worker node

A worker node is any node in the Ray cluster that is not functioning as head node. Therefore, worker nodes are simply responsible for executing tasks and actors. When a worker node is launched, it will be given the address of the head node to form a cluster.

Cluster launcher

The cluster launcher is a process responsible for bootstrapping the Ray cluster by launching the head node. For more information on how to use the cluster launcher, refer to cluster launcher CLI commands documentation and the corresponding documentation for the configuration file.

Autoscaler

The autoscaler is a process that runs on the head node and is responsible for adding or removing worker nodes to meet the needs of the Ray workload while matching the specification in the cluster config file. In particular, if the resource demands of the Ray workload exceed the current capacity of the cluster, the autoscaler will try to add nodes. Conversely, if a node is idle for long enough, the autoscaler will remove it from the cluster. To learn more about autoscaling, refer to the Ray cluster deployment guide.

Ray Client

The Ray Client is an API that connects a Python script to a remote Ray cluster. To learn more about the Ray Client, you can refer to the documentation.

Job submission

Ray Job submission is a mechanism to submit locally developed and tested applications to a remote Ray cluster. It simplifies the experience of packaging, deploying, and managing a Ray application. To learn more about Ray jobs, refer to the documentation.

Cloud clusters

If you’re using AWS, GCP, Azure (community-maintained) or Aliyun (community-maintained), you can use the Ray cluster launcher to launch cloud clusters, which greatly simplifies the cluster setup process.

Cluster managers

You can simplify the process of managing Ray clusters using a number of popular cluster managers including Kubernetes, YARN, Slurm and LSF.

Kubernetes (K8s) operator

Deployments of Ray on Kubernetes are managed by the Ray Kubernetes Operator. The Ray Operator makes it easy to deploy clusters of Ray pods within a Kubernetes cluster. To learn more about the K8s operator, refer to the documentation.