RayCluster Quickstart
This guide shows you how to manage and interact with Ray clusters on Kubernetes.
Preparation
Step 1: Create a Kubernetes cluster
This step creates a local Kubernetes cluster using Kind. If you already have a Kubernetes cluster, you can skip this step.
kind create cluster --image=kindest/node:v1.26.0
Step 2: Deploy a KubeRay operator
Deploy the KubeRay operator with either the Helm chart repository or Kustomize. The two sets of commands below are alternatives; run only one of them.
helm repo add kuberay https://ray-project.github.io/kuberay-helm/
helm repo update
# Option 1 (Helm): Install both the CRDs and KubeRay operator v1.2.2.
helm install kuberay-operator kuberay/kuberay-operator --version 1.2.2
# Confirm that the operator is running in the namespace `default`.
kubectl get pods
# NAME READY STATUS RESTARTS AGE
# kuberay-operator-7fbdbf8c89-pt8bk 1/1 Running 0 27s
# Option 2 (Kustomize): Install the CRD and KubeRay operator v1.2.2.
kubectl create -k "github.com/ray-project/kuberay/ray-operator/config/default?ref=v1.2.2"
# Confirm that the operator is running in the namespace `ray-system`.
kubectl get pods -n ray-system
# NAME READY STATUS RESTARTS AGE
# kuberay-operator-6d57c9f797-ffvph 1/1 Running 0 2m14s
For further information, see the installation instructions in the KubeRay documentation.
Step 3: Deploy a RayCluster custom resource
Once the KubeRay operator is running, you’re ready to deploy a RayCluster. Create a RayCluster Custom Resource (CR) in the default namespace. The deployment commands below are alternatives; run only one of them.
# Helm, ARM64 machines (the image tag selects an aarch64 build): deploy a sample RayCluster CR from the KubeRay Helm chart repo:
helm install raycluster kuberay/ray-cluster --version 1.2.2 --set 'image.tag=2.9.0-aarch64'
# Helm, default image: deploy a sample RayCluster CR from the KubeRay Helm chart repo:
helm install raycluster kuberay/ray-cluster --version 1.2.2
# kubectl: deploy a sample RayCluster CR from the KubeRay repository:
kubectl apply -f "https://raw.githubusercontent.com/ray-project/kuberay/refs/heads/release-1.2.2/ray-operator/config/samples/ray-cluster.sample.yaml"
# Once the RayCluster CR has been created, you can view it by running:
kubectl get rayclusters
# NAME DESIRED WORKERS AVAILABLE WORKERS CPUS MEMORY GPUS STATUS AGE
# raycluster-kuberay 1 1 2 3G 0 ready 95s
The KubeRay operator detects the RayCluster object and starts your Ray cluster by creating head and worker pods. To view the Ray cluster’s pods, run the following command:
# View the pods in the RayCluster named "raycluster-kuberay"
kubectl get pods --selector=ray.io/cluster=raycluster-kuberay
# NAME READY STATUS RESTARTS AGE
# raycluster-kuberay-head-vkj4n 1/1 Running 0 XXs
# raycluster-kuberay-worker-workergroup-xvfkr 1/1 Running 0 XXs
Wait for the pods to reach Running state. This may take a few minutes – most of this time is spent downloading the Ray images.
If your pods are stuck in the Pending state, you can check for errors via kubectl describe pod raycluster-kuberay-xxxx-xxxxx and ensure that your Docker resource limits are set high enough.
Note that in production scenarios, you will want to use larger Ray pods. In fact, it is advantageous to size each Ray pod to take up an entire Kubernetes node. See the configuration guide for more details.
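The wait for the pods to reach the Running state can also be scripted. The following is a minimal Python sketch, not part of KubeRay: it assumes kubectl is on your PATH and points at the cluster from Step 1, and the helper names (pods_not_running, fetch_pods, wait_for_running) are hypothetical. It checks only the pod phase, which roughly corresponds to the STATUS column, and doesn't check container readiness.

```python
import json
import subprocess
import time


def pods_not_running(pod_list):
    """Return the names of pods whose phase is not "Running".

    `pod_list` is the parsed output of `kubectl get pods ... -o json`.
    """
    return [
        item["metadata"]["name"]
        for item in pod_list.get("items", [])
        if item.get("status", {}).get("phase") != "Running"
    ]


def fetch_pods(selector):
    """Run kubectl and return the matching pods as a parsed JSON dict."""
    result = subprocess.run(
        ["kubectl", "get", "pods", f"--selector={selector}", "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)


def wait_for_running(selector, timeout_seconds=600.0, poll_seconds=5.0):
    """Poll until at least one pod matches and all matching pods are Running."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        pod_list = fetch_pods(selector)
        pending = pods_not_running(pod_list)
        if pod_list.get("items") and not pending:
            return
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Pods not Running after {timeout_seconds}s: {pending}")
        time.sleep(poll_seconds)
```

For example, wait_for_running("ray.io/cluster=raycluster-kuberay") returns once every pod selected by that label reports the Running phase, and raises TimeoutError otherwise.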
Step 4: Run an application on a RayCluster
Now, let’s interact with the RayCluster we’ve deployed.
Method 1: Execute a Ray job in the head Pod
The most straightforward way to experiment with your RayCluster is to exec directly into the head pod. First, identify your RayCluster’s head pod:
export HEAD_POD=$(kubectl get pods --selector=ray.io/node-type=head -o custom-columns=POD:metadata.name --no-headers)
echo $HEAD_POD
# raycluster-kuberay-head-vkj4n
# Print the cluster resources.
kubectl exec -it $HEAD_POD -- python -c "import ray; ray.init(); print(ray.cluster_resources())"
# 2023-04-07 10:57:46,472 INFO worker.py:1243 -- Using address 127.0.0.1:6379 set in the environment variable RAY_ADDRESS
# 2023-04-07 10:57:46,472 INFO worker.py:1364 -- Connecting to existing Ray cluster at address: 10.244.0.6:6379...
# 2023-04-07 10:57:46,482 INFO worker.py:1550 -- Connected to Ray cluster. View the dashboard at http://10.244.0.6:8265
# {'object_store_memory': 802572287.0, 'memory': 3000000000.0, 'node:10.244.0.6': 1.0, 'CPU': 2.0, 'node:10.244.0.7': 1.0}
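The dictionary printed by ray.cluster_resources() can be hard to read at a glance. The sketch below condenses it, assuming (based on the sample output above) that each node contributes one key starting with node: and that memory values are in bytes. The helper name summarize_resources is hypothetical. Some Ray versions also report an internal key such as node:__internal_head__; skipping keys with the node:__ prefix is an assumption of this sketch.

```python
def summarize_resources(resources):
    """Condense a ray.cluster_resources() dict into a few readable numbers."""
    node_keys = [
        key
        for key in resources
        # Count per-node entries, skipping internal markers (assumed "node:__" prefix).
        if key.startswith("node:") and not key.startswith("node:__")
    ]
    return {
        "nodes": len(node_keys),
        "cpus": resources.get("CPU", 0.0),
        # Convert bytes to decimal gigabytes.
        "memory_gb": resources.get("memory", 0.0) / 1e9,
        "object_store_gb": round(resources.get("object_store_memory", 0.0) / 1e9, 2),
    }


# The sample output from the command above.
sample = {
    "object_store_memory": 802572287.0,
    "memory": 3000000000.0,
    "node:10.244.0.6": 1.0,
    "CPU": 2.0,
    "node:10.244.0.7": 1.0,
}
print(summarize_resources(sample))
```

For the sample output, this reports two nodes, 2 CPUs, and 3 GB of memory, which matches the CPUS and MEMORY columns shown by kubectl get rayclusters.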
Method 2: Submit a Ray job to the RayCluster via the Ray job submission SDK
Unlike Method 1, this method does not require you to execute commands in the Ray head pod. Instead, you can use the Ray job submission SDK to submit Ray jobs to the RayCluster via the Ray Dashboard port (8265 by default) where Ray listens for Job requests. The KubeRay operator configures a Kubernetes service targeting the Ray head Pod.
kubectl get service raycluster-kuberay-head-svc
# NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
# raycluster-kuberay-head-svc ClusterIP 10.96.93.74 <none> 8265/TCP,8080/TCP,8000/TCP,10001/TCP,6379/TCP 15m
Now that we have the name of the service, we can use port-forwarding to access the Ray Dashboard port (8265 by default).
# Execute this in a separate shell.
kubectl port-forward service/raycluster-kuberay-head-svc 8265:8265
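Before submitting jobs, it can help to confirm that the forwarded port accepts connections. The following sketch uses only the Python standard library; port_is_open is a hypothetical helper, and the example assumes the port-forward above is running in another shell. It verifies only that a TCP connection succeeds, not that the Ray Dashboard itself is healthy.

```python
import socket


def port_is_open(host, port, timeout_seconds=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout_seconds):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

With the port-forward active, port_is_open("127.0.0.1", 8265) should return True; if it returns False, check that the kubectl port-forward command is still running.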
Now that we have access to the Dashboard port, we can submit jobs to the RayCluster:
# The following job's logs will show the Ray cluster's total resource capacity, including 2 CPUs.
ray job submit --address http://localhost:8265 -- python -c "import ray; ray.init(); print(ray.cluster_resources())"
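The same submission can be made from Python with the job submission SDK. The sketch below assumes Ray is installed on your local machine and that the port-forward to 8265 is active. submit_and_wait is a hypothetical helper, not part of Ray; it accepts any client object that provides submit_job, get_job_status, and get_job_logs, which is the interface of ray.job_submission.JobSubmissionClient.

```python
import time

# Job states after which the status no longer changes.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "STOPPED"}


def submit_and_wait(client, entrypoint, poll_seconds=2.0, timeout_seconds=300.0):
    """Submit a job, poll until it finishes, and return (job_id, status, logs)."""
    job_id = client.submit_job(entrypoint=entrypoint)
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = client.get_job_status(job_id)
        # Accept either an enum member or a plain string.
        status_name = str(getattr(status, "value", status))
        if status_name in TERMINAL_STATES:
            return job_id, status_name, client.get_job_logs(job_id)
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Job {job_id} still {status_name} after {timeout_seconds}s")
        time.sleep(poll_seconds)


# Usage against the port-forwarded cluster (requires Ray to be installed locally):
#
#   from ray.job_submission import JobSubmissionClient
#   client = JobSubmissionClient("http://127.0.0.1:8265")
#   job_id, status, logs = submit_and_wait(
#       client,
#       'python -c "import ray; ray.init(); print(ray.cluster_resources())"',
#   )
#   print(status)
#   print(logs)
```

Passing the client in as a parameter keeps the polling logic independent of how you reach the cluster, so the same helper works with a different address or a stand-in client in tests.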
Step 5: Access the Ray Dashboard
Visit ${YOUR_IP}:8265 in your browser for the Dashboard. For example, 127.0.0.1:8265.
The job you submitted in Step 4 appears in the Recent jobs pane.
Step 6: Cleanup
# If you installed with Helm, use the following commands.
# [Step 6.1]: Delete the RayCluster CR
# Uninstall the RayCluster Helm chart
helm uninstall raycluster
# release "raycluster" uninstalled
# Note that it may take several seconds for the Ray pods to be fully terminated.
# Confirm that the RayCluster's pods are gone by running
kubectl get pods
# NAME READY STATUS RESTARTS AGE
# kuberay-operator-7fbdbf8c89-pt8bk 1/1 Running 0 XXm
# [Step 6.2]: Delete the KubeRay operator
# Uninstall the KubeRay operator Helm chart
helm uninstall kuberay-operator
# release "kuberay-operator" uninstalled
# Confirm that the KubeRay operator pod is gone by running
kubectl get pods
# No resources found in default namespace.
# [Step 6.3]: Delete the Kubernetes cluster
kind delete cluster
# If you installed with kubectl and Kustomize, use the following commands instead.
# [Step 6.1]: Delete the RayCluster CR
kubectl delete -f "https://raw.githubusercontent.com/ray-project/kuberay/refs/heads/release-1.2.2/ray-operator/config/samples/ray-cluster.sample.yaml"
# Confirm that the RayCluster's pods are gone by running
kubectl get pods
# No resources found in default namespace.
# [Step 6.2]: Delete the KubeRay operator
kubectl delete -k "https://github.com/ray-project/kuberay/ray-operator/config/default?ref=v1.2.2"
# Confirm that the KubeRay operator pod is gone by running
kubectl get pods -n ray-system
# No resources found in ray-system namespace.
# [Step 6.3]: Delete the Kubernetes cluster
kind delete cluster