KubeRay vs. the Legacy Ray Operator

Using the KubeRay operator is the preferred way to deploy Ray on Kubernetes. This page compares the KubeRay operator to the legacy Ray Operator hosted in the Ray repo. This page also provides migration notes for users switching to the KubeRay operator.

KubeRay vs. Legacy Ray Operator: Similarities

Purpose

The two operators have the same purpose: managing clusters of Ray pods deployed on Kubernetes.

High-level interface structure

Both operators rely on a user-specified custom resource specifying Ray pod configuration and Ray worker pod quantities.

KubeRay vs. Legacy Ray Operator: Differences

The two operators differ primarily in internal design and implementation. There are also some differences in configuration details.

Implementation and architecture

Legacy Ray Operator The legacy Ray Operator is implemented in Python. Kubernetes event handling is implemented using the Kopf framework. The operator invokes Ray cluster launcher and autoscaler code to manage Ray clusters. The operator forks an autoscaler subprocess for each Ray cluster it manages. The Ray autoscaler subprocesses create and delete Ray pods directly.

KubeRay Operator The KubeRay operator is implemented in Golang using standard tools for building Kubernetes operators, including the KubeBuilder operator framework and the client-go client library. The KubeRay operator is structurally simpler than the legacy Ray Operator; rather than running many Ray autoscalers in subprocesses, the KubeRay operator implements a simple reconciliation loop. The reconciliation loop creates and deletes Ray pods to match the desired state expressed in each RayCluster CR. Each Ray cluster runs its own autoscaler as a sidecar to the Ray head pod. The Ray autoscaler communicates desired scale to the KubeRay operator by writing to the RayCluster custom resource.

Scalability

The KubeRay operator is more scalable than the legacy Ray Operator. Specifically, the KubeRay operator can simultaneously manage more Ray clusters.

Legacy Ray Operator Each Ray autoscaler consumes nontrivial memory and CPU resources. Since the legacy Ray Operator runs many autoscalers in one pod, it cannot manage many Ray clusters.

KubeRay Operator The KubeRay operator does not run Ray autoscaler processes. Each Ray autoscaler runs as a sidecar to the Ray head. Since managing each Ray cluster is cheap, the KubeRay operator can manage many Ray clusters.

Ray version compatibility

Legacy Ray Operator It is recommended to use the same Ray version in the legacy Ray operator as in all of the Ray pods managed by the operator. Matching Ray versions is required to maintain compatibility between autoscaler code running in the operator pod and Ray code running in the Ray cluster.

KubeRay Operator The KubeRay operator is compatible with many Ray versions. Compatibility of KubeRay v0.3.0 with Ray versions 1.11, 1.12, 1.13, and 2.0 is tested explicitly.

Note however that autoscaling with KubeRay is supported only with Ray versions at least as new as 1.11.0.

Configuration details

Some details of Ray cluster configuration are different with KubeRay; see the next section for migration notes. Refer to the configuration guide for comprehensive information.

Migration Notes

Take note of the following configuration differences when switching to KubeRay deployment.

Autoscaling is optional

Ray Autoscaler support is optional with KubeRay. Set spec.enableInTreeAutoscaling:true in the RayCluster CR to enable autoscaling. The KubeRay operator will then automatically configure a Ray Autoscaler sidecar for the Ray head pod. The autoscaler container requests 500m CPU and 512Mi memory by default. Autoscaler container configuration is accessible via spec.autoscalerOptions. Note that autoscaling with KubeRay is supported only with Ray versions at least as new 1.11.

No need to specify /dev/shm

The KubeRay operator automatically configures a /dev/shm volume for each Ray pod’s object store. There is no need to specify this volume in the RayCluster CR.

Namespace-scoped operation.

Similar to the legacy Ray Operator, it is possible to run the KubeRay operator at single-namespace scope. See the KubeRay documentation for details.

Note that the KubeRay operator can manage many Ray clusters running at different Ray versions. Thus, from a scalability and compatibility perspective, there is no need to run one KubeRay operator per Kubernetes namespace. Run a namespace-scoped KubeRay operator only if necessary, e.g. to accommodate permissions constraints in your Kubernetes cluster.

Specifying resource quantities.

Ray pod CPU, GPU, and memory capacities are detected from container resource limits and advertised to Ray.

The interface for overriding the resource capacities advertised to Ray is different: Resource overrides must be specified in rayStartParams. For example, you may wish to prevent the Ray head pod from running Ray workloads by labelling the head as having 0 CPU capacity. To achieve this with KubeRay, include the following in the headGroupSpec’s configuration:

rayStartParams:
    num-cpus: "0"

To advertise custom resource capacities to Ray, one uses the field rayStartParams.resources. See the configuration guide for details.

Ray Version

The Ray version (e.g. 2.0.0) should be supplied under the RayCluster CR’s spec.rayVersion. See the configuration guide for details.

Init Containers and Pre-Stop hooks

There are two pieces of configuration that should be included in all KubeRay RayCluster CRs.

  • Worker pods need an init container that awaits creation of the Ray head service.

  • Ray containers for the Ray head and worker should include a preStop hook with a ray stop command. While future versions of KubeRay may inject this configuration automatically, currently these elements must be included in all RayCluster CRs. See the configuration guide for details.

Migration: Example

This section presents an example of the migration process. Specifically, we translate a Helm values.yaml configuration for the legacy Ray Operator into an example RayCluster CR for KubeRay. We also recommend taking a look at example RayCluster CRs in the Ray docs and in the KubeRay docs.

Legacy Ray Operator values.yaml

Here is a values.yaml for the legacy Ray Operator’s Helm chart which specifies a Ray cluster with the following features

  • A head pod annotated with a "CPU":0 override to prevent scheduling Ray workloads on the head.

  • A CPU worker group annotated with custom resource capacities.

  • A GPU worker group.

image: rayproject/ray-ml:2.0.0-gpu
headPodType: rayHeadType
podTypes:
    rayHeadType:
        CPU: 14
        memory: 54Gi
        # Annotate the head pod as having 0 CPU
        # to prevent the head pod from scheduling Ray workloads.
        rayResources: {"CPU": 0}
    rayCPUWorkerType:
        # Start with 2 CPU workers. Allow scaling up to 3 CPU workers.
        minWorkers: 2
        maxWorkers: 3
        memory: 54Gi
        CPU: 14
        # Annotate the Ray worker pod as having 1 unit of Custom capacity and 5 units of "Custom2" capacity
        rayResources: {"Custom": 1, "Custom2": 5}
    rayGPUWorkerType:
        minWorkers: 0
        maxWorkers: 5
        CPU: 3
        GPU: 1
        memory: 50Gi

operatorImage: rayproject/ray:2.0.0

KubeRay RayCluster CR

In this section, we show a KubeRay RayCluster CR equivalent to the above legacy Ray Operator Helm configuration.

Note

The configuration below is more verbose, as it does not employ Helm. Helm support for KubeRay is in progress; to try it, out read KubeRay’s Helm docs. KubeRay’s Helm charts can be found on GitHub here.

Currently, we recommend directly deploying KubeRay RayCluster CRs without Helm.

Here is a link to the configuration shown below.

apiVersion: ray.io/v1alpha1
kind: RayCluster
metadata:
  labels:
    controller-tools.k8s.io: "1.0"
  name: raycluster-example
spec:
  # To use autoscaling, the following field must be included.
  enableInTreeAutoscaling: true
  # The Ray version must be supplied.
  rayVersion: '2.0.0'
  headGroupSpec:
    serviceType: ClusterIP
    rayStartParams:
      dashboard-host: '0.0.0.0'
      block: 'true'
      # Annotate the head pod as having 0 CPU
      # to prevent the head pod from scheduling Ray workloads.
      num-cpus: 0
    template:
      spec:
        containers:
        - name: ray-head
          image: rayproject/ray-ml:2.0.0-gpu
          resources:
            limits:
              cpu: "14"
              memory: "54Gi"
            requests:
              cpu: "14"
              memory: "54Gi"
          # Keep this in container configs.
          lifecycle:
            preStop:
              exec:
                command: ["/bin/sh","-c","ray stop"]
  workerGroupSpecs:
  # Start with 2 CPU workers. Allow scaling up to 3 CPU workers.
  - replicas: 2
    minReplicas: 2
    maxReplicas: 3
    groupName: rayCPUWorkerType
    rayStartParams:
      block: 'true'
      # Annotate the Ray worker pod as having 1 unit of "Custom" capacity and 5 units of "Custom2" capacity
      resources: '"{\"Custom\": 1, \"Custom2\": 5}"'
    template:
      spec:
        containers:
        - name: ray-worker
          image: rayproject/ray-ml:2.0.0-gpu
          resources:
            limits:
              cpu: "14"
              memory: "54Gi"
            requests:
              cpu: "14"
              memory: "54Gi"
          # Keep the lifecycle block in Ray container configs.
          lifecycle:
            preStop:
              exec:
                command: ["/bin/sh","-c","ray stop"]
        # Keep the initContainers block in worker pod configs.
        initContainers:
        - name: init-myservice
          image: busybox:1.28
          command: ['sh', '-c', "until nslookup $RAY_IP.$(cat /var/run/secrets/kubernetes.io/serviceaccount/namespace).svc.cluster.local; do echo waiting for myservice; sleep 2; done"]
  # Start with 0 GPU workers. Allow scaling up to 5 GPU workers.
  - replicas: 0
    minReplicas: 0
    maxReplicas: 5
    groupName: rayGPUWorkerType
    rayStartParams:
      block: 'true'
    template:
      spec:
        containers:
        - name: ray-worker
          image: rayproject/ray-ml:2.0.0-gpu
          resources:
            limits:
              cpu: "3"
              memory: "50Gi"
              nvidia.com/gpu: 1
            requests:
              cpu: "3"
              memory: "50Gi"
              nvidia.com/gpu: 1
          # Keep the lifecycle block in Ray container configs.
          lifecycle:
            preStop:
              exec:
                command: ["/bin/sh","-c","ray stop"]
        # Keep the initContainers block in worker pod configs.
        initContainers:
        - name: init-myservice
          image: busybox:1.28
          command: ['sh', '-c', "until nslookup $RAY_IP.$(cat /var/run/secrets/kubernetes.io/serviceaccount/namespace).svc.cluster.local; do echo waiting for myservice; sleep 2; done"]
# Operator configuration is not specified here -- the KubeRay operator should be deployed before creating Ray clusters.