Specify container commands for Ray head/worker Pods

You can execute commands on the head/worker Pods at two points:

  • (1) Before ray start: For example, you can set environment variables that ray start will use.

  • (2) After ray start (RayCluster is ready): For example, you can launch a Ray Serve deployment once the RayCluster is ready.

Current KubeRay operator behavior for container commands

  • The current behavior for container commands is not finalized and may change in the future.

  • See the KubeRay operator source code for more details.

Timing 1: Before ray start

Currently, for timing (1), we can set the container's command and args fields in the RayCluster specification.

# https://github.com/ray-project/kuberay/ray-operator/config/samples/ray-cluster.head-command.yaml
    rayStartParams:
        ...
    #pod template
    template:
      spec:
        containers:
        - name: ray-head
          image: rayproject/ray:2.5.0
          resources:
            ...
          ports:
            ...
          # `command` and `args` will become a part of `spec.containers.0.args` in the head Pod.
          command: ["echo 123"]
          args: ["456"]
  • Ray head Pod

    • spec.containers.0.command is hardcoded to ["/bin/bash", "-lc", "--"].

    • spec.containers.0.args contains two parts:

      • (Part 1) user-specified command: A string that concatenates headGroupSpec.template.spec.containers.0.command and headGroupSpec.template.spec.containers.0.args from the RayCluster.

      • (Part 2) ray start command: A command built from the rayStartParams specified in the RayCluster. It looks like ulimit -n 65536; ray start ....

      • To summarize, spec.containers.0.args will be $(user-specified command) && $(ray start command).

  • Example

    # Prerequisite: There is a KubeRay operator in the Kubernetes cluster.
    
    # Download `ray-cluster.head-command.yaml`
    curl -LO https://raw.githubusercontent.com/ray-project/kuberay/v1.0.0/ray-operator/config/samples/ray-cluster.head-command.yaml
    
    # Create a RayCluster
    kubectl apply -f ray-cluster.head-command.yaml
    
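    # Check ${RAYCLUSTER_HEAD_POD}
    kubectl get pod -l ray.io/node-type=head
    
    # Store the head Pod name in ${RAYCLUSTER_HEAD_POD}
    export RAYCLUSTER_HEAD_POD=$(kubectl get pod -l ray.io/node-type=head -o jsonpath='{.items[0].metadata.name}')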
    
    # Check `spec.containers.0.command` and `spec.containers.0.args`.
    kubectl describe pod ${RAYCLUSTER_HEAD_POD}
    
    # Command:
    #   /bin/bash
    #   -lc
    #   --
    # Args:
    #    echo 123  456  && ulimit -n 65536; ray start --head  --dashboard-host=0.0.0.0  --num-cpus=1  --block  --metrics-export-port=8080  --memory=2147483648
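
The following Bash sketch approximates how the head container's args string is assembled. It uses the timing (1) case of exporting an environment variable for ray start. This is an illustration under stated assumptions, not KubeRay's implementation:

  • RAY_DEMO_VAR and OTHER_DEMO_VAR are hypothetical variables.
  • RAY_START_CMD is a placeholder echo, standing in for the real ulimit -n 65536; ray start ... command.
  • The exact whitespace KubeRay inserts between the parts may differ. Compare the double spaces in the describe output above.

```shell
#!/usr/bin/env bash
# Sketch: approximate how the head container's args string is assembled.
# USER_COMMAND/USER_ARGS stand in for the container's `command`/`args` in the
# RayCluster spec. RAY_START_CMD is a hypothetical placeholder for the real
# `ulimit -n 65536; ray start ...` command, so no Ray install is needed.
set -euo pipefail

USER_COMMAND="export RAY_DEMO_VAR=hello"   # hypothetical env var for ray start
USER_ARGS="OTHER_DEMO_VAR=world"           # hypothetical, appended after command
RAY_START_CMD='echo "ray start sees RAY_DEMO_VAR=${RAY_DEMO_VAR}"'

# Part 1 (user-specified command) && Part 2 (ray start command).
HEAD_ARGS="${USER_COMMAND} ${USER_ARGS} && ${RAY_START_CMD}"
echo "Args: ${HEAD_ARGS}"

# The Pod runs ["/bin/bash", "-lc", "--", HEAD_ARGS]. This sketch drops -l so
# it does not source login profiles on your machine.
/bin/bash -c -- "${HEAD_ARGS}"
```

Running it prints the assembled Args line, followed by ray start sees RAY_DEMO_VAR=hello. The exported variable is inherited by every later command in the same shell. Because the two parts are chained with &&, a non-zero exit from the user-specified command means ray start never runs, and the shell exits with that command's status.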
    

Timing 2: After ray start (RayCluster is ready)

We have two solutions to execute commands after the RayCluster is ready. The main difference between them is that, with Solution 1, users can check the logs via kubectl logs.

Solution 2: postStart hook

# https://github.com/ray-project/kuberay/ray-operator/config/samples/ray-cluster.head-command.yaml
lifecycle:
  postStart:
    exec:
      command: ["/bin/sh","-c","/home/ray/samples/ray_cluster_resources.sh"]
  • We execute the script ray_cluster_resources.sh via the postStart hook. According to the Kubernetes documentation on container lifecycle hooks, there is no guarantee that the hook will execute before the container ENTRYPOINT. Hence, ray_cluster_resources.sh must itself wait for the RayCluster to finish initializing.

  • Example

    kubectl apply -f ray-cluster.head-command.yaml
    
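    # Check ${RAYCLUSTER_HEAD_POD}
    kubectl get pod -l ray.io/node-type=head
    
    # Store the head Pod name in ${RAYCLUSTER_HEAD_POD}
    export RAYCLUSTER_HEAD_POD=$(kubectl get pod -l ray.io/node-type=head -o jsonpath='{.items[0].metadata.name}')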
    
    # Forward the port of Dashboard
    kubectl port-forward --address 0.0.0.0 ${RAYCLUSTER_HEAD_POD} 8265:8265
    
    # Open a browser and check the Dashboard (${YOUR_IP}:8265/#/job).
    # You should see a SUCCEEDED job with the following Entrypoint:
    #
    # `python -c "import ray; ray.init(); print(ray.cluster_resources())"`
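
The hook may fire before ray start has finished, so a script like ray_cluster_resources.sh needs a bounded wait. Below is a minimal Bash sketch of such a wait. wait_until and MAX_WAIT_SECONDS are hypothetical names, not part of KubeRay or Ray. So that the sketch runs anywhere, the demo at the end waits on a temporary marker file instead of a real Ray head.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: a bounded wait helper for a postStart script such as
# ray_cluster_resources.sh. wait_until and MAX_WAIT_SECONDS are illustrative
# names, not part of KubeRay or Ray.
set -euo pipefail

# wait_until CMD [ARGS...]: rerun CMD about once per second until it exits 0.
# Returns 1 after roughly MAX_WAIT_SECONDS seconds (default 300).
wait_until() {
  local max_wait="${MAX_WAIT_SECONDS:-300}"
  local waited=0
  until "$@" >/dev/null 2>&1; do
    if (( waited >= max_wait )); then
      echo "ERROR: gave up after ${max_wait}s waiting for: $*" >&2
      return 1
    fi
    echo "INFO: waiting for: $*"
    sleep 1
    waited=$((waited + 1))
  done
}

# Self-contained demo: a background job creates a marker file after ~2s,
# standing in for the Ray head becoming ready.
demo_dir="$(mktemp -d)"
( sleep 2; touch "${demo_dir}/ready" ) &
wait_until test -e "${demo_dir}/ready" && echo "ready"
rm -rf "${demo_dir}"
```

Inside a script such as ray_cluster_resources.sh, the marker-file demo would be replaced by a real readiness check. For example, wait_until ray health-check, assuming your Ray version provides that (hidden) CLI command. That check would be followed by ray job submit --address http://127.0.0.1:8265 -- python -c "import ray; ray.init(); print(ray.cluster_resources())". Bounding the wait matters because, per the Kubernetes documentation, a container cannot reach the Running state while its postStart hook is still running or hanging.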