Serve a Stable Diffusion model on GKE with TPUs

Note: The Python files for the Ray Serve app and its client are in the ray-project/serve_config_examples repository. This guide adapts the tensorflow/tpu example.

Step 1: Create a Kubernetes cluster with TPUs

Follow Creating a GKE Cluster with TPUs for KubeRay to create a GKE cluster with 1 CPU node and 1 TPU node.

Step 2: Install the KubeRay operator

Skip this step if the Ray Operator Addon is enabled in your GKE cluster. Otherwise, follow the Deploy a KubeRay operator instructions to install the latest stable KubeRay operator from the Helm repository. Multi-host TPU support is available in KubeRay v1.1.0 and later. The YAML file in this example uses serveConfigV2, which KubeRay supports starting from v0.6.0.

Step 3: Install the RayService CR

# Creates a RayCluster with a single-host v4 TPU worker group of 2x2x1 topology.
kubectl apply -f

KubeRay operator v1.1.0 adds a numOfHosts field to the RayCluster CR to support multi-host worker groups. This field specifies the number of workers to create per replica, with each replica representing a multi-host Pod slice. The value of numOfHosts should match the number of TPU VM hosts that the given node selector expects. For this example, the Stable Diffusion model is small enough to run on a single TPU host, so numOfHosts is set to 1 in the RayService manifest.
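As a rough illustration of how the host count relates to the topology, the sketch below derives a host count from a topology string. It assumes that each TPU v4 VM host exposes 4 chips; the helper name num_of_hosts and the constant are hypothetical and are not part of KubeRay. Check the figures for your TPU generation before relying on them.

```python
from math import prod

# Assumption: a TPU v4 VM host exposes 4 chips. Other generations differ.
CHIPS_PER_HOST_V4 = 4


def num_of_hosts(topology: str, chips_per_host: int = CHIPS_PER_HOST_V4) -> int:
    """Estimate the number of TPU VM hosts in a slice with the given topology."""
    dims = [int(d) for d in topology.lower().split("x")]
    chips = prod(dims)
    # Ceiling division: a slice smaller than one host still needs one host.
    return max(1, -(-chips // chips_per_host))


if __name__ == "__main__":
    print(num_of_hosts("2x2x1"))  # 4 chips fit on a single host
    print(num_of_hosts("2x2x2"))  # 8 chips span two hosts
```

Under this assumption, the 2x2x1 topology used in this guide fits on a single host, which is consistent with numOfHosts being 1 in the manifest.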

Step 4: View the Serve deployment in the Ray Dashboard

Verify that you deployed the RayService CR and that it’s running:

kubectl get rayservice

# stable-diffusion-tpu-serve-svc   Running          2

Port-forward the Ray Dashboard from the Ray head service. To view the dashboard, open http://localhost:8265/ on your local machine.

kubectl port-forward svc/stable-diffusion-tpu-head-svc 8265:8265 &

Monitor the status of the RayService CR in the Ray Dashboard from the ‘Serve’ tab. The installed RayService CR should create a running app named ‘stable_diffusion’. The app should have two deployments: the API ingress, which receives input prompts, and the Stable Diffusion model server.
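The same check can be scripted instead of read from the dashboard. The sketch below assumes that the port-forwarded dashboard serves the Serve REST API at /api/serve/applications/ and that the response contains an "applications" mapping whose entries carry "status" and "deployments" fields. Treat the URL, the payload shape, and the deployment names in the sample as assumptions to verify against your Ray version.

```python
import json
from urllib.request import urlopen

# Assumption: Serve REST endpoint exposed through the port-forwarded dashboard.
SERVE_API_URL = "http://localhost:8265/api/serve/applications/"


def fetch_payload(url: str = SERVE_API_URL) -> dict:
    """Fetch the Serve status JSON from the dashboard (requires the port-forward)."""
    with urlopen(url, timeout=10) as resp:
        return json.load(resp)


def summarize_app(payload: dict, app_name: str = "stable_diffusion") -> dict:
    """Extract the app status and per-deployment statuses from the payload."""
    app = payload.get("applications", {}).get(app_name)
    if app is None:
        return {"found": False, "status": None, "deployments": {}}
    deployments = {
        name: info.get("status")
        for name, info in app.get("deployments", {}).items()
    }
    return {"found": True, "status": app.get("status"), "deployments": deployments}


# Hand-written sample payload; the deployment names are hypothetical.
sample = {
    "applications": {
        "stable_diffusion": {
            "status": "RUNNING",
            "deployments": {
                "APIIngress": {"status": "HEALTHY"},
                "ModelServer": {"status": "HEALTHY"},
            },
        }
    }
}
summary = summarize_app(sample)
print(summary["status"], len(summary["deployments"]))
```

With the port-forward active, calling summarize_app(fetch_payload()) should report the live status rather than the sample.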


Step 5: Send text-to-image prompts to the model server

Port-forward the Ray Serve service:

kubectl port-forward svc/stable-diffusion-tpu-serve-svc 8000

In a separate terminal, download the Python prompt script:

curl -LO

Install the required dependencies to run the Python script locally:

# Create a Python virtual environment.
python3 -m venv myenv
source myenv/bin/activate

pip install numpy pillow requests tqdm

Submit a text-to-image prompt to the Stable Diffusion model server:

python  --save_pictures
  • The Python prompt script saves the results of the Stable Diffusion inference to a file named diffusion_results.png.
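For readers who want to see what a prompt request might look like without the downloaded script, here is a minimal sketch using only the standard library. It assumes the API ingress accepts GET requests with a prompt query parameter on a route named /imagine and returns image bytes. The route, the parameter name, and both helper functions are assumptions for illustration and may not match the actual script.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Port-forwarded Ray Serve service from this step.
BASE_URL = "http://localhost:8000"


def build_request_url(prompt: str, base_url: str = BASE_URL,
                      route: str = "/imagine") -> str:
    """Build the request URL; the route and parameter name are assumptions."""
    return f"{base_url}{route}?{urlencode({'prompt': prompt})}"


def fetch_image(prompt: str, out_path: str = "result.png") -> str:
    """Send one prompt and write the raw response bytes to out_path."""
    with urlopen(build_request_url(prompt), timeout=300) as resp:
        data = resp.read()
    with open(out_path, "wb") as f:
        f.write(data)
    return out_path


if __name__ == "__main__":
    # Prints the URL only; call fetch_image() once the port-forward is running.
    print(build_request_url("a cat on a mat"))
```

The long timeout is a deliberate choice, because the first request may wait for the model to warm up on the TPU.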
