Configuring KubeRay to use Google Cloud Storage Buckets in GKE#

If you are already familiar with Workload Identity in GKE, you can skip this document. The gist is that you need to create a Kubernetes service account, grant it access to your bucket through IAM, and specify that service account in each of the Ray pods. Otherwise, read on.

This example is an abridged version of the documentation at https://cloud.google.com/kubernetes-engine/docs/how-to/workload-identity. The full documentation is worth reading if you are interested in the details.

Create a Kubernetes cluster on GKE#

This example creates a minimal KubeRay cluster using GKE.

Run this and all following commands on your local machine or on the Google Cloud Shell. If running from your local machine, install the Google Cloud SDK.

PROJECT_ID=my-project-id # Replace my-project-id with your GCP project ID
CLUSTER_NAME=cloud-bucket-cluster
ZONE=us-west1-b

gcloud container clusters create $CLUSTER_NAME \
    --addons=RayOperator \
    --num-nodes=1 --min-nodes 0 --max-nodes 1 --enable-autoscaling \
    --zone=$ZONE --machine-type e2-standard-8 \
    --workload-pool=${PROJECT_ID}.svc.id.goog 

This command creates a Kubernetes cluster named cloud-bucket-cluster with one node in the us-west1-b zone, installs the Ray operator addon, and enables Workload Identity through the --workload-pool flag. This example uses the e2-standard-8 machine type, which has 8 vCPUs and 32 GB RAM.

For more information on how to find your project ID, see https://support.google.com/googleapi/answer/7014113?hl=en or https://cloud.google.com/resource-manager/docs/creating-managing-projects.
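Optionally, you can confirm that Workload Identity is enabled on the new cluster by checking its workload pool; the output should be your project ID followed by .svc.id.goog:

gcloud container clusters describe $CLUSTER_NAME --zone $ZONE \
    --format="value(workloadIdentityConfig.workloadPool)"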

Now get credentials for the cluster to use with kubectl:

gcloud container clusters get-credentials $CLUSTER_NAME --zone $ZONE --project $PROJECT_ID
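As a quick sanity check that kubectl is pointed at the new cluster, list the nodes; you should see a single node in the Ready state:

kubectl get nodes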

Create a Kubernetes Service Account#

NAMESPACE=default
KSA=my-ksa
kubectl create serviceaccount $KSA -n $NAMESPACE
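As an optional check, confirm that the service account now exists in the namespace:

kubectl get serviceaccount $KSA -n $NAMESPACE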

Configure the GCS Bucket#

Create a GCS bucket that Ray uses as the remote filesystem.

BUCKET=my-bucket
gcloud storage buckets create gs://$BUCKET --uniform-bucket-level-access

Add an IAM policy binding on the bucket that grants the roles/storage.objectUser role to the Kubernetes service account. See Identifying projects to find your project ID and project number:

PROJECT_ID=<your project ID>
PROJECT_NUMBER=<your project number>
gcloud storage buckets add-iam-policy-binding gs://${BUCKET} \
    --member "principal://iam.googleapis.com/projects/${PROJECT_NUMBER}/locations/global/workloadIdentityPools/${PROJECT_ID}.svc.id.goog/subject/ns/${NAMESPACE}/sa/${KSA}" \
    --role "roles/storage.objectUser"

See Authenticate to Google Cloud APIs from GKE workloads for more details.
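To verify the binding, you can print the bucket's IAM policy and look for the principal you just added:

gcloud storage buckets get-iam-policy gs://${BUCKET}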

Create a minimal RayCluster YAML manifest#

You can download the RayCluster YAML manifest for this tutorial with curl as follows:

curl -LO https://raw.githubusercontent.com/ray-project/kuberay/v1.5.1/ray-operator/config/samples/ray-cluster.gke-bucket.yaml

The key parts are the following lines:

      spec:
        serviceAccountName: my-ksa
        nodeSelector:
          iam.gke.io/gke-metadata-server-enabled: "true"

Include these lines in every pod template of your Ray cluster: serviceAccountName must match the Kubernetes service account you created earlier, and the nodeSelector ensures that the pods are scheduled on nodes where the GKE metadata server, which Workload Identity relies on, is available. This example uses a single-node cluster (1 head node and 0 worker nodes) for simplicity.
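For context, here is a sketch of how these lines fit into the head group's pod template. The image and resource values below are illustrative placeholders; the downloaded ray-cluster.gke-bucket.yaml is the authoritative manifest:

apiVersion: ray.io/v1
kind: RayCluster
metadata:
  name: raycluster-mini
spec:
  headGroupSpec:
    rayStartParams: {}
    template:
      spec:
        serviceAccountName: my-ksa
        nodeSelector:
          iam.gke.io/gke-metadata-server-enabled: "true"
        containers:
          - name: ray-head
            image: rayproject/ray:2.9.0  # placeholder version; use the image from the sample manifest
            resources:
              requests:
                cpu: "2"
                memory: 4Gi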

Create the RayCluster#

kubectl apply -f ray-cluster.gke-bucket.yaml
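The head pod can take a minute or two to pull its image and start. You can watch its progress with the following commands; you should eventually see a head pod, named something like raycluster-mini-head-xxxx, in the Running state:

kubectl get raycluster
kubectl get pods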

Test GCS bucket access from the RayCluster#

Use kubectl get pod to get the name of the Ray head pod. Then run the following command to get a shell in the Ray head pod:

kubectl exec -it raycluster-mini-head-xxxx -- /bin/bash
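Alternatively, since KubeRay labels the head pod with ray.io/node-type=head, a convenience sketch like the following should pick up the pod name for you:

HEAD_POD=$(kubectl get pods --selector=ray.io/node-type=head -o custom-columns=POD:metadata.name --no-headers)
kubectl exec -it $HEAD_POD -- /bin/bash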

In the shell, run pip install google-cloud-storage to install the Google Cloud Storage Python client library.

(For production use cases, make sure google-cloud-storage is installed on every node of your cluster, or use ray.init(runtime_env={"pip": ["google-cloud-storage"]}) to install the package as needed at runtime. See https://docs.ray.io/en/latest/ray-core/handling-dependencies.html#runtime-environments for more details.)
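If you take the runtime_env route instead of installing the package manually, a minimal sketch of the call looks like this; the test script below assumes you installed google-cloud-storage with pip and therefore omits the runtime_env argument:

import ray

# Install the GCS client library into the task environment at runtime
ray.init(address="auto", runtime_env={"pip": ["google-cloud-storage"]})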

Then run the following Python code to test access to the bucket:

import ray
from google.cloud import storage

GCP_GCS_BUCKET = "my-bucket"  # Replace with your bucket name if you used a different one
GCP_GCS_FILE = "test_file.txt"

ray.init(address="auto")

@ray.remote
def check_gcs_read_write():
    client = storage.Client()
    bucket = client.bucket(GCP_GCS_BUCKET)
    blob = bucket.blob(GCP_GCS_FILE)

    # Write to the bucket
    blob.upload_from_string("Hello, Ray on GKE!")

    # Read from the bucket
    content = blob.download_as_text()

    return content

result = ray.get(check_gcs_read_write.remote())
print(result)

You should see the following output:

Hello, Ray on GKE!