Updating Applications In-Place#

You can update your Serve applications once they’re in production by updating the settings in your config file and redeploying it using the serve deploy command. In the redeployed config file, you can add new deployment settings or remove old deployment settings. This is because serve deploy is idempotent, meaning your Serve application’s config always matches (or honors) the latest config you deployed successfully – regardless of what config files you deployed before that.

Lightweight Config Updates#

Lightweight config updates modify running deployment replicas without tearing them down and restarting them, so there’s less downtime as the deployments update. For each deployment, modifying the following values is considered a lightweight config update, and won’t tear down the replicas for that deployment:

num_replicas
autoscaling_config
user_config
max_ongoing_requests
graceful_shutdown_timeout_s
graceful_shutdown_wait_loop_s
health_check_period_s
health_check_timeout_s

Updating the user config#

This example uses the text summarization and translation application from the production guide. Both of the individual deployments contain a reconfigure() method. This method allows you to issue lightweight updates to the deployments by updating the user_config.

First let’s deploy the graph. Make sure to stop any previous Ray cluster using the CLI command ray stop for this example:

$ ray start --head
$ serve deploy serve_config.yaml

Then send a request to the application:

import requests

english_text = (
    "It was the best of times, it was the worst of times, it was the age "
    "of wisdom, it was the age of foolishness, it was the epoch of belief"
)
response = requests.post("http://127.0.0.1:8000/", json=english_text)
french_text = response.text

print(french_text)
# 'c'était le meilleur des temps, c'était le pire des temps .'

Change the language that the text is translated into from French to German by changing the language attribute in the Translator user config:

...

applications:
- name: default
  route_prefix: /
  import_path: text_ml:app
  runtime_env:
    pip:
      - torch
      - transformers
  deployments:
  - name: Translator
    num_replicas: 1
    user_config:
      language: german

...

Without stopping the Ray cluster, redeploy the app using serve deploy:

$ serve deploy serve_config.yaml
...

We can inspect our deployments with serve status. Once the application’s status returns to RUNNING, we can try our request one more time:

$ serve status
proxies:
  cef533a072b0f03bf92a6b98cb4eb9153b7b7c7b7f15954feb2f38ec: HEALTHY
applications:
  default:
    status: RUNNING
    message: ''
    last_deployed_time_s: 1694041157.2211847
    deployments:
      Translator:
        status: HEALTHY
        replica_states:
          RUNNING: 1
        message: ''
      Summarizer:
        status: HEALTHY
        replica_states:
          RUNNING: 1
        message: ''

The language has updated. Now the returned text is in German instead of French.

import requests

english_text = (
    "It was the best of times, it was the worst of times, it was the age "
    "of wisdom, it was the age of foolishness, it was the epoch of belief"
)
response = requests.post("http://127.0.0.1:8000/", json=english_text)
german_text = response.text

print(german_text)
# 'Es war die beste Zeit, es war die schlimmste Zeit .'

Code Updates#

Changing the following values in a deployment’s config will trigger redeployment and restart all the deployment’s replicas.

ray_actor_options
placement_group_bundles
placement_group_strategy

Changing the following application-level config values is also considered a code update, and all deployments in the application will be restarted.

import_path
runtime_env

Warning

Although you can update your Serve application by deploying an entirely new deployment graph using a different import_path and a different runtime_env, this is NOT recommended in production.

The best practice for large-scale code updates is to start a new Ray cluster, deploy the updated code to it using serve deploy, and then switch traffic from your old cluster to the new one.