Deploy Multiple Applications#

Serve supports deploying multiple independent Serve applications. This user guide walks through how to generate a multi-application config file and deploy it using the Serve CLI, and monitor your applications using the CLI and the Ray Serve dashboard.

Context#

Background#

With the introduction of multi-application Serve, we walk you through the new concept of applications and when you should choose to deploy a single application versus multiple applications per cluster.

An application consists of one or more deployments. The deployments in an application are tied into a direct acyclic graph through model composition. An application can be called via HTTP at the specified route prefix, and the ingress deployment handles all such inbound traffic. Due to the dependence between deployments in an application, one application is a unit of upgrade.

When to use multiple applications#

You can solve many use cases by using either model composition or multi-application. However, both have their own individual benefits and can be used together.

Suppose you have multiple models and/or business logic that all need to be executed for a single request. If they are living in one repository, then you most likely upgrade them as a unit, so we recommend having all those deployments in one application.

On the other hand, if these models or business logic have logical groups, for example, groups of models that communicate with each other but live in different repositories, we recommend separating the models into applications. Another common use-case for multiple applications is separate groups of models that may not communicate with each other, but you want to co-host them to increase hardware utilization. Because one application is a unit of upgrade, having multiple applications allows you to deploy many independent models (or groups of models) each behind different endpoints. You can then easily add or delete applications from the cluster as well as upgrade applications independently of each other.

Getting started#

Define a Serve application:

import requests
import starlette

from transformers import pipeline
from io import BytesIO
from PIL import Image

from ray import serve
from ray.serve.handle import DeploymentHandle


@serve.deployment
def downloader(image_url: str):
    image_bytes = requests.get(image_url).content
    image = Image.open(BytesIO(image_bytes)).convert("RGB")
    return image


@serve.deployment
class ImageClassifier:
    def __init__(self, downloader: DeploymentHandle):
        self.downloader = downloader
        self.model = pipeline(
            "image-classification", model="google/vit-base-patch16-224"
        )

    async def classify(self, image_url: str) -> str:
        image = await self.downloader.remote(image_url)
        results = self.model(image)
        return results[0]["label"]

    async def __call__(self, req: starlette.requests.Request):
        req = await req.json()
        return await self.classify(req["image_url"])


app = ImageClassifier.options(route_prefix="/classify").bind(downloader.bind())

Copy this code to a file named image_classifier.py.

Define a second Serve application:

import starlette

from transformers import pipeline

from ray import serve


@serve.deployment
class Translator:
    def __init__(self):
        self.model = pipeline("translation_en_to_de", model="t5-small")

    def translate(self, text: str) -> str:
        return self.model(text)[0]["translation_text"]

    async def __call__(self, req: starlette.requests.Request):
        req = await req.json()
        return self.translate(req["text"])


app = Translator.options(route_prefix="/translate").bind()

Copy this code to a file named text_translator.py.

Generate a multi-application config file that contains both of these two applications and save it to config.yaml.

serve build --multi-app image_classifier:app text_translator:app -o config.yaml

This generates the following config:

proxy_location: EveryNode

http_options:
  host: 0.0.0.0
  port: 8000

grpc_options:
  port: 9000
  grpc_servicer_functions: []

applications:
- name: app1
  route_prefix: /classify
  import_path: image_classifier:app
  runtime_env: {}
  deployments:
  - name: downloader
  - name: ImageClassifier

- name: app2
  route_prefix: /translate
  import_path: text_translator:app
  runtime_env: {}
  deployments:
  - name: Translator

Note

The names for each application are auto-generated as app1, app2, etc. To give custom names to the applications, modify the config file before moving on to the next step.

Deploy the applications#

To deploy the applications, be sure to start a Ray cluster first.

$ ray start --head

$ serve deploy config.yaml
> Sent deploy request successfully!

Query the applications at their respective endpoints, /classify and /translate.

bear_url = "https://cdn.britannica.com/41/156441-050-A4424AEC/Grizzly-bear-Jasper-National-Park-Canada-Alberta.jpg"  # noqa
resp = requests.post("http://localhost:8000/classify", json={"image_url": bear_url})

print(resp.text)
# 'brown bear, bruin, Ursus arctos'
text = "Hello, the weather is quite fine today!"
resp = requests.post("http://localhost:8000/translate", json={"text": text})

print(resp.text)
# 'Hallo, das Wetter ist heute ziemlich gut!'

Development workflow with serve run#

You can also use the CLI command serve run to run and test your application easily, either locally or on a remote cluster.

$ serve run config.yaml
> 2023-04-04 11:00:05,901 INFO scripts.py:327 -- Deploying from config file: "config.yaml".
> 2023-04-04 11:00:07,505 INFO worker.py:1613 -- Started a local Ray instance. View the dashboard at http://127.0.0.1:8265
> 2023-04-04 11:00:09,012 SUCC scripts.py:393 -- Submitted deploy config successfully.

The serve run command blocks the terminal, which allows logs from Serve to stream to the console. This helps you test and debug your applications easily. If you want to change your code, you can hit Ctrl-C to interrupt the command and shutdown Serve and all its applications, then rerun serve run.

Note

serve run only supports running multi-application config files. If you want to run applications by directly passing in an import path, serve run can only run one application import path at a time.

Check status#

Check the status of the applications by running serve status.

$ serve status
proxies:
  2e02a03ad64b3f3810b0dd6c3265c8a00ac36c13b2b0937cbf1ef153: HEALTHY
applications:
  app1:
    status: RUNNING
    message: ''
    last_deployed_time_s: 1693267064.0735464
    deployments:
      downloader:
        status: HEALTHY
        replica_states:
          RUNNING: 1
        message: ''
      ImageClassifier:
        status: HEALTHY
        replica_states:
          RUNNING: 1
        message: ''
  app2:
    status: RUNNING
    message: ''
    last_deployed_time_s: 1693267064.0735464
    deployments:
      Translator:
        status: HEALTHY
        replica_states:
          RUNNING: 1
        message: ''

Send requests between applications#

You can also make calls between applications without going through HTTP by using the Serve API serve.get_app_handle to get a handle to any live Serve application on the cluster. This handle can be used to directly execute a request on an application. Take the classifier and translator app above as an example. You can modify the __call__ method of the ImageClassifier to check for another parameter in the HTTP request, and send requests to the translator application.

    async def __call__(self, req: starlette.requests.Request):
        req = await req.json()
        result = await self.classify(req["image_url"])

        if req.get("should_translate") is True:
            handle: DeploymentHandle = serve.get_app_handle("app2")
            return await handle.translate.remote(result)

        return result

Then, send requests to the classifier application with the should_translate flag set to True:

bear_url = "https://cdn.britannica.com/41/156441-050-A4424AEC/Grizzly-bear-Jasper-National-Park-Canada-Alberta.jpg"  # noqa
resp = requests.post(
    "http://localhost:8000/classify",
    json={"image_url": bear_url, "should_translate": True},
)

print(resp.text)
# 'Braunbär, Bruin, Ursus arctos'

Inspect deeper#

For more visibility into the applications running on the cluster, go to the Ray Serve dashboard at http://localhost:8265/#/serve.

You can see all applications that are deployed on the Ray cluster:

applications

The list of deployments under each application:

deployments

As well as the list of replicas for each deployment:

replicas

For more details on the Ray Serve dashboard, see the Serve dashboard documentation.

Add, delete, and update applications#

You can add, remove or update entries under the applications field to add, remove or update applications in the cluster. This doesn’t affect other applications on the cluster. To update an application, modify the config options in the corresponding entry under the applications field.

Note

The in-place update behavior for an application when you resubmit a config is the same as the single-application behavior. For how an application reacts to different config changes, see Updating a Serve Application.

Migrating from a single-application config#

Migrating the single-application config ServeApplicationSchema to the multi-application config format ServeDeploySchema is straightforward. Each entry under the applications field matches the old, single-application config format. To convert a single-application config to the multi-application config format:

  • Copy the entire old config to an entry under the applications field.

  • Remove host and port from the entry and move them under the http_options field.

  • Name the application.

  • If you haven’t already, set the application-level route_prefix to the route prefix of the ingress deployment in the application. In a multi-application config, you should set route prefixes at the application level instead of for the ingress deployment in each application.

  • When needed, add more applications.

For more details on the multi-application config format, see the documentation for ServeDeploySchema.

Note

You must remove host and port from the application entry. In a multi-application config, specifying cluster-level options within an individual application isn’t applicable, and is not supported.