Production Guide

The recommended way to run Ray Serve in production is on Kubernetes using the KubeRay RayService custom resource. The RayService custom resource automatically handles important production requirements such as health checking, status reporting, failure recovery, and upgrades. If you’re not running on Kubernetes, you can also run Ray Serve on a Ray cluster directly using the Serve CLI.

This section walks through a quickstart: generating a Serve config file and deploying it with the Serve CLI. For more details, see the other pages in the production guide.

Working example: FruitStand application

Throughout the production guide, we will use the following Serve application as a working example. Each request contains a list of two values, a fruit name and an amount, and the application returns the total price for that amount of fruit (or -1 if the fruit name is not recognized).

# File name: fruit.py

import ray
from ray import serve
from ray.serve.drivers import DAGDriver
from ray.serve.deployment_graph import InputNode
from ray.serve.handle import RayServeDeploymentHandle
from ray.serve.http_adapters import json_request

# These imports are used only for type hints:
from typing import Dict, List
from starlette.requests import Request


@serve.deployment(num_replicas=2)
class FruitMarket:
    def __init__(
        self,
        mango_stand: RayServeDeploymentHandle,
        orange_stand: RayServeDeploymentHandle,
        pear_stand: RayServeDeploymentHandle,
    ):
        self.directory = {
            "MANGO": mango_stand,
            "ORANGE": orange_stand,
            "PEAR": pear_stand,
        }

    async def check_price(self, fruit: str, amount: float) -> float:
        if fruit not in self.directory:
            return -1
        else:
            fruit_stand = self.directory[fruit]
            ref: ray.ObjectRef = await fruit_stand.check_price.remote(amount)
            result = await ref
            return result


@serve.deployment(user_config={"price": 3})
class MangoStand:

    DEFAULT_PRICE = 1

    def __init__(self):
        # This default price is overwritten by the one specified in the
        # user_config through the reconfigure() method.
        self.price = self.DEFAULT_PRICE

    def reconfigure(self, config: Dict):
        self.price = config.get("price", self.DEFAULT_PRICE)

    def check_price(self, amount: float) -> float:
        return self.price * amount


@serve.deployment(user_config={"price": 2})
class OrangeStand:

    DEFAULT_PRICE = 0.5

    def __init__(self):
        # This default price is overwritten by the one specified in the
        # user_config through the reconfigure() method.
        self.price = self.DEFAULT_PRICE

    def reconfigure(self, config: Dict):
        self.price = config.get("price", self.DEFAULT_PRICE)

    def check_price(self, amount: float) -> float:
        return self.price * amount


@serve.deployment(user_config={"price": 4})
class PearStand:

    DEFAULT_PRICE = 0.75

    def __init__(self):
        # This default price is overwritten by the one specified in the
        # user_config through the reconfigure() method.
        self.price = self.DEFAULT_PRICE

    def reconfigure(self, config: Dict):
        self.price = config.get("price", self.DEFAULT_PRICE)

    def check_price(self, amount: float) -> float:
        return self.price * amount


async def json_resolver(request: Request) -> List:
    return await request.json()


with InputNode() as query:
    fruit, amount = query[0], query[1]

    mango_stand = MangoStand.bind()
    orange_stand = OrangeStand.bind()
    pear_stand = PearStand.bind()

    fruit_market = FruitMarket.bind(mango_stand, orange_stand, pear_stand)

    net_price = fruit_market.check_price.bind(fruit, amount)

deployment_graph = DAGDriver.bind(net_price, http_adapter=json_request)
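Stripped of Ray Serve, the request flow in the fruit.py listing reduces to a dictionary lookup followed by a multiplication, with each stand's price set by reconfigure() after construction. The sketch below mirrors that logic in plain Python so the expected responses can be checked without a cluster. It is a simplified model, not how Serve runs replicas: `start_replica` and the module-level `check_price` are hypothetical helpers introduced only for illustration, and only PearStand is copied to keep the example short.

```python
from typing import Dict


class PearStand:
    """Plain-Python copy of PearStand, without the @serve.deployment decorator."""

    DEFAULT_PRICE = 0.75

    def __init__(self) -> None:
        self.price = self.DEFAULT_PRICE

    def reconfigure(self, config: Dict) -> None:
        self.price = config.get("price", self.DEFAULT_PRICE)

    def check_price(self, amount: float) -> float:
        return self.price * amount


def start_replica(user_config: Dict) -> PearStand:
    # Hypothetical stand-in for replica startup: construct the class,
    # then hand user_config to reconfigure(), as the listing's comments describe.
    stand = PearStand()
    stand.reconfigure(user_config)
    return stand


def check_price(directory: Dict[str, PearStand], fruit: str, amount: float) -> float:
    # Mirrors FruitMarket.check_price: -1 for an unknown fruit, else delegate.
    if fruit not in directory:
        return -1
    return directory[fruit].check_price(amount)


directory = {"PEAR": start_replica({"price": 4})}

pear_total = check_price(directory, "PEAR", 2)     # uses the user_config price
unknown_total = check_price(directory, "KIWI", 2)  # fruit not in the directory
default_total = PearStand().check_price(2)         # reconfigure() never called

print(pear_total, unknown_total, default_total)
```

The difference between `pear_total` and `default_total` shows why the user_config price, not DEFAULT_PRICE, determines the response once the application is deployed.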

Save the fruit.py listing locally to follow along. During development, we would typically iterate with the serve run command (see the Development Workflow for more information). When we’re ready to go to production, we generate a structured config file that acts as the single source of truth for the application.

This config file can be generated using serve build:

$ serve build fruit:deployment_graph -o fruit_config.yaml

The generated version of this file contains an import_path, runtime_env, and configuration options for each deployment in the application. A minimal version of the config looks as follows (save this config locally in fruit_config.yaml to follow along):

import_path: fruit:deployment_graph

runtime_env: {}

deployments:

- name: MangoStand
  num_replicas: 2

- name: OrangeStand
  num_replicas: 1

- name: PearStand
  num_replicas: 1

- name: FruitMarket
  num_replicas: 2

- name: DAGDriver
  num_replicas: 1
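Note that the minimal config sets num_replicas: 2 for MangoStand even though fruit.py does not, and it omits every user_config. The sketch below models how the two sources might combine, assuming that an option set in the config file takes precedence over the same option in the @serve.deployment decorator and that an omitted option falls back to the value from the code. This is a simplified illustration of that assumption, not Serve's implementation; `CODE_OPTIONS`, `CONFIG_OPTIONS`, and `effective_options` are hypothetical names.

```python
from copy import deepcopy
from typing import Any, Dict

Options = Dict[str, Dict[str, Any]]

# Options passed to the @serve.deployment decorators in fruit.py.
CODE_OPTIONS: Options = {
    "MangoStand": {"user_config": {"price": 3}},
    "OrangeStand": {"user_config": {"price": 2}},
    "PearStand": {"user_config": {"price": 4}},
    "FruitMarket": {"num_replicas": 2},
}

# The deployments section of the minimal fruit_config.yaml, written as a dict.
CONFIG_OPTIONS: Options = {
    "MangoStand": {"num_replicas": 2},
    "OrangeStand": {"num_replicas": 1},
    "PearStand": {"num_replicas": 1},
    "FruitMarket": {"num_replicas": 2},
    "DAGDriver": {"num_replicas": 1},
}


def effective_options(code: Options, config: Options) -> Options:
    """Overlay config-file options on top of the options set in code."""
    merged = deepcopy(code)
    for name, overrides in config.items():
        # Assumed precedence: a config entry wins, anything it omits is kept.
        merged.setdefault(name, {}).update(overrides)
    return merged


effective = effective_options(CODE_OPTIONS, CONFIG_OPTIONS)
for name, options in effective.items():
    print(name, options)
```

Under this model, MangoStand runs with two replicas while keeping the price of 3 from its decorator.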

You can use serve deploy to deploy the application to a local Ray cluster and serve status to get the status at runtime:

# Start a local Ray cluster.
$ ray start --head

# Deploy the FruitStand application to the local Ray cluster.
$ serve deploy fruit_config.yaml
2022-08-16 12:51:22,043 SUCC scripts.py:180 --
Sent deploy request successfully!
 * Use `serve status` to check deployments' statuses.
 * Use `serve config` to see the running app's config.

$ serve status
app_status:
  status: RUNNING
  message: ''
  deployment_timestamp: 1660672282.0406542
deployment_statuses:
- name: MangoStand
  status: HEALTHY
  message: ''
- name: OrangeStand
  status: HEALTHY
  message: ''
- name: PearStand
  status: HEALTHY
  message: ''
- name: FruitMarket
  status: HEALTHY
  message: ''
- name: DAGDriver
  status: HEALTHY
  message: ''
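A deployment script may want to check this output programmatically before sending traffic. The sketch below assumes the serve status output has already been parsed into a Python dict with the same structure as shown above (for instance with a YAML parser, which is not included here). `SAMPLE_STATUS`, `unhealthy_deployments`, and `is_ready` are hypothetical names, and the sample is abbreviated to two deployments.

```python
from typing import Any, Dict, List

# Abbreviated copy of the status output above, as a parsed dict.
SAMPLE_STATUS: Dict[str, Any] = {
    "app_status": {
        "status": "RUNNING",
        "message": "",
        "deployment_timestamp": 1660672282.0406542,
    },
    "deployment_statuses": [
        {"name": "MangoStand", "status": "HEALTHY", "message": ""},
        {"name": "DAGDriver", "status": "HEALTHY", "message": ""},
    ],
}


def unhealthy_deployments(status: Dict[str, Any]) -> List[str]:
    """Return the names of deployments whose status is not HEALTHY."""
    return [
        deployment["name"]
        for deployment in status["deployment_statuses"]
        if deployment["status"] != "HEALTHY"
    ]


def is_ready(status: Dict[str, Any]) -> bool:
    """True when the app is RUNNING and every listed deployment is HEALTHY."""
    app_running = status["app_status"]["status"] == "RUNNING"
    return app_running and not unhealthy_deployments(status)


print(is_ready(SAMPLE_STATUS))
```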

You can test the application using curl:

$ curl -H "Content-Type: application/json" -d '["PEAR", 2]' "http://localhost:8000/"
8
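The same request can be sent from Python. The sketch below uses only the standard library and builds a POST with the same JSON body, header, and URL as the curl command. `build_request` and `query` are hypothetical helper names; `query` assumes the application is running at http://localhost:8000/ and that the response body is valid JSON, so it is defined but not called here.

```python
import json
import urllib.request

SERVE_URL = "http://localhost:8000/"


def build_request(fruit: str, amount: float, url: str = SERVE_URL) -> urllib.request.Request:
    """Build the POST request that the curl command above sends."""
    body = json.dumps([fruit, amount]).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def query(fruit: str, amount: float, timeout: float = 5.0):
    """Send the request and decode the response (requires a running app)."""
    with urllib.request.urlopen(build_request(fruit, amount), timeout=timeout) as response:
        return json.loads(response.read())


request = build_request("PEAR", 2)
print(request.get_method(), request.full_url, request.data)
```

With the application deployed, calling `query("PEAR", 2)` should return the same total that curl printed.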

To update the application, modify the config file and use serve deploy again.

Next Steps

This section provided a quickstart on how to generate and use a Serve config file. For a deeper dive into how to deploy, update, and monitor Serve applications, see the other pages in the production guide.