Serve Config Files (serve build)

This section should help you:

  • understand the Serve config file format.

  • understand how to generate and update a config file for a Serve application.

This config file can be used with the serve deploy CLI command, or embedded in a RayService custom resource in Kubernetes, to deploy and update your application in production. The file is written in YAML and has the following format:

import_path: ...

runtime_env: ...

host: ...

port: ...

deployments:

    - name: ...
      num_replicas: ...
      ...

    - name:
      ...

    ...

The file contains the following fields:

  • An import_path, which is the path to your top-level Serve deployment (the same path that you pass to serve run). The most minimal config file consists of only an import_path.

  • A runtime_env that defines the environment that the application will run in. This is used to package application dependencies such as pip packages (see Runtime Environments for supported fields). Note that the import_path must be available within the runtime_env if it’s specified.

  • host and port are HTTP options that determine the host IP address and the port for your Serve application’s HTTP proxies. These are optional settings and can be omitted. By default, the host will be set to 0.0.0.0 to expose your deployments publicly, and the port will be set to 8000. If you’re using Kubernetes, setting host to 0.0.0.0 is necessary to expose your deployments outside the cluster.

  • A list of deployments. This is optional and allows you to override the @serve.deployment settings specified in the deployment graph code. Each entry in this list must include the deployment name, which must match one in the code. If this section is omitted, Serve launches all deployments in the graph with the settings specified in the code.
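
The field rules above can be sketched as a small pure-Python check. This is an illustration only, not Serve's own validation: check_config, http_options, and code_deployment_names are hypothetical names, and the config is assumed to be already parsed from YAML into a dictionary.

```python
from typing import Any, Dict, List, Set

# HTTP defaults stated above; used when host or port is omitted.
HTTP_DEFAULTS = {"host": "0.0.0.0", "port": 8000}


def check_config(config: Dict[str, Any], code_deployment_names: Set[str]) -> List[str]:
    """Return a list of problems found in a parsed config (hypothetical helper)."""
    problems = []
    # import_path is the only field a minimal config must contain.
    if not config.get("import_path"):
        problems.append("import_path is required")
    # The deployments list is optional, but every entry needs a name from the code.
    for i, entry in enumerate(config.get("deployments", [])):
        name = entry.get("name")
        if name is None:
            problems.append(f"deployments[{i}] has no name")
        elif name not in code_deployment_names:
            problems.append(f"deployment {name!r} is not defined in the code")
    return problems


def http_options(config: Dict[str, Any]) -> Dict[str, Any]:
    """Fill in host and port from the defaults when the config omits them."""
    return {key: config.get(key, default) for key, default in HTTP_DEFAULTS.items()}


minimal = {"import_path": "fruit:deployment_graph"}
print(check_config(minimal, {"FruitMarket"}))  # [] (no problems)
print(http_options(minimal))  # {'host': '0.0.0.0', 'port': 8000}
```

A config that contains only an import_path passes the check and picks up the default host and port.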

Below is an equivalent config for the FruitStand example:

import_path: fruit:deployment_graph

runtime_env: {}

deployments:

    - name: FruitMarket
      num_replicas: 2

    - name: MangoStand
      user_config:
        price: 3

    - name: OrangeStand
      user_config:
        price: 2

    - name: PearStand
      user_config:
        price: 4

    - name: DAGDriver

The file uses the same fruit:deployment_graph import path that was used with serve run, and it has five entries in the deployments list, one for each deployment. Every entry contains a name setting, and most entries also set other configuration options such as num_replicas or user_config.

Tip

Each individual entry in the deployments list is optional. In the example config file above, we could omit the PearStand entry, including its name and user_config, and the file would still be valid. When we deploy the file, the PearStand deployment will still be deployed, using the configurations set in the @serve.deployment decorator from the deployment graph’s code.

We can also auto-generate this config file from the code. The serve build command takes an import path to your deployment graph and it creates a config file containing all the deployments and their settings from the graph. You can tweak these settings to manage your deployments in production.

Using the FruitStand deployment graph example:

$ ls
fruit.py

$ serve build fruit:deployment_graph -o fruit_config.yaml

$ ls
fruit.py
fruit_config.yaml

The fruit_config.yaml file contains:

import_path: fruit:deployment_graph

runtime_env: {}

host: 0.0.0.0

port: 8000

deployments:

- name: MangoStand
  user_config:
    price: 3

- name: OrangeStand
  user_config:
    price: 2

- name: PearStand
  user_config:
    price: 4

- name: FruitMarket
  num_replicas: 2

- name: DAGDriver
  route_prefix: /

Note that the runtime_env field will always be empty when using serve build and must be set manually.

Additionally, serve build includes the default host and port in its autogenerated files. You can modify these parameters to select a different host and port.

Tip

You can use the --kubernetes-format/-k flag with serve build to print the Serve config in a format that can be copy-pasted directly into your Kubernetes config.

Overriding deployment settings

Settings from @serve.deployment can be overridden with this Serve config file. The order of priority is (from highest to lowest):

  1. Config File

  2. Deployment graph code (either through the @serve.deployment decorator or a .set_options() call)

  3. Serve defaults

For example, if a deployment’s num_replicas is specified in both the config file and the graph code, Serve will use the config file’s value. If it’s only specified in the code, Serve will use the code value. If the user doesn’t specify it anywhere, Serve will use a default (which is num_replicas=1).

Keep in mind that this override order is applied separately to each individual setting. For example, if a user has a deployment ExampleDeployment with the following decorator:

from ray import serve

@serve.deployment(
    num_replicas=2,
    max_concurrent_queries=15,
)
class ExampleDeployment:
    ...

and the following config file:

...

deployments:

    - name: ExampleDeployment
      num_replicas: 5

...

Serve will set num_replicas=5, using the config file value, and max_concurrent_queries=15, using the code value (since max_concurrent_queries wasn’t specified in the config file). All other deployment settings use Serve defaults since the user didn’t specify them in the code or the config.
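
The per-setting lookup described above can be sketched with the standard library's ChainMap. This is only a model of the priority order, not how Serve implements it; the three dictionaries are hypothetical stand-ins for the config file, the decorator arguments, and Serve's defaults (only the num_replicas default named in the text is included).

```python
from collections import ChainMap

# Only the default named in the text is listed; other Serve defaults are omitted.
SERVE_DEFAULTS = {"num_replicas": 1}

# Settings from the @serve.deployment decorator in the example above.
code_settings = {"num_replicas": 2, "max_concurrent_queries": 15}

# Settings for ExampleDeployment from the config file in the example above.
config_settings = {"num_replicas": 5}

# ChainMap searches its maps left to right, so earlier maps take priority:
# config file first, then graph code, then defaults.
resolved = ChainMap(config_settings, code_settings, SERVE_DEFAULTS)

print(resolved["num_replicas"])            # 5, from the config file
print(resolved["max_concurrent_queries"])  # 15, from the code
```

Because each key is looked up independently, overriding num_replicas in the config leaves max_concurrent_queries at its code value.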

Tip

Remember that ray_actor_options counts as a single setting. The entire ray_actor_options dictionary in the config file overrides the entire ray_actor_options dictionary from the graph code. If there are individual options within ray_actor_options (e.g. runtime_env, num_gpus, memory) that are set in the code but not in the config, Serve still won’t use the code settings if the config has a ray_actor_options dictionary. It will treat these missing options as though the user never set them and will use defaults instead. This dictionary overriding behavior also applies to user_config.
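
The whole-dictionary behavior can be modeled as a merge that only looks at top-level keys. The sketch below is an assumption-laden illustration, not Serve's code: resolve is a hypothetical helper, and the option values are made up for the example.

```python
from typing import Any, Dict


def resolve(config: Dict[str, Any], code: Dict[str, Any]) -> Dict[str, Any]:
    """Top-level merge: a key present in the config replaces the code's value wholesale."""
    return {**code, **config}


# Hypothetical values chosen only for illustration.
code = {"num_replicas": 2, "ray_actor_options": {"num_gpus": 1, "memory": 2 * 1024**3}}
config = {"ray_actor_options": {"num_cpus": 2}}

resolved = resolve(config, code)
print(resolved["ray_actor_options"])  # {'num_cpus': 2}; num_gpus and memory are dropped
print(resolved["num_replicas"])       # 2, untouched because the config does not set it
```

To keep num_gpus and memory in this scenario, the config's ray_actor_options would need to repeat them alongside num_cpus.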

Dynamically adjusting parameters in deployment

The user_config field can be used to supply structured configuration for your deployment. You can pass arbitrary JSON-serializable objects in the YAML configuration. Serve then applies the user_config to all running and future deployment replicas. Applying a user_config does not restart the replicas. This means you can use this field to dynamically:

  • adjust model weights and versions without restarting the cluster.

  • adjust traffic splitting percentage for your model composition graph.

  • configure feature flags, A/B tests, and hyperparameters for your deployments.

To enable the user_config feature, you need to implement a reconfigure method that takes a dictionary as its only argument:

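from typing import Any, Dict

from ray import serve

@serve.deployment
class Model:
    def reconfigure(self, config: Dict[str, Any]):
        # Called with the current user_config after __init__ and on each update.
        self.threshold = config["threshold"]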

If the user_config is set when the deployment is created (e.g. in the decorator or the Serve config file), this reconfigure method is called right after the deployment’s __init__ method, and the user_config is passed in as an argument. You can also trigger the reconfigure method by updating your Serve config file with a new user_config and reapplying it to your Ray cluster.

The corresponding YAML snippet is:

...
deployments:
    - name: Model
      user_config:
        threshold: 1.5
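
The lifecycle described in this section can be simulated in plain Python without Ray. This is a rough sketch under stated assumptions, not Serve's implementation: the Model class here is an undecorated stand-in, and start_replicas and apply_user_config are hypothetical helpers that mimic "call reconfigure after __init__" and "reapply an updated user_config to existing replicas".

```python
from typing import Any, Dict, List


class Model:
    """Plain-Python stand-in for the deployment class above (no Ray required)."""

    def __init__(self) -> None:
        self.threshold = 0.0  # placeholder until reconfigure runs

    def reconfigure(self, config: Dict[str, Any]) -> None:
        self.threshold = config["threshold"]


def apply_user_config(replicas: List[Model], user_config: Dict[str, Any]) -> None:
    """Hypothetical helper: update existing replicas in place, without recreating them."""
    for replica in replicas:
        replica.reconfigure(user_config)


def start_replicas(count: int, user_config: Dict[str, Any]) -> List[Model]:
    """Hypothetical helper: construct replicas, then apply user_config right after __init__."""
    replicas = [Model() for _ in range(count)]
    apply_user_config(replicas, user_config)
    return replicas


replicas = start_replicas(2, {"threshold": 1.5})
before = [id(r) for r in replicas]

# Simulate editing user_config in the config file and reapplying it.
apply_user_config(replicas, {"threshold": 2.0})
print([r.threshold for r in replicas])      # [2.0, 2.0]
print(before == [id(r) for r in replicas])  # True: same objects, nothing was restarted
```

The second call changes the threshold on the same objects, which mirrors how a user_config update reaches running replicas without restarting them.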