Serve Config Files (serve build)
This section should help you:
understand the Serve config file format.
understand how to generate and update a config file for a Serve application.
This config file can be used with the `serve deploy` CLI command or embedded in a RayService custom resource in Kubernetes to deploy and update your application in production. The file is written in YAML and has the following format:
```yaml
import_path: ...
runtime_env: ...
host: ...
port: ...
deployments:
  - name: ...
    num_replicas: ...
    ...
  - name:
    ...
  ...
```
The file contains the following fields:
An `import_path`, which is the path to your top-level Serve deployment (or the same path passed to `serve run`). The most minimal config file consists of only an `import_path`.
A `runtime_env` that defines the environment that the application will run in. This is used to package application dependencies such as `pip` packages (see Runtime Environments for supported fields). Note that the `import_path` must be available within the `runtime_env` if it's specified.
`host` and `port` are HTTP options that determine the host IP address and the port for your Serve application's HTTP proxies. These settings are optional and can be omitted. By default, `host` is set to `0.0.0.0` to expose your deployments publicly, and `port` is set to `8000`. If you're using Kubernetes, setting `host` to `0.0.0.0` is necessary to expose your deployments outside the cluster.
A list of `deployments`. This is optional and allows you to override the `@serve.deployment` settings specified in the deployment graph code. Each entry in this list must include the deployment `name`, which must match one in the code. If this section is omitted, Serve launches all deployments in the graph with the settings specified in the code.
Below is an equivalent config for the `FruitStand` example:
```yaml
import_path: fruit:deployment_graph

runtime_env: {}

deployments:
  - name: FruitMarket
    num_replicas: 2
  - name: MangoStand
    user_config:
      price: 3
  - name: OrangeStand
    user_config:
      price: 2
  - name: PearStand
    user_config:
      price: 4
  - name: DAGDriver
```
The file uses the same `fruit:deployment_graph` import path that was used with `serve run`, and it has five entries in the `deployments` list, one for each deployment. Every entry contains a `name` setting, and most also contain other configuration options such as `num_replicas` or `user_config`.
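The field rules above can be made concrete with a small local check. The following is a minimal sketch that represents the FruitStand config as a plain Python dict. It assumes a hypothetical helper, `normalize_config`, that rejects a file with no `import_path`, requires a `name` on every deployment entry, and fills in the documented `host`/`port` defaults. It is not Serve's own schema validation.

```python
from typing import Any, Dict

# Defaults taken from the field descriptions above (host 0.0.0.0, port 8000).
DEFAULT_HTTP_OPTIONS: Dict[str, Any] = {"host": "0.0.0.0", "port": 8000}


def normalize_config(config: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: check the fields described above and fill in defaults.

    Illustrative only; this is not Serve's own schema validation.
    """
    if "import_path" not in config:
        raise ValueError("a config file needs at least an import_path")
    # Values present in the file win over the HTTP defaults.
    normalized = {**DEFAULT_HTTP_OPTIONS, **config}
    # An omitted deployments list means "use the settings from the code".
    normalized.setdefault("deployments", [])
    for entry in normalized["deployments"]:
        if "name" not in entry:
            raise ValueError(f"deployment entry is missing a name: {entry}")
    return normalized


# The FruitStand config above, written as a Python dict.
fruit_config = {
    "import_path": "fruit:deployment_graph",
    "runtime_env": {},
    "deployments": [
        {"name": "FruitMarket", "num_replicas": 2},
        {"name": "MangoStand", "user_config": {"price": 3}},
        {"name": "OrangeStand", "user_config": {"price": 2}},
        {"name": "PearStand", "user_config": {"price": 4}},
        {"name": "DAGDriver"},
    ],
}

normalized = normalize_config(fruit_config)
print(normalized["host"], normalized["port"])  # 0.0.0.0 8000
```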
Tip
Each individual entry in the `deployments` list is optional. In the example config file above, we could omit the `PearStand` entry, including its `name` and `user_config`, and the file would still be valid. When we deploy the file, the `PearStand` deployment is still deployed, using the configuration set in the `@serve.deployment` decorator in the deployment graph's code.
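To illustrate how omitted entries behave, the sketch below models a hypothetical `effective_settings` function. Every deployment defined in the code is launched, config entries only override settings, and a config name that doesn't match any deployment in the code is rejected. The code-level values mirror the FruitStand config above. The merge is a plain shallow dict update, which is an assumption made for illustration and not Serve's internal logic.

```python
from typing import Any, Dict, List

# Hypothetical stand-in for the @serve.deployment settings in the FruitStand code.
code_settings: Dict[str, Dict[str, Any]] = {
    "FruitMarket": {"num_replicas": 2},
    "MangoStand": {"user_config": {"price": 3}},
    "OrangeStand": {"user_config": {"price": 2}},
    "PearStand": {"user_config": {"price": 4}},
    "DAGDriver": {},
}


def effective_settings(
    code: Dict[str, Dict[str, Any]], entries: List[Dict[str, Any]]
) -> Dict[str, Dict[str, Any]]:
    """Launch every deployment from the code; config entries only override."""
    overrides = {e["name"]: {k: v for k, v in e.items() if k != "name"} for e in entries}
    unknown = set(overrides) - set(code)
    if unknown:
        raise ValueError(f"config names not found in code: {sorted(unknown)}")
    # Deployments without a config entry keep their code settings unchanged.
    return {name: {**settings, **overrides.get(name, {})} for name, settings in code.items()}


# A config that omits PearStand entirely and raises MangoStand's price.
entries = [
    {"name": "FruitMarket", "num_replicas": 2},
    {"name": "MangoStand", "user_config": {"price": 5}},
]
result = effective_settings(code_settings, entries)
print(result["PearStand"])  # {'user_config': {'price': 4}}
print(result["MangoStand"])  # {'user_config': {'price': 5}}
```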
We can also auto-generate this config file from the code. The `serve build` command takes an import path to your deployment graph and creates a config file containing all the deployments and their settings from the graph. You can tweak these settings to manage your deployments in production.
Using the `FruitStand` deployment graph example:
```console
$ ls
fruit.py

$ serve build fruit:deployment_graph -o fruit_config.yaml

$ ls
fruit.py
fruit_config.yaml
```
The `fruit_config.yaml` file contains:
```yaml
import_path: fruit:deployment_graph

runtime_env: {}

host: 0.0.0.0

port: 8000

deployments:
  - name: MangoStand
    user_config:
      price: 3
  - name: OrangeStand
    user_config:
      price: 2
  - name: PearStand
    user_config:
      price: 4
  - name: FruitMarket
    num_replicas: 2
  - name: DAGDriver
    route_prefix: /
```
Note that the `runtime_env` field is always empty when using `serve build` and must be set manually.
Additionally, `serve build` includes the default `host` and `port` in its autogenerated files. You can modify these parameters to select a different host and port.
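The sketch below shows the shape of a generated file and the manual edits it typically needs. `build_config` is a hypothetical helper that takes the per-deployment settings as an argument. The real `serve build` reads them from the deployment graph itself. The `pip` package and the port `8080` are placeholders chosen for illustration.

```python
import json
from typing import Any, Dict, List


def build_config(import_path: str, deployments: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Hypothetical sketch of the shape of serve build's output.

    The real command inspects the deployment graph; here the
    per-deployment settings are passed in by hand.
    """
    return {
        "import_path": import_path,
        "runtime_env": {},  # serve build always leaves this empty
        "host": "0.0.0.0",  # default host written into the file
        "port": 8000,  # default port written into the file
        "deployments": [dict(d) for d in deployments],
    }


config = build_config(
    "fruit:deployment_graph",
    [{"name": "FruitMarket", "num_replicas": 2}, {"name": "DAGDriver", "route_prefix": "/"}],
)

# Manual edits before deploying: add dependencies and pick another port.
# The package name and port number are placeholders.
config["runtime_env"] = {"pip": ["requests"]}
config["port"] = 8080

# JSON output like this is also valid YAML, so it could be saved as a config file.
print(json.dumps(config, indent=2))
```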
Tip
You can use the `--kubernetes-format`/`-k` flag with `serve build` to print the Serve config in a format that can be copy-pasted directly into your Kubernetes config.
Overriding deployment settings
Settings from `@serve.deployment` can be overridden with this Serve config file. The order of priority is (from highest to lowest):
Config file
Deployment graph code (either through the `@serve.deployment` decorator or a `.set_options()` call)
Serve defaults
For example, if a deployment's `num_replicas` is specified in both the config file and the graph code, Serve uses the config file's value. If it's only specified in the code, Serve uses the code value. If the user doesn't specify it anywhere, Serve uses a default (`num_replicas=1`).
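The priority order for a single setting can be written as a short lookup. The sketch below uses a hypothetical `resolve_setting` function that checks the config file, then the graph code, then Serve's defaults, and returns the first value it finds. The defaults dict holds only the `num_replicas=1` default named above.

```python
from typing import Any, Dict

SERVE_DEFAULTS: Dict[str, Any] = {"num_replicas": 1}  # only the default named in the text


def resolve_setting(
    name: str, config: Dict[str, Any], code: Dict[str, Any], defaults: Dict[str, Any]
) -> Any:
    """Return one setting's value using config file > graph code > Serve default."""
    for source in (config, code, defaults):
        if name in source:
            return source[name]
    raise KeyError(name)


print(resolve_setting("num_replicas", {"num_replicas": 3}, {"num_replicas": 2}, SERVE_DEFAULTS))  # 3
print(resolve_setting("num_replicas", {}, {"num_replicas": 2}, SERVE_DEFAULTS))  # 2
print(resolve_setting("num_replicas", {}, {}, SERVE_DEFAULTS))  # 1
```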
Keep in mind that this override order is applied separately to each individual setting.
For example, if a user has a deployment `ExampleDeployment` with the following decorator:
```python
from ray import serve


@serve.deployment(
    num_replicas=2,
    max_concurrent_queries=15,
)
class ExampleDeployment:
    ...
```
and the following config file:
```yaml
...

deployments:
  - name: ExampleDeployment
    num_replicas: 5

...
```
Serve sets `num_replicas=5`, using the config file value, and `max_concurrent_queries=15`, using the code value (since `max_concurrent_queries` wasn't specified in the config file). All other deployment settings use Serve defaults since the user didn't specify them in the code or the config.
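Python's `collections.ChainMap` captures this per-setting behavior. It looks up each key in the first mapping that contains it. The sketch below applies it to `ExampleDeployment`. The `num_replicas=1` default comes from the text above. The other defaults are placeholder assumptions and may not match Serve's actual defaults.

```python
from collections import ChainMap

# Placeholder defaults: num_replicas=1 is from the text; the others are
# assumptions for illustration, not authoritative Serve defaults.
serve_defaults = {"num_replicas": 1, "max_concurrent_queries": 100, "user_config": None}
code_settings = {"num_replicas": 2, "max_concurrent_queries": 15}  # from the decorator
config_settings = {"num_replicas": 5}  # from the config file entry

# Each key is resolved independently: config first, then code, then defaults.
effective = ChainMap(config_settings, code_settings, serve_defaults)
print(effective["num_replicas"])  # 5
print(effective["max_concurrent_queries"])  # 15
print(effective["user_config"])  # None
```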
Tip
Remember that `ray_actor_options` counts as a single setting. The entire `ray_actor_options` dictionary in the config file overrides the entire `ray_actor_options` dictionary from the graph code. If individual options within `ray_actor_options` (e.g. `runtime_env`, `num_gpus`, `memory`) are set in the code but not in the config, Serve still won't use the code settings if the config has a `ray_actor_options` dictionary. It treats these missing options as though the user never set them and uses defaults instead. This dictionary-overriding behavior also applies to `user_config`.
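The sketch below contrasts this wholesale replacement with per-key merging. The priority lookup runs only on top-level settings, so the config's `ray_actor_options` dict is returned as a whole and nothing from the code's dict is merged into it. The option values here are hypothetical.

```python
from collections import ChainMap

# Hypothetical values for illustration.
code_settings = {"ray_actor_options": {"num_cpus": 1, "num_gpus": 1, "memory": 2 * 1024**3}}
config_settings = {"ray_actor_options": {"num_cpus": 2}}

# Top-level lookup: the config's ray_actor_options dict wins as a single value.
effective = ChainMap(config_settings, code_settings)
actor_options = effective["ray_actor_options"]
print(actor_options)  # {'num_cpus': 2}

# num_gpus and memory from the code are NOT carried over; Serve would use
# its defaults for them instead.
print("num_gpus" in actor_options)  # False
```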
Dynamically adjusting parameters in deployments
The `user_config` field can be used to supply structured configuration for your deployment. You can pass arbitrary JSON-serializable objects in the YAML configuration. Serve then applies it to all running and future replicas of the deployment. Applying a user configuration does not restart the replica. This means you can use this field to dynamically:
adjust model weights and versions without restarting the cluster.
adjust traffic-splitting percentages for your model composition graph.
configure feature flags, A/B tests, and hyperparameters for your deployments.
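Because `user_config` must be JSON-serializable, it can help to check a candidate value locally before putting it in the config file. The sketch below uses a hypothetical helper, `is_json_serializable`, built on the standard `json` module. The example keys (`threshold`, `model_version`, `ab_test`) are made up for illustration.

```python
import json
from typing import Any


def is_json_serializable(value: Any) -> bool:
    """Return True if `value` can be encoded as JSON."""
    try:
        json.dumps(value)
    except (TypeError, ValueError):
        return False
    return True


print(is_json_serializable({"threshold": 1.5, "model_version": "v2", "ab_test": {"b_fraction": 0.1}}))  # True
print(is_json_serializable({"weights": {1, 2, 3}}))  # False: sets are not a JSON type
```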
To enable the `user_config` feature, you need to implement a `reconfigure` method that takes a dictionary as its only argument:
```python
from typing import Any, Dict
from ray import serve

@serve.deployment
class Model:
    def reconfigure(self, config: Dict[str, Any]):  # receives the user_config dict
        self.threshold = config["threshold"]
```
If the `user_config` is set when the deployment is created (e.g. in the decorator or the Serve config file), this `reconfigure` method is called right after the deployment's `__init__` method, and the `user_config` is passed in as an argument. You can also trigger the `reconfigure` method by updating your Serve config file with a new `user_config` and reapplying it to your Ray cluster.
The corresponding YAML snippet is:
```yaml
...

deployments:
  - name: Model
    user_config:
      threshold: 1.5
```
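To make the call order concrete without a Ray cluster, the sketch below simulates it in plain Python. The `Model` class matches the one above minus the decorator, plus a hypothetical class-level counter. `start_replica` is a hypothetical stand-in for Serve: it constructs the replica and then calls `reconfigure` if a `user_config` is set. A later update calls `reconfigure` again on the same object without running `__init__`.

```python
from typing import Any, Dict, Optional


class Model:
    constructed = 0  # hypothetical counter, only for this demo

    def __init__(self):
        Model.constructed += 1
        self.threshold: Optional[float] = None

    def reconfigure(self, config: Dict[str, Any]):
        self.threshold = config["threshold"]


def start_replica(cls, user_config: Optional[Dict[str, Any]]):
    """Hypothetical stand-in for Serve: __init__ first, then reconfigure if set."""
    replica = cls()
    if user_config is not None:
        replica.reconfigure(user_config)
    return replica


replica = start_replica(Model, {"threshold": 1.5})
print(replica.threshold)  # 1.5

# Reapplying a config with a new user_config only calls reconfigure again.
replica.reconfigure({"threshold": 2.0})
print(replica.threshold, Model.constructed)  # 2.0 1
```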