Serve Config Files#
This section should help you:
Understand the Serve config file format.
Learn how to deploy and update your applications in production using the Serve config.
Learn how to generate a config file for a list of Serve applications.
The Serve config is the recommended way to deploy and update your applications in production. It allows you to fully configure everything related to Serve, including system-level components like the proxy and application-level options like individual deployment parameters (recall how to configure Serve deployments). One major benefit is that you can dynamically update individual deployment parameters by modifying the Serve config, without redeploying or restarting your application.
Tip
If you are deploying Serve on a VM, you can use the Serve config with the serve deploy CLI command. If you are deploying Serve on Kubernetes, you can embed the Serve config in a RayService custom resource.
The Serve config is a YAML file with the following format:
proxy_location: ...
http_options:
  host: ...
  port: ...
  request_timeout_s: ...
  keep_alive_timeout_s: ...
grpc_options:
  port: ...
  grpc_servicer_functions: ...
logging_config:
  log_level: ...
  logs_dir: ...
  encoding: ...
  enable_access_log: ...
applications:
- name: ...
  route_prefix: ...
  import_path: ...
  runtime_env: ...
  deployments:
  - name: ...
    num_replicas: ...
    ...
  - name:
    ...
The file contains proxy_location, http_options, grpc_options, logging_config, and applications.
The proxy_location field configures where to run proxies to handle traffic to the cluster. You can set proxy_location to the following values:
EveryNode (default): Run a proxy on every node in the cluster that has at least one replica actor.
HeadOnly: Only run a single proxy on the head node.
Disabled: Don’t run proxies at all. Set this value if you are only making calls to your applications using deployment handles.
The http_options are as follows. Note that the HTTP config is global to your Ray cluster, and you can’t update it during runtime.
host: The host IP address for Serve’s HTTP proxies. This is optional and can be omitted. By default, the host is set to 0.0.0.0 to expose your deployments publicly. If you’re using Kubernetes, you must set host to 0.0.0.0 to expose your deployments outside the cluster.
port: The port for Serve’s HTTP proxies. This parameter is optional and can be omitted. By default, the port is set to 8000.
request_timeout_s: Allows you to set the end-to-end timeout for a request before terminating and retrying at another replica. By default, the Serve HTTP proxy retries up to 10 times when a response is not received due to failures (for example, network disconnect, request timeout, etc.). By default, there is no request timeout.
keep_alive_timeout_s: Allows you to set the keep alive timeout for the HTTP proxy. For more details, see here
The grpc_options are as follows. Note that the gRPC config is global to your Ray cluster, and you can’t update it during runtime.
port: The port that the gRPC proxies listen on. This setting is optional and can be omitted. By default, the port is set to 9000.
grpc_servicer_functions: List of import paths for gRPC add_servicer_to_server functions to add to Serve’s gRPC proxy. The servicer functions need to be importable from the context of where Serve is running. This defaults to an empty list, which means the gRPC server isn’t started.
The logging_config is a global config that lets you configure controller, proxy, and replica logs. Note that you can also set application-level and deployment-level logging configs, which take precedence over the global config. See logging config API here for more details.
These are the fields per application:
name: The name of each application, auto-generated by serve build. The name of each application must be unique.
route_prefix: An application can be called via HTTP at the specified route prefix. It defaults to /. The route prefix for each application must be unique.
import_path: The path to your top-level Serve deployment (or the same path passed to serve run). The most minimal config file consists of only an import_path.
runtime_env: Defines the environment that the application runs in. Use this parameter to package application dependencies such as pip packages (see Runtime Environments for supported fields). The import_path must be available within the runtime_env if it’s specified. The Serve config’s runtime_env can only use remote URIs in its working_dir and py_modules; it can’t use local zip files or directories. More details on runtime env.
deployments (optional): A list of deployment options that allows you to override the @serve.deployment settings specified in the deployment graph code. Each entry in this list must include the deployment name, which must match one in the code. If this section is omitted, Serve launches all deployments in the graph with the parameters specified in the code. See how to configure serve deployment options.
args: Arguments that are passed to the application builder.
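The uniqueness rules for name and route_prefix can be checked before deploying. The following is a minimal sketch, assuming the config has already been parsed into a Python dictionary (for example with a YAML parser). The helper name find_duplicates is hypothetical and is not part of Ray Serve; Serve performs its own validation when you deploy.

```python
from collections import Counter


def find_duplicates(config: dict) -> dict:
    """Report application names and route prefixes that appear more than once.

    Hypothetical pre-deploy check; not a Ray Serve API.
    """
    apps = config.get("applications", [])
    names = Counter(app.get("name") for app in apps)
    # route_prefix defaults to "/" when an application omits it.
    prefixes = Counter(app.get("route_prefix", "/") for app in apps)
    return {
        "name": [n for n, count in names.items() if count > 1],
        "route_prefix": [p for p, count in prefixes.items() if count > 1],
    }


# Two applications with distinct names, but both end up on the "/" prefix.
example_config = {
    "applications": [
        {"name": "app1", "route_prefix": "/", "import_path": "text_ml:app"},
        {"name": "app2", "import_path": "text_ml:app"},
    ]
}
duplicates = find_duplicates(example_config)
```

Here duplicates reports the shared "/" prefix, which signals that one of the two applications needs an explicit, different route_prefix.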
Below is a config for the Text ML Model example that follows the format explained above:
proxy_location: EveryNode
http_options:
  host: 0.0.0.0
  port: 8000
applications:
- name: default
  route_prefix: /
  import_path: text_ml:app
  runtime_env:
    pip:
      - torch
      - transformers
  deployments:
  - name: Translator
    num_replicas: 1
    user_config:
      language: french
  - name: Summarizer
    num_replicas: 1
The file uses the same text_ml:app import path that was used with serve run, and has two entries in the deployments list for the translation and summarization deployments. Both entries contain a name setting and some other configuration options such as num_replicas.
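To make the module:attribute shape of an import path such as text_ml:app concrete, here is a minimal sketch of how a string in that form can be resolved with the standard library. It assumes the colon-separated form shown in the example config. The function resolve_import_path is hypothetical, and Serve's own loader may behave differently.

```python
import importlib


def resolve_import_path(import_path: str):
    """Resolve a "module:attribute" string to the object it names.

    Illustrative only; this is not how Ray Serve is guaranteed to load apps.
    """
    module_name, sep, attr_name = import_path.partition(":")
    if not sep or not module_name or not attr_name:
        raise ValueError(f"expected 'module:attribute', got {import_path!r}")
    # The module must be importable from the current environment.
    module = importlib.import_module(module_name)
    return getattr(module, attr_name)


# Demonstrated with a standard-library module, since text_ml.py is not shown here.
dumps = resolve_import_path("json:dumps")
```

The same requirement applies to a real config: the module named before the colon has to be importable from wherever Serve runs.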
Tip
Each individual entry in the deployments list is optional. In the example config file above, you could omit the Summarizer, including its name and num_replicas, and the file would still be valid. When you deploy the file, the Summarizer deployment is still deployed, using the configurations set in the @serve.deployment decorator from the application’s code.
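The override behavior described in this tip can be pictured as a merge of the config entries over the options set in code. The sketch below is a simplified model, not Serve's implementation. CODE_DEFAULTS is a hypothetical stand-in for the @serve.deployment settings in text_ml.py (which isn't shown here), and effective_options is a hypothetical helper.

```python
from copy import deepcopy

# Hypothetical stand-in for the options set in the @serve.deployment decorators.
CODE_DEFAULTS = {
    "Translator": {"num_replicas": 1, "user_config": None},
    "Summarizer": {"num_replicas": 1, "user_config": None},
}


def effective_options(code_defaults: dict, config_deployments: list) -> dict:
    """Overlay config-file entries on top of the options defined in code."""
    merged = deepcopy(code_defaults)
    for entry in config_deployments:
        name = entry["name"]  # every entry must include the deployment name
        if name not in merged:
            raise ValueError(f"no deployment named {name!r} in the code")
        overrides = {key: value for key, value in entry.items() if key != "name"}
        merged[name].update(overrides)
    return merged


# The config lists only Translator; Summarizer keeps its code-defined options.
options = effective_options(
    CODE_DEFAULTS,
    [{"name": "Translator", "num_replicas": 2, "user_config": {"language": "french"}}],
)
```

In this model, omitting the Summarizer entry changes nothing about that deployment, which matches the behavior the tip describes.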
Auto-generate the Serve config using serve build#
You can use a utility to auto-generate this config file from the code. The serve build command takes an import path to your application, and it generates a config file containing all the deployments and their parameters in the application code. Tweak these parameters to manage your deployments in production.
$ ls
text_ml.py
$ serve build text_ml:app -o serve_config.yaml
$ ls
text_ml.py
serve_config.yaml
The serve_config.yaml file contains:
proxy_location: EveryNode
http_options:
  host: 0.0.0.0
  port: 8000
grpc_options:
  port: 9000
  grpc_servicer_functions: []
logging_config:
  encoding: TEXT
  log_level: INFO
  logs_dir: null
  enable_access_log: true
applications:
- name: default
  route_prefix: /
  import_path: text_ml:app
  runtime_env: {}
  deployments:
  - name: Translator
    num_replicas: 1
    user_config:
      language: french
  - name: Summarizer
Note that the runtime_env field is always empty when using serve build and must be set manually. In this case, if torch and transformers are not installed globally, you should include these two pip packages in the runtime_env.
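If you post-process generated configs in a script, filling in the runtime_env could look like the following sketch. It assumes the generated file has already been parsed into a dictionary (for example with a YAML parser) and would be written back afterwards. The helper add_pip_packages is hypothetical and is not part of the serve CLI.

```python
import copy


def add_pip_packages(config: dict, app_name: str, packages: list) -> dict:
    """Return a copy of the config with pip packages added to one application.

    Hypothetical helper for editing a generated config; not a Ray Serve API.
    """
    updated = copy.deepcopy(config)
    for app in updated.get("applications", []):
        if app.get("name") == app_name:
            # serve build leaves runtime_env empty, so create the pip list if needed.
            runtime_env = app.get("runtime_env") or {}
            pip = runtime_env.setdefault("pip", [])
            for package in packages:
                if package not in pip:
                    pip.append(package)
            app["runtime_env"] = runtime_env
            return updated
    raise KeyError(f"no application named {app_name!r}")


# A trimmed version of the generated config, as a parsed dictionary.
generated = {
    "applications": [
        {"name": "default", "import_path": "text_ml:app", "runtime_env": {}}
    ]
}
patched = add_pip_packages(generated, "default", ["torch", "transformers"])
```

The function returns a copy, so the parsed original stays untouched until you decide to write the patched version back to the file.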
Additionally, serve build includes the default HTTP and gRPC options in its autogenerated files. You can modify these parameters.
Dynamically change parameters without restarting replicas (user_config)#
You can use the user_config field to supply a structured configuration for your deployment. You can pass arbitrary JSON serializable objects to the YAML configuration. Serve then applies it to all running and future deployment replicas. The application of user configuration doesn’t restart the replica. This deployment continuity means that you can use this field to dynamically:
adjust model weights and versions without restarting the cluster.
adjust traffic splitting percentage for your model composition graph.
configure any feature flag, A/B tests, and hyper-parameters for your deployments.
To enable the user_config feature, implement a reconfigure method that takes a JSON-serializable object (e.g., a Dictionary, List, or String) as its only argument:
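from typing import Any, Dict
from ray import serve

@serve.deployment
class Model:
    def reconfigure(self, config: Dict[str, Any]):
        # Receives the current user_config whenever it is set or updated.
        self.threshold = config["threshold"]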
If you set the user_config when you create the deployment (that is, in the decorator or the Serve config file), Ray Serve calls this reconfigure method right after the deployment’s __init__ method, and passes the user_config in as an argument. You can also trigger the reconfigure method by updating your Serve config file with a new user_config and reapplying it to the Ray cluster. See In-place Updates for more information.
The corresponding YAML snippet is:
...
deployments:
  - name: Model
    user_config:
      threshold: 1.5
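The call order described above (__init__ first, then reconfigure, then reconfigure again on each update) can be simulated in plain Python without a Ray cluster. This is only a sketch of the lifecycle. start_replica is a hypothetical stand-in for what Serve does when it creates a replica, and the class is left undecorated so the example runs without Ray installed.

```python
from typing import Any, Dict, Optional


class Model:
    def __init__(self):
        self.init_count = 1  # tracks that __init__ runs only once
        self.threshold = None

    def reconfigure(self, config: Dict[str, Any]):
        self.threshold = config["threshold"]


def start_replica(cls, user_config: Optional[Dict[str, Any]]):
    """Hypothetical stand-in for replica startup: __init__, then reconfigure."""
    replica = cls()
    if user_config is not None:
        replica.reconfigure(user_config)
    return replica


# Initial deploy with user_config: {threshold: 1.5}
replica = start_replica(Model, {"threshold": 1.5})
initial_threshold = replica.threshold

# Reapplying the config with a new user_config calls reconfigure on the
# same object; the replica is not re-created.
replica.reconfigure({"threshold": 2.0})
```

After the update, the same replica object holds the new threshold, which mirrors how a user_config change avoids a replica restart.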