Deploy on VM#

You can deploy your Serve application to production on a Ray cluster using the Ray Serve CLI. serve deploy takes a config file path and deploys that file to a Ray cluster over HTTP. The cluster can be either a local, single-node cluster, as in this example, or a remote, multi-node cluster started with the Ray Cluster Launcher.

This section should help you:

  • understand how to deploy a Ray Serve config file using the CLI.

  • understand how to update your application using the CLI.

  • understand how to deploy to a remote cluster started with the Ray Cluster Launcher.

Start by deploying this config for the Text ML Application example:

$ ls
text_ml.py
serve_config.yaml

$ ray start --head
...

$ serve deploy serve_config.yaml
2022-06-20 17:26:31,106	SUCC scripts.py:139 --
Sent deploy request successfully!
 * Use `serve status` to check deployments' statuses.
 * Use `serve config` to see the running app's config.
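For reference, a serve_config.yaml for this example might look like the following sketch. The import path text_ml.app and the deployment names Translator and Summarizer are assumptions based on the Text ML Application example; adjust them to match your own application.

```yaml
# Sketch of a Serve config file for the Text ML example.
# The import_path and deployment names below are assumptions --
# match them to the module and deployments in your text_ml.py.
proxy_location: EveryNode

http_options:
  host: 0.0.0.0
  port: 8000

applications:
  - name: default
    import_path: text_ml.app
    route_prefix: /
    deployments:
      - name: Translator
        num_replicas: 1
      - name: Summarizer
        num_replicas: 1
```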

ray start --head starts a long-lived Ray cluster locally. serve deploy serve_config.yaml deploys the serve_config.yaml file to this local cluster. To stop the Ray cluster, run the CLI command ray stop.

The message Sent deploy request successfully! means:

  • The Ray cluster has received your config file successfully.

  • It will start a new Serve application if one hasn’t already started.

  • The Serve application will deploy the deployments from your deployment graph, updated with the configurations from your config file.

It does not mean that your Serve application, including your deployments, has already started running successfully. This happens asynchronously as the Ray cluster attempts to update itself to match the settings from your config file. See Inspect an application for how to get the current status.

Using a remote cluster#

By default, serve deploy deploys to a cluster running locally. You can also use serve deploy to deploy your Serve application to a remote cluster. serve deploy takes an optional --address/-a argument where you can specify your remote Ray cluster’s dashboard address. This address should be of the form:

[RAY_CLUSTER_URI]:[DASHBOARD_PORT]

As an example, the address for the local cluster started by ray start --head is http://127.0.0.1:8265. You can explicitly deploy to this address using the command

$ serve deploy config_file.yaml -a http://127.0.0.1:8265

The Ray Dashboard’s default port is 8265. To set it to a different value, use the --dashboard-port argument when running ray start.
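For example, to start a local cluster whose dashboard listens on a non-default port and deploy to it, you could run the following; port 8266 here is an arbitrary illustration:

```shell
# Start a local cluster with a non-default dashboard port (8266 is arbitrary).
$ ray start --head --dashboard-port 8266

# Point serve deploy at the matching dashboard address.
$ serve deploy serve_config.yaml -a http://127.0.0.1:8266
```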

Note

When running on a remote cluster, you need to ensure that the import path is accessible. See Handle Dependencies for how to add a runtime environment.

Tip

By default, all the Serve CLI commands assume that you’re working with a local cluster. All Serve CLI commands, except serve start and serve run, use the Ray Dashboard address associated with a local cluster started by ray start --head. However, if the RAY_DASHBOARD_ADDRESS environment variable is set, these Serve CLI commands default to that value instead.

Similarly, serve start and serve run use the Ray head node address associated with a local cluster by default. If the RAY_ADDRESS environment variable is set, they use that value instead.

You can check RAY_DASHBOARD_ADDRESS’s value by running:

$ echo $RAY_DASHBOARD_ADDRESS

You can set this variable by running the CLI command:

$ export RAY_DASHBOARD_ADDRESS=[YOUR VALUE]

You can unset this variable by running the CLI command:

$ unset RAY_DASHBOARD_ADDRESS

Check for this variable in your environment to make sure you’re using your desired Ray Dashboard address.

To inspect the status of the Serve application in production, see Inspect an application.

Make heavyweight code updates (like runtime_env changes) by starting a new Ray cluster, updating your Serve config file, and deploying the file with serve deploy to the new cluster. Once the new deployment finishes, switch your traffic to the new cluster.
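The steps above can be sketched as the following shell session. The cluster config file new_cluster.yaml and the dashboard address http://10.0.0.2:8265 are placeholders for your new cluster; substitute your own values.

```shell
# 1. Start a new Ray cluster, for example with the Ray Cluster Launcher.
#    new_cluster.yaml is a placeholder cluster config file.
$ ray up new_cluster.yaml

# 2. Deploy the updated config (for example, with a new runtime_env)
#    to the new cluster's dashboard address (placeholder shown here).
$ serve deploy serve_config.yaml -a http://10.0.0.2:8265

# 3. Check the new application's status before switching traffic over.
$ serve status -a http://10.0.0.2:8265
```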