Development Workflow#
This page describes the recommended workflow for developing Ray Serve applications. If you’re ready to go to production, jump to the Production Guide section.
Local Development using serve.run#
You can use serve.run in a Python script to run and test your application locally, using a handle to send requests programmatically rather than over HTTP.
Benefits:
Self-contained Python is convenient for writing local integration tests.
No need to deploy to a cloud provider or manage infrastructure.
Drawbacks:
Doesn’t test HTTP endpoints.
Can’t use GPUs if your local machine doesn’t have them.
Let’s see a simple example.
# Filename: local_dev.py
from starlette.requests import Request

from ray import serve
from ray.serve.handle import DeploymentHandle, DeploymentResponse


@serve.deployment
class Doubler:
    def double(self, s: str):
        return s + " " + s


@serve.deployment
class HelloDeployment:
    def __init__(self, doubler: DeploymentHandle):
        self.doubler = doubler

    async def say_hello_twice(self, name: str):
        return await self.doubler.double.remote(f"Hello, {name}!")

    async def __call__(self, request: Request):
        return await self.say_hello_twice(request.query_params["name"])


app = HelloDeployment.bind(Doubler.bind())
We can add the code below to deploy and test Serve locally.
handle: DeploymentHandle = serve.run(app)
response: DeploymentResponse = handle.say_hello_twice.remote(name="Ray")
assert response.result() == "Hello, Ray! Hello, Ray!"
Local Development with HTTP requests#
You can use the serve run CLI command to run and test your application locally using HTTP to send requests (similar to how you might use the uvicorn command if you’re familiar with Uvicorn).
Recall our example above:
# Filename: local_dev.py
from starlette.requests import Request

from ray import serve
from ray.serve.handle import DeploymentHandle, DeploymentResponse


@serve.deployment
class Doubler:
    def double(self, s: str):
        return s + " " + s


@serve.deployment
class HelloDeployment:
    def __init__(self, doubler: DeploymentHandle):
        self.doubler = doubler

    async def say_hello_twice(self, name: str):
        return await self.doubler.double.remote(f"Hello, {name}!")

    async def __call__(self, request: Request):
        return await self.say_hello_twice(request.query_params["name"])


app = HelloDeployment.bind(Doubler.bind())
Now run the following command in your terminal:
serve run local_dev:app
# 2022-08-11 11:31:47,692 INFO scripts.py:294 -- Deploying from import path: "local_dev:app".
# 2022-08-11 11:31:50,372 INFO worker.py:1481 -- Started a local Ray instance. View the dashboard at http://127.0.0.1:8265.
# (ServeController pid=9865) INFO 2022-08-11 11:31:54,039 controller 9865 proxy_state.py:129 - Starting HTTP proxy with name 'SERVE_CONTROLLER_ACTOR:SERVE_PROXY_ACTOR-dff7dc5b97b4a11facaed746f02448224aa0c1fb651988ba7197e949' on node 'dff7dc5b97b4a11facaed746f02448224aa0c1fb651988ba7197e949' listening on '127.0.0.1:8000'
# (ServeController pid=9865) INFO 2022-08-11 11:31:55,373 controller 9865 deployment_state.py:1232 - Adding 1 replicas to deployment 'Doubler'.
# (ServeController pid=9865) INFO 2022-08-11 11:31:55,389 controller 9865 deployment_state.py:1232 - Adding 1 replicas to deployment 'HelloDeployment'.
# (HTTPProxyActor pid=9872) INFO: Started server process [9872]
# 2022-08-11 11:31:57,383 SUCC scripts.py:315 -- Deployed successfully.
The serve run command blocks the terminal and can be canceled with Ctrl-C. Typically, serve run should not be run simultaneously from multiple terminals, unless each serve run is targeting a separate running Ray cluster.
Now that Serve is running, we can send HTTP requests to the application.
For simplicity, we’ll just use the curl command to send requests from another terminal.
curl -X PUT "http://localhost:8000/?name=Ray"
# Hello, Ray! Hello, Ray!
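If you would rather script this check than type curl commands, the same request can be sent from Python. The following is a minimal sketch using only the standard library. It assumes the application is listening on localhost:8000, as in the log output above; the helper names build_hello_url and say_hello are hypothetical and not part of Ray Serve.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumption: `serve run` is listening on this address, as in the log output above.
BASE_URL = "http://localhost:8000/"


def build_hello_url(name: str, base_url: str = BASE_URL) -> str:
    """Return the request URL with `name` encoded as a query parameter."""
    return f"{base_url}?{urlencode({'name': name})}"


def say_hello(name: str, base_url: str = BASE_URL, timeout: float = 10.0) -> str:
    """Send a PUT request (mirroring the curl command) and return the response body."""
    request = Request(build_hello_url(name, base_url), method="PUT")
    with urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


# Usage, with the application running in another terminal:
#     print(say_hello("Ray"))
```

Encoding the query string with urlencode keeps names that contain spaces or other special characters from producing a malformed URL.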
After you’re done testing, you can shut down Ray Serve by interrupting the serve run command (e.g., with Ctrl-C):
^C2022-08-11 11:47:19,829 INFO scripts.py:323 -- Got KeyboardInterrupt, shutting down...
(ServeController pid=9865) INFO 2022-08-11 11:47:19,926 controller 9865 deployment_state.py:1257 - Removing 1 replicas from deployment 'Doubler'.
(ServeController pid=9865) INFO 2022-08-11 11:47:19,929 controller 9865 deployment_state.py:1257 - Removing 1 replicas from deployment 'HelloDeployment'.
Note that rerunning serve run will redeploy all deployments. To prevent redeploying those deployments whose code hasn’t changed, you can use serve deploy; see the Production Guide for details.
Testing on a remote cluster#
To test on a remote cluster, you’ll use serve run again, but this time you’ll pass in an --address argument to specify the address of the Ray cluster to connect to. For remote clusters, this address has the form ray://<head-node-ip-address>:10001; see Ray Client for more information.
When moving from your local machine to a remote cluster, make sure the cluster has an environment similar to your local machine: files, environment variables, and Python packages, for example.
Let’s see a simple example that just packages the code. Run the following command on your local machine, with your remote cluster head node IP address substituted for <head-node-ip-address> in the command:
serve run --address=ray://<head-node-ip-address>:10001 --working-dir="./project/src" local_dev:app
This connects to the remote cluster with the Ray Client, uploads the working_dir directory, and runs your Serve application. Here, the local directory specified by working_dir must contain local_dev.py so that it can be uploaded to the cluster and imported by Ray Serve.
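Because a missing file only shows up after the upload, it can help to check the directory locally first. The sketch below is one hypothetical way to do that; module_file_for and check_working_dir are illustrative helpers, not part of Ray Serve, and they assume the import path names a plain module file rather than a package directory.

```python
from pathlib import Path


def module_file_for(import_path: str, working_dir: str) -> Path:
    """Return the file that an import path such as "local_dev:app" should resolve to.

    Simplifying assumption: the module is a single .py file inside working_dir.
    """
    module_name, _, _attribute = import_path.partition(":")
    return Path(working_dir) / (module_name.replace(".", "/") + ".py")


def check_working_dir(import_path: str, working_dir: str) -> bool:
    """Return True if the module file exists in the local working directory."""
    return module_file_for(import_path, working_dir).is_file()


# Usage before running `serve run --working-dir="./project/src" local_dev:app`:
#     if not check_working_dir("local_dev:app", "./project/src"):
#         raise SystemExit("local_dev.py not found in ./project/src")
```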
Once this is up and running, we can send requests to the application:
curl -X PUT "http://<head-node-ip-address>:8000/?name=Ray"
# Hello, Ray! Hello, Ray!
For more complex dependencies, including files outside the working directory, environment variables, and Python packages, you can use Runtime Environments. This example uses the --runtime-env-json argument:
serve run --address=ray://<head-node-ip-address>:10001 --runtime-env-json='{"env_vars": {"MY_ENV_VAR": "my-value"}, "working_dir": "./project/src", "pip": ["requests", "chess"]}' local_dev:app
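Writing the JSON by hand inside shell quotes is easy to get wrong as the runtime environment grows. As a minimal sketch, the argument can be generated from a Python dictionary with the standard library; runtime_env_flag is a hypothetical helper, and the dictionary simply mirrors the values in the command above.

```python
import json
import shlex

# Assumption: the same runtime environment as in the command above.
runtime_env = {
    "env_vars": {"MY_ENV_VAR": "my-value"},
    "working_dir": "./project/src",
    "pip": ["requests", "chess"],
}


def runtime_env_flag(env: dict) -> str:
    """Return a shell-quoted --runtime-env-json=... argument for `serve run`."""
    return "--runtime-env-json=" + shlex.quote(json.dumps(env))


# Usage: paste the printed argument into the serve run command line.
#     print(runtime_env_flag(runtime_env))
```

Serializing with json.dumps guarantees valid JSON, and shlex.quote wraps the result so a POSIX shell passes it through as a single argument.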
You can also specify the runtime_env in a YAML file; see serve run for details.
What’s Next?#
View details about your Serve application in the Ray dashboard. Once you are ready to deploy to production, see the Production Guide.