Development Workflow#

This page describes the recommended workflow for developing Ray Serve applications. If you’re ready to go to production, jump to the Production Guide section.

Local Development using `serve.run`#

You can use serve.run in a Python script to run and test your application locally, using a handle to send requests programmatically rather than over HTTP.

Benefits:

Self-contained Python is convenient for writing local integration tests.
No need to deploy to a cloud provider or manage infrastructure.

Drawbacks:

Doesn’t test HTTP endpoints.
Can’t use GPUs if your local machine doesn’t have them.

Let’s see a simple example.

# Filename: local_dev.py
from starlette.requests import Request

from ray import serve
from ray.serve.handle import DeploymentHandle, DeploymentResponse


@serve.deployment
class Doubler:
    def double(self, s: str):
        return s + " " + s


@serve.deployment
class HelloDeployment:
    def __init__(self, doubler: DeploymentHandle):
        self.doubler = doubler

    async def say_hello_twice(self, name: str):
        return await self.doubler.double.remote(f"Hello, {name}!")

    async def __call__(self, request: Request):
        return await self.say_hello_twice(request.query_params["name"])


app = HelloDeployment.bind(Doubler.bind())

We can add the code below to deploy and test Serve locally.

handle: DeploymentHandle = serve.run(app)
response: DeploymentResponse = handle.say_hello_twice.remote(name="Ray")
assert response.result() == "Hello, Ray! Hello, Ray!"

Local Development with HTTP requests#

You can use the serve run CLI command to run and test your application locally using HTTP to send requests (similar to how you might use the uvicorn command if you’re familiar with Uvicorn).

Recall our example above:

# Filename: local_dev.py
from starlette.requests import Request

from ray import serve
from ray.serve.handle import DeploymentHandle, DeploymentResponse


@serve.deployment
class Doubler:
    def double(self, s: str):
        return s + " " + s


@serve.deployment
class HelloDeployment:
    def __init__(self, doubler: DeploymentHandle):
        self.doubler = doubler

    async def say_hello_twice(self, name: str):
        return await self.doubler.double.remote(f"Hello, {name}!")

    async def __call__(self, request: Request):
        return await self.say_hello_twice(request.query_params["name"])


app = HelloDeployment.bind(Doubler.bind())

Now run the following command in your terminal:

serve run local_dev:app
# 2022-08-11 11:31:47,692 INFO scripts.py:294 -- Deploying from import path: "local_dev:app".
# 2022-08-11 11:31:50,372 INFO worker.py:1481 -- Started a local Ray instance. View the dashboard at http://127.0.0.1:8265.
# (ServeController pid=9865) INFO 2022-08-11 11:31:54,039 controller 9865 proxy_state.py:129 - Starting HTTP proxy with name 'SERVE_CONTROLLER_ACTOR:SERVE_PROXY_ACTOR-dff7dc5b97b4a11facaed746f02448224aa0c1fb651988ba7197e949' on node 'dff7dc5b97b4a11facaed746f02448224aa0c1fb651988ba7197e949' listening on '127.0.0.1:8000'
# (ServeController pid=9865) INFO 2022-08-11 11:31:55,373 controller 9865 deployment_state.py:1232 - Adding 1 replicas to deployment 'Doubler'.
# (ServeController pid=9865) INFO 2022-08-11 11:31:55,389 controller 9865 deployment_state.py:1232 - Adding 1 replicas to deployment 'HelloDeployment'.
# (HTTPProxyActor pid=9872) INFO:     Started server process [9872]
# 2022-08-11 11:31:57,383 SUCC scripts.py:315 -- Deployed successfully.

The serve run command blocks the terminal and can be canceled with Ctrl-C. Typically, serve run should not be run simultaneously from multiple terminals, unless each serve run is targeting a separate running Ray cluster.

Now that Serve is running, we can send HTTP requests to the application. For simplicity, we’ll just use the curl command to send requests from another terminal.

curl -X PUT "http://localhost:8000/?name=Ray"
# Hello, Ray! Hello, Ray!

After you’re done testing, you can shut down Ray Serve by interrupting the serve run command (e.g., with Ctrl-C):

^C2022-08-11 11:47:19,829       INFO scripts.py:323 -- Got KeyboardInterrupt, shutting down...
(ServeController pid=9865) INFO 2022-08-11 11:47:19,926 controller 9865 deployment_state.py:1257 - Removing 1 replicas from deployment 'Doubler'.
(ServeController pid=9865) INFO 2022-08-11 11:47:19,929 controller 9865 deployment_state.py:1257 - Removing 1 replicas from deployment 'HelloDeployment'.

Note that rerunning serve run redeploys all deployments. To prevent redeploying the deployments whose code hasn’t changed, you can use serve deploy; see the Production Guide for details.

Local Testing Mode#

Note

This is an experimental feature.

Ray Serve supports a local testing mode that allows you to run your deployments locally in a single process. This mode is useful for unit testing and debugging your application logic without the overhead of a full Ray cluster. To enable this mode, use the _local_testing_mode flag in the serve.run function:

serve.run(app, _local_testing_mode=True)

This mode runs each deployment in a background thread and supports most of the same features as running on a full Ray cluster. Note that some features, such as converting DeploymentResponses to ObjectRefs, are not supported in local testing mode. If you encounter limitations, consider filing a feature request on GitHub.

Testing on a remote cluster#

To test on a remote cluster, use serve run again, but this time, pass in an --address argument to specify the address of the Ray cluster to connect to. For remote clusters, this address has the form ray://<head-node-ip-address>:10001; see Ray Client for more information.

When making the transition from your local machine to a remote cluster, you’ll need to make sure your cluster has a similar environment to your local machine–files, environment variables, and Python packages, for example.

Let’s see a simple example that just packages the code. Run the following command on your local machine, with your remote cluster head node IP address substituted for <head-node-ip-address> in the command:

serve run  --address=ray://<head-node-ip-address>:10001 --working-dir="./project/src" local_dev:app

This connects to the remote cluster with the Ray Client, uploads the working_dir directory, and runs your Serve application. Here, the local directory specified by working_dir must contain local_dev.py so that it can be uploaded to the cluster and imported by Ray Serve.

Once this is up and running, we can send requests to the application:

curl -X PUT http://<head-node-ip-address>:8000/?name=Ray
# Hello, Ray! Hello, Ray!

For more complex dependencies, including files outside the working directory, environment variables, and Python packages, you can use Runtime Environments. This example uses the –runtime-env-json argument:

serve run  --address=ray://<head-node-ip-address>:10001 --runtime-env-json='{"env_vars": {"MY_ENV_VAR": "my-value"}, "working_dir": "./project/src", "pip": ["requests", "chess"]}' local_dev:app

You can also specify the runtime_env in a YAML file; see serve run for details.

What’s Next?#

View details about your Serve application in the Ray dashboard. Once you are ready to deploy to production, see the Production Guide.

Development Workflow#

Local Development using serve.run#

Local Development with HTTP requests#

Local Testing Mode#

Testing on a remote cluster#

What’s Next?#

Local Development using `serve.run`#