Integration with Existing Web Servers¶
In this guide, you will learn how to use Ray Serve to scale up your existing web application. The key feature of Ray Serve that makes this possible is the Python-native ServeHandle API, which allows you to keep using your existing Python web server while offloading your heavy computation to Ray Serve.
We give two examples, one using a FastAPI web server and another using an AIOHTTP web server, but the same approach will work with any Python web server.
Scaling Up a FastAPI Application¶
Ray Serve has a native integration with FastAPI; please see FastAPI HTTP Deployments.
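For reference, here is a minimal sketch of what that integration looks like, using the same deployment API as the AIOHTTP example below. The deployment name, route prefix, and endpoint are illustrative, not taken from the linked guide:
# File name: fastapi_example.py
import ray
from ray import serve
from fastapi import FastAPI

app = FastAPI()

# Connect to the running Ray cluster and start a detached Serve instance.
ray.init(address="auto")
serve.start(detached=True)

# serve.ingress attaches the FastAPI app's routes to the deployment, so
# Serve's HTTP proxy serves them under the given route prefix.
@serve.deployment(route_prefix="/hello")
@serve.ingress(app)
class MyFastAPIDeployment:
    @app.get("/")
    def root(self):
        return "Hello, world!"

MyFastAPIDeployment.deploy()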
Scaling Up an AIOHTTP Application¶
In this section, we'll integrate Ray Serve with an AIOHTTP web server run using Gunicorn. You'll need to install AIOHTTP and Gunicorn with the command pip install aiohttp gunicorn.
First, here is the script that deploys Ray Serve:
# File name: aiohttp_deploy_serve.py
import ray
from ray import serve

# Connect to the running Ray cluster.
ray.init(address="auto")

# Start a detached Ray Serve instance. It will persist after the script exits.
# Setting "host" to None tells Serve not to start its own HTTP proxy; all
# traffic will come in through the AIOHTTP server instead.
serve.start(http_options={"host": None}, detached=True)


# Set up a deployment with the desired number of replicas. This could also be
# a stateful class (e.g., if we had an expensive model to set up).
@serve.deployment(name="my_model", num_replicas=2)
async def my_model(request):
    data = await request.body()
    return f"Model received data: {data}"


my_model.deploy()
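Optionally, you can sanity-check the deployment from any script connected to the same cluster before wiring up the web server. The following snippet is an illustrative sketch, not part of the original example (the file name and test input are made up):
# File name: check_deployment.py (illustrative; not part of the original example)
import ray
from ray import serve

# Connect to the running Ray cluster.
ray.init(address="auto")

# List the deployments known to the detached Serve instance.
print(serve.list_deployments())

# Query the deployment directly through its ServeHandle.
handle = serve.get_deployment("my_model").get_handle()
print(ray.get(handle.remote("test input")))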
Next is the script that defines the AIOHTTP server:
# File name: aiohttp_app.py
from aiohttp import web

import ray
from ray import serve

# Connect to the running Ray cluster.
ray.init(address="auto")

# Fetch the ServeHandle to query our model.
my_handle = serve.get_deployment("my_model").get_handle()


# Define our AIOHTTP request handler.
async def handle_request(request):
    # Offload the computation to our Ray Serve deployment.
    result = await my_handle.remote("dummy input")
    return web.Response(text=result)


# Set up an HTTP endpoint.
app = web.Application()
app.add_routes([web.get("/dummy-model", handle_request)])

if __name__ == "__main__":
    web.run_app(app)
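If you want to forward the client's request payload to the deployment instead of a fixed string, a small variation of the handler works. The handler and route names below are hypothetical, not part of the original example:
# Hypothetical variant: forward the raw request body to the deployment.
async def handle_request_with_body(request):
    body = await request.text()
    result = await my_handle.remote(body)
    return web.Response(text=result)

app.add_routes([web.post("/dummy-model-echo", handle_request_with_body)])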
Here’s how to run this example:
1. Run ray start --head to start a local Ray cluster in the background.
2. In the directory where the example files are saved, run python aiohttp_deploy_serve.py to create the Ray Serve deployment.
3. Run gunicorn aiohttp_app:app --worker-class aiohttp.GunicornWebWorker to start the AIOHTTP app using Gunicorn.
4. To test out the server, run curl localhost:8000/dummy-model. This should output Model received data: dummy input.
5. For cleanup, press Ctrl-C to stop the Gunicorn server, and run ray stop to stop the background Ray cluster.