Serving an Object Detection Model
This example runs an object detection application with Ray Serve.
To run this example, install the following:
pip install "ray[serve]" requests torch
This example uses the ultralytics/yolov5 model and FastAPI. Save the following Serve code to a file named object_detection.py:
import torch
from PIL import Image
import numpy as np
from io import BytesIO
from fastapi.responses import Response
from fastapi import FastAPI

from ray import serve
from ray.serve.handle import DeploymentHandle

app = FastAPI()


@serve.deployment(num_replicas=1)
@serve.ingress(app)
class APIIngress:
    def __init__(self, object_detection_handle) -> None:
        self.handle: DeploymentHandle = object_detection_handle.options(
            use_new_handle_api=True,
        )

    @app.get(
        "/detect",
        responses={200: {"content": {"image/jpeg": {}}}},
        response_class=Response,
    )
    async def detect(self, image_url: str):
        image = await self.handle.detect.remote(image_url)
        file_stream = BytesIO()
        image.save(file_stream, "jpeg")
        return Response(content=file_stream.getvalue(), media_type="image/jpeg")


@serve.deployment(
    ray_actor_options={"num_gpus": 1},
    autoscaling_config={"min_replicas": 1, "max_replicas": 2},
)
class ObjectDetection:
    def __init__(self):
        self.model = torch.hub.load("ultralytics/yolov5", "yolov5s")
        self.model.cuda()

    def detect(self, image_url: str):
        result_im = self.model(image_url)
        return Image.fromarray(result_im.render()[0].astype(np.uint8))


entrypoint = APIIngress.bind(ObjectDetection.bind())
Run serve run object_detection:entrypoint to start the Serve application.
Note
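The autoscaling config in the code above sets min_replicas to 1 and max_replicas to 2, so the deployment starts with one ObjectDetection replica and Serve can add a second one under load. If you set min_replicas to 0 instead, the deployment starts with no ObjectDetection replicas. These replicas spawn only when a request arrives. After a period where no requests arrive, Serve downscales ObjectDetection back to 0 replicas to save GPU resources.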
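The scale-to-zero behavior described in the note can be sketched as a small configuration helper. This is only an illustration: the helper name scale_to_zero_config is hypothetical, and the 300-second idle delay is an arbitrary assumption. The keys min_replicas, max_replicas, and downscale_delay_s are fields of the Serve autoscaling_config, but check the documentation for your Ray version before relying on them.

```python
from typing import Any, Dict


def scale_to_zero_config(
    max_replicas: int = 2, downscale_delay_s: float = 300.0
) -> Dict[str, Any]:
    """Build an autoscaling_config dict that lets a deployment idle at 0 replicas.

    Hypothetical helper for illustration; the default values are assumptions.
    """
    if max_replicas < 1:
        raise ValueError("max_replicas must be at least 1")
    return {
        "min_replicas": 0,  # no replicas (and no GPU) held while idle
        "max_replicas": max_replicas,  # upper bound on replicas under load
        "downscale_delay_s": downscale_delay_s,  # idle time before scaling down
    }


config = scale_to_zero_config()
print(config)

# The dict would then be passed to the deployment decorator, for example:
# @serve.deployment(ray_actor_options={"num_gpus": 1}, autoscaling_config=config)
```

The trade-off is cold-start latency: with zero idle replicas, the first request after a quiet period waits for a replica to start and load the model.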
You should see the following logs:
(ServeReplica:ObjectDetection pid=4747) warnings.warn(
(ServeReplica:ObjectDetection pid=4747) Downloading: "https://github.com/ultralytics/yolov5/zipball/master" to /home/ray/.cache/torch/hub/master.zip
(ServeReplica:ObjectDetection pid=4747) YOLOv5 🚀 2023-3-8 Python-3.9.16 torch-1.13.0+cu116 CUDA:0 (Tesla T4, 15110MiB)
(ServeReplica:ObjectDetection pid=4747)
(ServeReplica:ObjectDetection pid=4747) Fusing layers...
(ServeReplica:ObjectDetection pid=4747) YOLOv5s summary: 213 layers, 7225885 parameters, 0 gradients
(ServeReplica:ObjectDetection pid=4747) Adding AutoShape...
2023-03-08 21:10:21,685 SUCC <string>:93 -- Deployed Serve app successfully.
Use the following code to send requests:
import requests

image_url = "https://ultralytics.com/images/zidane.jpg"
resp = requests.get(f"http://127.0.0.1:8000/detect?image_url={image_url}")

with open("output.jpeg", 'wb') as f:
    f.write(resp.content)
The script saves the annotated image locally as output.jpeg. Check it out!
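The request above places the image URL directly into the query string. A slightly more defensive client might percent-encode the parameter and check that the response body looks like a JPEG before writing it to disk. The following is a minimal sketch using only the standard library; the helper names build_detect_url and looks_like_jpeg are hypothetical and not part of Ray Serve or requests.

```python
from urllib.parse import urlencode

# JPEG files begin with these three bytes (start-of-image marker).
JPEG_MAGIC = b"\xff\xd8\xff"


def build_detect_url(
    image_url: str, base: str = "http://127.0.0.1:8000/detect"
) -> str:
    """Return the /detect URL with image_url percent-encoded as a query parameter."""
    return f"{base}?{urlencode({'image_url': image_url})}"


def looks_like_jpeg(data: bytes) -> bool:
    """Cheap sanity check that a response body starts with the JPEG marker."""
    return data.startswith(JPEG_MAGIC)


url = build_detect_url("https://ultralytics.com/images/zidane.jpg")
print(url)
```

With requests, the same encoding can be obtained by passing params={"image_url": image_url} to requests.get, and looks_like_jpeg(resp.content) can guard the file write so that an error response is not saved as an image.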