Serve an Object Detection Model

This example runs an object detection application with Ray Serve.

To run this example, install the following:

pip install "ray[serve]" requests torch

This example uses the ultralytics/yolov5 model and FastAPI. Save the following Serve code to a file named object_detection.py:

import torch
from PIL import Image
import numpy as np
from io import BytesIO
from fastapi.responses import Response
from fastapi import FastAPI

from ray import serve
from ray.serve.handle import DeploymentHandle


app = FastAPI()


@serve.deployment(num_replicas=1)
@serve.ingress(app)
class APIIngress:
    def __init__(self, object_detection_handle: DeploymentHandle):
        self.handle = object_detection_handle

    @app.get(
        "/detect",
        responses={200: {"content": {"image/jpeg": {}}}},
        response_class=Response,
    )
    async def detect(self, image_url: str):
        image = await self.handle.detect.remote(image_url)
        file_stream = BytesIO()
        image.save(file_stream, "jpeg")
        return Response(content=file_stream.getvalue(), media_type="image/jpeg")


@serve.deployment(
    ray_actor_options={"num_gpus": 1},
    autoscaling_config={"min_replicas": 1, "max_replicas": 2},
)
class ObjectDetection:
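    def __init__(self):
        # Download the pretrained YOLOv5s model from PyTorch Hub.
        self.model = torch.hub.load("ultralytics/yolov5", "yolov5s")
        # Move the model to the GPU that Ray assigned to this replica.
        self.model.cuda()

    def detect(self, image_url: str):
        # The model fetches the image from the URL and runs inference on it.
        result_im = self.model(image_url)
        # render() draws the detections onto the image and returns the annotated arrays.
        return Image.fromarray(result_im.render()[0].astype(np.uint8))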


entrypoint = APIIngress.bind(ObjectDetection.bind())

Use serve run object_detection:entrypoint to start the Serve application.

Note

The autoscaling config sets min_replicas to 1, so Serve keeps at least one ObjectDetection replica running and scales up to 2 replicas under load. If you set min_replicas to 0 instead, the deployment starts with no ObjectDetection replicas. These replicas spawn only when a request arrives. After a period of time with no requests, Serve downscales ObjectDetection back to 0 replicas to save GPU resources.
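The difference between keeping one replica warm and scaling to zero comes down to the min_replicas value. The following is a minimal sketch that places the two settings side by side. The constant names and the deployment_kwargs helper are hypothetical and exist only for illustration; the dictionary keys are the ones used in the deployment code above.

```python
# Keeps one replica running at all times, as in object_detection.py.
ALWAYS_ON = {"min_replicas": 1, "max_replicas": 2}

# Lets Serve remove every ObjectDetection replica when the app is idle.
SCALE_TO_ZERO = {"min_replicas": 0, "max_replicas": 2}


def deployment_kwargs(autoscaling_config: dict) -> dict:
    """Build keyword arguments for @serve.deployment (hypothetical helper)."""
    return {
        "ray_actor_options": {"num_gpus": 1},
        "autoscaling_config": dict(autoscaling_config),
    }


if __name__ == "__main__":
    print(deployment_kwargs(SCALE_TO_ZERO))
```

Passing `**deployment_kwargs(SCALE_TO_ZERO)` to `@serve.deployment` would request the scale-to-zero behavior. The trade-off is that the first request after an idle period waits for a new replica to start and load the model.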

You should see the following log messages:

(ServeReplica:ObjectDetection pid=4747)   warnings.warn(
(ServeReplica:ObjectDetection pid=4747) Downloading: "https://github.com/ultralytics/yolov5/zipball/master" to /home/ray/.cache/torch/hub/master.zip
(ServeReplica:ObjectDetection pid=4747) YOLOv5 🚀 2023-3-8 Python-3.9.16 torch-1.13.0+cu116 CUDA:0 (Tesla T4, 15110MiB)
(ServeReplica:ObjectDetection pid=4747) 
(ServeReplica:ObjectDetection pid=4747) Fusing layers... 
(ServeReplica:ObjectDetection pid=4747) YOLOv5s summary: 213 layers, 7225885 parameters, 0 gradients
(ServeReplica:ObjectDetection pid=4747) Adding AutoShape... 
2023-03-08 21:10:21,685 SUCC <string>:93 -- Deployed Serve app successfully.

Tip

While running, the Serve app may raise an error similar to the following:

ImportError: libGL.so.1: cannot open shared object file: No such file or directory

This error usually occurs when running opencv-python, a computer vision library used in this example, in a headless environment such as a container. Such an environment may lack the system libraries that opencv-python needs. opencv-python-headless has fewer external dependencies and is suitable for headless environments.

In your Ray cluster, try running the following command:

pip uninstall opencv-python; pip install opencv-python-headless
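To confirm whether an environment is affected, before or after swapping the packages, it can help to try the import directly. The following is a minimal sketch; check_import is a hypothetical helper, not part of Ray or OpenCV, and it assumes that the OpenCV package is importable under the module name cv2.

```python
import importlib
from typing import Tuple


def check_import(module_name: str) -> Tuple[bool, str]:
    """Try to import a module and report whether it loaded (hypothetical helper)."""
    try:
        importlib.import_module(module_name)
    except ImportError as exc:
        # A missing shared library such as libGL.so.1 also surfaces as ImportError.
        return False, f"{module_name} failed to import: {exc}"
    return True, f"{module_name} imported successfully"


if __name__ == "__main__":
    ok, message = check_import("cv2")
    print(message)
```

Run the script with the same Python interpreter that the Serve replicas use, because the result depends on the packages installed in that environment.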

Use the following code to send requests:

import requests

image_url = "https://ultralytics.com/images/zidane.jpg"
# Pass image_url through params so that requests percent-encodes it.
resp = requests.get("http://127.0.0.1:8000/detect", params={"image_url": image_url})

with open("output.jpeg", 'wb') as f:
    f.write(resp.content)
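If the requests package is not available on the client machine, the same request can be sent with the standard library. The following is a sketch under stated assumptions: it assumes Serve listens on 127.0.0.1:8000 as in the request code above, and build_detect_url and looks_like_jpeg are hypothetical helper names. It percent-encodes the image URL and checks that the response body starts with the JPEG marker before writing it to disk.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed address, taken from the request code above.
SERVE_URL = "http://127.0.0.1:8000/detect"


def build_detect_url(base_url: str, image_url: str) -> str:
    """Percent-encode image_url so it travels as a single query parameter."""
    query = urlencode({"image_url": image_url})
    return f"{base_url}?{query}"


def looks_like_jpeg(data: bytes) -> bool:
    """Return True if data starts with the JPEG start-of-image marker."""
    return data[:2] == b"\xff\xd8"


if __name__ == "__main__":
    url = build_detect_url(SERVE_URL, "https://ultralytics.com/images/zidane.jpg")
    # urlopen raises urllib.error.HTTPError for 4xx and 5xx responses.
    with urlopen(url) as resp:
        body = resp.read()
    if not looks_like_jpeg(body):
        raise RuntimeError("Response is not a JPEG; check the Serve logs.")
    with open("output.jpeg", "wb") as f:
        f.write(body)
```

Checking the leading bytes guards against saving an error page under a .jpeg name when the application returns something other than an image.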

The script saves the response as output.jpeg locally. The following is an example of an output image.