Serving an Object Detection Model

This example runs an object detection application with Ray Serve.

To run this example, install the following:

pip install "ray[serve]" requests torch

This example uses the ultralytics/yolov5 model and FastAPI. Save the following Serve code to a file named object_detection.py:

import torch
from PIL import Image
import numpy as np
from io import BytesIO
from fastapi.responses import Response
from fastapi import FastAPI

from ray import serve
from ray.serve.handle import DeploymentHandle

app = FastAPI()

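@serve.deployment
@serve.ingress(app)
class APIIngress:
    def __init__(self, object_detection_handle: DeploymentHandle) -> None:
        self.handle = object_detection_handle

    # The "/detect" route path is inferred from the handler name.
    @app.get(
        "/detect",
        responses={200: {"content": {"image/jpeg": {}}}},
        response_class=Response,
    )
    async def detect(self, image_url: str):
        image = await self.handle.detect.remote(image_url)
        # Encode the returned PIL image as JPEG bytes for the HTTP response.
        file_stream = BytesIO()
        image.save(file_stream, "jpeg")
        return Response(content=file_stream.getvalue(), media_type="image/jpeg")


@serve.deployment(
    ray_actor_options={"num_gpus": 1},
    autoscaling_config={"min_replicas": 1, "max_replicas": 2},
)
class ObjectDetection: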
    def __init__(self):
        self.model = torch.hub.load("ultralytics/yolov5", "yolov5s")

    def detect(self, image_url: str):
        result_im = self.model(image_url)
        return Image.fromarray(result_im.render()[0].astype(np.uint8))

entrypoint = APIIngress.bind(ObjectDetection.bind())

Use serve run object_detection:entrypoint to start the Serve application.


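The autoscaling config sets min_replicas to 1 and max_replicas to 2, so the deployment starts with one ObjectDetection replica and Serve can add a second one under load. To save GPU resources while idle, set min_replicas to 0 instead: the deployment then starts with no ObjectDetection replicas, replicas spawn only when a request arrives, and after a period where no requests arrive, Serve downscales ObjectDetection back to 0 replicas.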
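To make the scale-to-zero behavior concrete, here is a minimal sketch that contrasts two autoscaling configs as plain dictionaries. The keys min_replicas, max_replicas, and downscale_delay_s are Ray Serve autoscaling options; the variable names, the helper function, and the 300-second delay are illustrative assumptions, not values taken from this example.

```python
# Keeps one replica (and its GPU) alive at all times.
always_on = {"min_replicas": 1, "max_replicas": 2}

# Hypothetical variant: lets Serve remove every replica when idle.
scale_to_zero = {
    "min_replicas": 0,
    "max_replicas": 2,
    # Assumed value: wait 300 s of low traffic before downscaling.
    "downscale_delay_s": 300,
}


def can_scale_to_zero(autoscaling_config: dict) -> bool:
    """Return True if the config allows Serve to run with no replicas."""
    # Serve treats a missing min_replicas as 1, so default to that here.
    return autoscaling_config.get("min_replicas", 1) == 0


print(can_scale_to_zero(always_on))
print(can_scale_to_zero(scale_to_zero))
```

Either dictionary can be passed as the autoscaling_config argument of @serve.deployment. With a minimum of 0, the first request after an idle period waits for a replica to start and load the model, so scale-to-zero trades latency for GPU savings.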

You should see the following logs:

(ServeReplica:ObjectDetection pid=4747)   warnings.warn(
(ServeReplica:ObjectDetection pid=4747) Downloading: "" to /home/ray/.cache/torch/hub/
(ServeReplica:ObjectDetection pid=4747) YOLOv5 🚀 2023-3-8 Python-3.9.16 torch-1.13.0+cu116 CUDA:0 (Tesla T4, 15110MiB)
(ServeReplica:ObjectDetection pid=4747) 
(ServeReplica:ObjectDetection pid=4747) Fusing layers... 
(ServeReplica:ObjectDetection pid=4747) YOLOv5s summary: 213 layers, 7225885 parameters, 0 gradients
(ServeReplica:ObjectDetection pid=4747) Adding AutoShape... 
2023-03-08 21:10:21,685 SUCC <string>:93 -- Deployed Serve app successfully.

Use the following code to send requests:

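import requests

# Set this to the URL of the image to run detection on.
image_url = ""
# Assumes Serve's default HTTP address and a /detect route on the ingress.
resp = requests.get(f"http://127.0.0.1:8000/detect?image_url={image_url}")

with open("output.jpeg", 'wb') as f:
    f.write(resp.content)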
The output.jpeg file is saved locally. Check it out!
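When sending requests, the image URL travels as a query parameter, so it is safer to percent-encode it and to confirm that the response really is a JPEG before saving it. The following is a minimal sketch using only the standard library. It assumes Serve is listening on its default local address and that the ingress exposes a /detect route; the helper names and the example image URL are hypothetical.

```python
from urllib.parse import urlencode

# Assumption: Serve is listening on its default local HTTP address.
SERVE_ADDRESS = "http://127.0.0.1:8000"


def build_detect_url(image_url: str, route: str = "/detect") -> str:
    """Return the request URL with image_url percent-encoded as a query parameter."""
    query = urlencode({"image_url": image_url})
    return f"{SERVE_ADDRESS}{route}?{query}"


def looks_like_jpeg(data: bytes) -> bool:
    """Check for the JPEG start-of-image marker at the start of the payload."""
    return data[:3] == b"\xff\xd8\xff"


# Hypothetical image URL, used only to show the encoding.
print(build_detect_url("https://example.com/cat.jpg?size=large"))
```

With requests, this could be used as resp = requests.get(build_detect_url(image_url)), followed by a looks_like_jpeg(resp.content) check before writing the file. An error response from the server would fail that check instead of being silently saved as an image.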