Ray Serve: Scalable and Programmable Serving

Ray Serve is a scalable model-serving library built on Ray.

For users, Ray Serve is:

  • Framework Agnostic: Use the same toolkit to serve everything from deep learning models built with frameworks like PyTorch or TensorFlow and Keras, to scikit-learn models, to arbitrary Python business logic (see the sketch after this list).

  • Python First: Configure your model serving with pure Python code - no more YAMLs or JSON configs.
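
For example, the same backend and endpoint pattern shown below wraps a scikit-learn model just as easily as a plain Python function. A minimal sketch, where the model path and the request format are hypothetical placeholders:

import pickle

from ray import serve


class SKLearnModel:
    # Hypothetical backend wrapping a pre-trained scikit-learn estimator.
    def __init__(self):
        # "model.pkl" stands in for your own pickled model file.
        with open("model.pkl", "rb") as f:
            self.model = pickle.load(f)

    def __call__(self, flask_request):
        # Assumes the client sends JSON like {"features": [...]}.
        features = flask_request.json["features"]
        return {"prediction": self.model.predict([features]).tolist()}


serve.init()
serve.create_backend("sklearn_model", SKLearnModel)
serve.create_endpoint("sklearn_model", backend="sklearn_model", route="/predict")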

As a library, Ray Serve is built on Ray, so it lets you scale out to many machines and leverage the other Ray libraries, meaning you can deploy and scale on any cloud.
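
For example, to move from a single machine to an existing multi-node Ray cluster, connect to the cluster before initializing Serve. A minimal sketch, assuming a Ray cluster is already running and reachable at the default address:

import ray
from ray import serve

# Connect to the running cluster; Serve can then place backend replicas
# across all of the cluster's machines.
ray.init(address="auto")
serve.init()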

Note

If you want to try out Serve, join our community Slack and discuss in the #serve channel.

Installation

Ray Serve supports Python versions 3.5 and higher. To install Ray Serve:

pip install "ray[serve]"
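
To sanity-check the installation, start Serve in a local Python session:

from ray import serve

serve.init()  # Starts Ray (if needed) and the HTTP server on port 8000.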

Ray Serve in 90 Seconds

Serve a function by defining a backend (in this case a stateless function) and an endpoint, then connecting the two by routing traffic from the endpoint to the backend.

from ray import serve
import requests

# Start the Serve instance (and Ray, if it is not already running).
serve.init()


# A stateless function backend. Serve passes the incoming HTTP request
# to the function as a Flask request object.
def echo(flask_request):
    return "hello " + flask_request.args.get("name", "serve!")


# Register the backend, then expose it over HTTP at /hello.
serve.create_backend("hello", echo)
serve.create_endpoint("hello", backend="hello", route="/hello")

requests.get("http://127.0.0.1:8000/hello").text
# > "hello serve!"

Serve a stateful class by defining a class (Counter) as the backend, creating an endpoint, then connecting the two by routing traffic from the endpoint to the backend.

from ray import serve
import requests

serve.init()


# A stateful class backend: one instance is constructed per replica, so
# state stored on `self` persists across the requests that replica serves.
class Counter:
    def __init__(self):
        self.count = 0

    def __call__(self, flask_request):
        self.count += 1
        return {"current_counter": self.count}


serve.create_backend("counter", Counter)
serve.create_endpoint("counter", backend="counter", route="/counter")

requests.get("http://127.0.0.1:8000/counter").json()
# > {"current_counter": 1}
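
Because the backend is a class instance, the count persists across requests. A quick check, assuming the default of a single replica so that every request hits the same Counter instance:

for _ in range(3):
    print(requests.get("http://127.0.0.1:8000/counter").json())
# Counting continues from the request above:
# > {'current_counter': 2}
# > {'current_counter': 3}
# > {'current_counter': 4}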

See Key Concepts for more exhaustive coverage of Ray Serve and its core concepts.

Why Ray Serve?

There are generally two ways of serving machine learning applications, both with serious limitations: you can use a traditional web server (for example, your own Flask app), or you can use a cloud-hosted solution.

The first approach is easy to get started with, but it's hard to scale each component. The second approach suffers from vendor lock-in (SageMaker), framework-specific tooling (TFServing), and a general lack of flexibility.

Ray Serve solves these problems by giving you the deployment simplicity of a simple web server while handling the complex routing, scaling, and testing logic necessary for production deployments.

For more on the motivation behind Ray Serve, check out these meetup slides.

When should I use Ray Serve?

Ray Serve is a simple (but flexible) tool for deploying, operating, and monitoring Python-based machine learning models. Ray Serve excels when scaling out to serve models in production is a necessity. This might be because of large-scale batch processing requirements, or because you're going to serve a number of models behind different endpoints and may need to run A/B tests or control traffic between them.
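
For example, A/B testing two versions of a model amounts to registering both as backends for one endpoint and splitting traffic between them. A minimal sketch, assuming the serve.set_traffic API from this generation of Ray Serve and a hypothetical second backend:

# A hypothetical new version of the "hello" backend from above.
def echo_v2(flask_request):
    return "hi " + flask_request.args.get("name", "serve!")


serve.create_backend("hello:v2", echo_v2)

# Assumption: set_traffic takes an endpoint name and a backend-to-weight
# dictionary whose weights sum to 1.
serve.set_traffic("hello", {"hello": 0.9, "hello:v2": 0.1})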

If you plan on running on multiple machines, Ray Serve will serve you well.

What’s next?

Check out the Key Concepts, learn more about Advanced Topics and Configurations, look at the Ray Serve FAQ, or head over to the Tutorials to get started building your Ray Serve applications.