End-to-End Tutorial

By the end of this tutorial, you will have learned the basics of Ray Serve and will be ready to pick and choose from the advanced topics in the sidebar.

First, install Ray Serve and all of its dependencies by running the following command in your terminal:

pip install "ray[serve]"

Now we will write a Python script to serve a simple “Counter” class over HTTP. You may open an interactive Python terminal and copy in the lines below as we go.

First, import Ray and Ray Serve:

import ray
from ray import serve

Ray Serve runs on top of a Ray cluster, so the next step is to start a local Ray cluster:

ray.init()

ray.init() will start a single-node Ray cluster on your local machine, which will allow you to use all your CPU cores to serve requests in parallel. To start a multi-node cluster, see Ray Cluster Overview.

Next, start the Ray Serve runtime:

serve.start()

When the Python script exits, Ray Serve will shut down. If you would rather keep Ray Serve running in the background, see Deploying Ray Serve.

Now we will define a simple Counter class. The goal is to serve this class behind an HTTP endpoint using Ray Serve.

For our Counter class to work with Ray Serve, it needs to be a callable class, so we will need to implement a __call__ method as shown:

class Counter:
  def __init__(self):
      self.count = 0

  def __call__(self, request):
      self.count += 1
      return {"count": self.count}
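
Before handing the class to Ray Serve, it can help to check the callable behavior locally. The following is a minimal sketch, not part of Ray Serve itself: it calls a Counter instance directly and passes None as a stand-in for the request object, an assumption that works only because this Counter never reads its request argument.

```python
class Counter:
    """Same class as in the tutorial, repeated so this snippet runs on its own."""

    def __init__(self):
        self.count = 0

    def __call__(self, request):
        # The request argument is ignored; each call just bumps the counter.
        self.count += 1
        return {"count": self.count}


counter = Counter()

# None stands in for the HTTP request object (assumption for local testing only).
first = counter(None)
second = counter(None)

print(first)   # {'count': 1}
print(second)  # {'count': 2}
```

Because the count lives on the instance, every call to the same object sees the state left by the previous call.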


You can also serve other class methods besides __call__.


Besides classes, you can also serve standalone functions with Ray Serve in the same way.
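
As a rough illustration of the function case, the sketch below defines a standalone function that takes the request as its only argument, mirroring the signature of Counter.__call__. The function name hello and its return value are hypothetical; in the tutorial's flow it would be passed to serve.create_backend in place of the Counter class.

```python
def hello(request):
    # Hypothetical handler: like Counter.__call__, it receives the request
    # and returns a dictionary. It keeps no state between calls.
    return {"message": "hello"}


# Called directly here, with None standing in for the request object
# (an assumption for local testing only).
result = hello(None)
print(result)  # {'message': 'hello'}
```

A function suits handlers that need no state; a class like Counter is the better fit when something must persist across requests.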

Now we are ready to deploy our class using Ray Serve. First, create a Ray Serve backend and pass in the Counter class:

serve.create_backend("my_backend", Counter)

Here we have assigned the tag "my_backend" to this backend, which we can use to identify this backend in the future.


Ray Serve backends can be configured to improve performance, for example by increasing the number of replicas of the class being served in parallel. For details, see Configuring a Backend.

To complete the deployment, we will expose this backend over HTTP by creating a Ray Serve endpoint:

serve.create_endpoint("my_endpoint", backend="my_backend", route="/counter")

Here "my_endpoint" is a tag used to identify this endpoint, and we have specified the backend to place behind the endpoint via the backend parameter. The last parameter, route, is the path at which our endpoint will be available over HTTP.

Now that our deployment is up and running, let’s test it out by making a query over HTTP. In your browser, visit the endpoint's URL (the /counter route on the Ray Serve HTTP server), and you should see the output {"count": 1}. If you keep refreshing the page, the count should increase, as expected.
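
The same check can be scripted instead of done in a browser. The sketch below uses only the Python standard library. It assumes the endpoint is reachable at http://127.0.0.1:8000/counter, which combines the route from this tutorial with a locally running Serve HTTP server on what is assumed to be its default host and port; adjust COUNTER_URL if your setup differs. The helper names parse_count and query_counter are hypothetical.

```python
import json
from urllib.request import urlopen

# Assumed address: local Serve HTTP server plus the "/counter" route.
COUNTER_URL = "http://127.0.0.1:8000/counter"


def parse_count(body: bytes) -> int:
    """Extract the integer count from a JSON body such as b'{"count": 1}'."""
    return json.loads(body)["count"]


def query_counter(url: str = COUNTER_URL) -> int:
    """Send one GET request to the endpoint and return the reported count."""
    with urlopen(url, timeout=5) as response:
        return parse_count(response.read())
```

While the Ray Serve session from this tutorial is still running, each call to query_counter() should return a count one higher than the previous call, since every request invokes Counter.__call__ once.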

You just built and ran your first Ray Serve application! Now you can dive into the Core APIs to get a deeper understanding of Ray Serve. For more interesting example applications, including integrations with popular machine learning frameworks and Python web servers, be sure to check out Advanced Tutorials.