Advanced Topics and Configurations
Ray Serve exposes a number of knobs and tools that you can tune for your particular workload. This page covers all of Ray Serve’s advanced options and topics; the fundamentals are covered in Deploying Ray Serve. For a more hands-on take, check out the Serve Tutorials.
There are a number of things you’ll likely want to do with your serving application, including scaling out, splitting traffic, or batching input for better performance. To do all of this, you will create a BackendConfig, a configuration object that you’ll use to set the properties of a particular backend.
Sync and Async Handles
Ray Serve offers two types of ServeHandle. You can use the client.get_handle(..., sync=True|False) flag to toggle between them.
When you set sync=True (the default), a synchronous handle is returned. Calling handle.remote() returns a Ray ObjectRef.
When you set sync=False, an asyncio-based handle is returned. You need to call it with await handle.remote() to get a Ray ObjectRef. To use await, you have to run client.get_handle and handle.remote in a Python asyncio event loop.
The async handle has a performance advantage because it uses asyncio directly, whereas the sync handle talks to an asyncio event loop running in a thread. To learn more about the reasoning behind this, check out our architecture documentation.
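The difference between the two calling styles can be illustrated with the standard library alone. The sketch below is not Ray Serve's implementation: handle_request, call_async, and call_sync are hypothetical names, and the background-thread loop only mimics the general pattern described above, where a synchronous caller has to hand work to an event loop living in another thread.

```python
import asyncio
import threading


async def handle_request(x):
    # Hypothetical stand-in for work that runs on the event loop.
    await asyncio.sleep(0)
    return x * 2


async def call_async(x):
    # Async style: the caller is already in an event loop and awaits directly.
    return await handle_request(x)


# Sync style: an event loop runs in a background thread, and every call
# is submitted to it from the calling thread, which adds a cross-thread hop.
loop = asyncio.new_event_loop()
thread = threading.Thread(target=loop.run_forever, daemon=True)
thread.start()


def call_sync(x):
    future = asyncio.run_coroutine_threadsafe(handle_request(x), loop)
    return future.result(timeout=5)  # block until the loop thread finishes


sync_result = call_sync(21)
async_result = asyncio.run(call_async(21))

# Shut the background loop down cleanly.
loop.call_soon_threadsafe(loop.stop)
thread.join(timeout=5)
loop.close()
```

Both calls produce the same value; the sync path simply pays for the thread handoff on each request, which is the overhead the async handle avoids.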
Configuring HTTP Server Locations
By default, Ray Serve starts only one HTTP server, on the head node of the Ray cluster.
You can configure this behavior using the http_options={"location": ...} flag in serve.start:
“HeadOnly”: start one HTTP server on the head node. Serve assumes the head node is the node you executed serve.start on. This is the default.
“EveryNode”: start one HTTP server per node.
“NoServer” or None: disable the HTTP server.
Note
Using the “EveryNode” option, you can point a cloud load balancer at the Ray cluster’s instance group to achieve high availability of Serve’s HTTP proxies.
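To make the three options concrete, here is a small stdlib-only sketch of which nodes would end up hosting an HTTP server for each value. The proxy_nodes helper and the node names are hypothetical and exist only for illustration; Ray Serve makes this decision internally and does not expose such a function.

```python
def proxy_nodes(location, head_node, all_nodes):
    """Return the nodes that would host an HTTP server for a location option."""
    if location == "HeadOnly":
        return [head_node]       # the default: a single server on the head node
    if location == "EveryNode":
        return list(all_nodes)   # one server per node in the cluster
    if location in ("NoServer", None):
        return []                # HTTP serving disabled
    raise ValueError(f"unknown location: {location!r}")


# Hypothetical three-node cluster.
nodes = ["head", "worker-1", "worker-2"]
head_only = proxy_nodes("HeadOnly", "head", nodes)
every_node = proxy_nodes("EveryNode", "head", nodes)
no_server = proxy_nodes(None, "head", nodes)
```

With “EveryNode”, every entry in the returned list is a candidate target for an external load balancer.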
Variable HTTP Routes
Ray Serve supports capturing path parameters. For example, in a call of the form
client.create_endpoint("my_endpoint", backend="my_backend", route="/api/{username}")
the username parameter will be accessible in your backend code as follows:
def my_backend(request):
    username = request.path_params["username"]
    ...
Ray Serve uses Starlette’s Router class under the hood for routing, so type conversion for path parameters is also supported, as well as multiple path parameters. For example, suppose this route is used:
client.create_endpoint(
    "complex", backend="f", route="/api/{user_id:int}/{number:float}")
Then for a query to the route /api/123/3.14, the request.path_params dictionary available in the backend will be {"user_id": 123, "number": 3.14}, where 123 is a Python int and 3.14 is a Python float.
For full details on the supported path parameters, see Starlette’s path parameters documentation.
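The conversion described above can be approximated with a short stdlib-only sketch. This is a simplified stand-in and not Starlette's Router: the CONVERTERS table, compile_route, and match_path are hypothetical names, and only the str, int, and float cases are modeled.

```python
import re

# Hypothetical converter table: a regex fragment and a cast for each type.
CONVERTERS = {
    "str": (r"[^/]+", str),
    "int": (r"[0-9]+", int),
    "float": (r"[0-9]+(?:\.[0-9]+)?", float),
}

# Matches "{name}" or "{name:type}" placeholders in a route template.
PARAM = re.compile(r"{([a-zA-Z_][a-zA-Z0-9_]*)(?::([a-zA-Z_]+))?}")


def compile_route(route):
    """Turn a route template into a compiled regex plus per-parameter casts."""
    pattern, casts, pos = "^", {}, 0
    for m in PARAM.finditer(route):
        name, kind = m.group(1), m.group(2) or "str"
        regex, cast = CONVERTERS[kind]
        pattern += re.escape(route[pos:m.start()])
        pattern += f"(?P<{name}>{regex})"
        casts[name] = cast
        pos = m.end()
    pattern += re.escape(route[pos:]) + "$"
    return re.compile(pattern), casts


def match_path(route, path):
    """Return the converted path parameters, or None if the path does not match."""
    regex, casts = compile_route(route)
    m = regex.match(path)
    if m is None:
        return None
    return {name: casts[name](value) for name, value in m.groupdict().items()}


path_params = match_path("/api/{user_id:int}/{number:float}", "/api/123/3.14")
```

Here path_params holds an int for user_id and a float for number, matching the dictionary shown above. A path such as /api/abc/3.14 yields None because abc does not satisfy the int pattern.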