ray.serve.handle.DeploymentHandle

class ray.serve.handle.DeploymentHandle

A handle used to make requests to a deployment at runtime.

This is primarily used to compose multiple deployments within a single application. It can also be used to make calls to the ingress deployment of an application (e.g., for programmatic testing).

Example:

import ray
from ray import serve
from ray.serve.handle import DeploymentHandle, DeploymentResponse

@serve.deployment
class Downstream:
    def say_hi(self, message: str) -> str:
        return f"Hello {message}!"

@serve.deployment
class Ingress:
    def __init__(self, handle: DeploymentHandle):
        self._downstream_handle = handle

    async def __call__(self, name: str) -> str:
        response = self._downstream_handle.say_hi.remote(name)
        return await response

app = Ingress.bind(Downstream.bind())
handle: DeploymentHandle = serve.run(app)
response = handle.remote("world")
assert response.result() == "Hello world!"

options(*, method_name: str | DEFAULT = DEFAULT.VALUE, multiplexed_model_id: str | DEFAULT = DEFAULT.VALUE, stream: bool | DEFAULT = DEFAULT.VALUE, use_new_handle_api: bool | DEFAULT = DEFAULT.VALUE, _prefer_local_routing: bool | DEFAULT = DEFAULT.VALUE, _by_reference: bool | DEFAULT = DEFAULT.VALUE, request_serialization: str | DEFAULT = DEFAULT.VALUE, response_serialization: str | DEFAULT = DEFAULT.VALUE) → DeploymentHandle[T]

Set options for this handle and return an updated copy of it.

Parameters:
  • method_name – The method name to call on the deployment.

  • multiplexed_model_id – The model ID to use for multiplexed model requests.

  • stream – Whether to use streaming for the request.

  • use_new_handle_api – Whether to use the new handle API.

  • _prefer_local_routing – Whether to prefer local routing.

  • _by_reference – Whether to pass request arguments by reference.

  • request_serialization – Serialization method for RPC requests. Available options: “cloudpickle”, “pickle”, “msgpack”, “orjson”. Defaults to “cloudpickle”.

  • response_serialization – Serialization method for RPC responses. Available options: “cloudpickle”, “pickle”, “msgpack”, “orjson”. Defaults to “cloudpickle”.

Example:

response: DeploymentResponse = handle.options(
    method_name="other_method",
    multiplexed_model_id="model:v1",
).remote()
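
The "return an updated copy" behavior and the `DEFAULT.VALUE` sentinel in the signature can be illustrated with a small standalone sketch. `ToyHandle`, its three fields, and their default values are hypothetical stand-ins chosen for illustration; this is not Ray's implementation and it does not import Ray. It only shows the general pattern: options that are not passed keep their current value, and the original handle is left unchanged.

```python
from dataclasses import dataclass, replace
from enum import Enum
from typing import Union


class DEFAULT(Enum):
    """Toy sentinel meaning "this argument was not passed"."""

    VALUE = 1


@dataclass(frozen=True)
class ToyHandle:
    """Hypothetical stand-in for a handle; field defaults are assumptions."""

    method_name: str = "__call__"
    multiplexed_model_id: str = ""
    stream: bool = False

    def options(
        self,
        *,
        method_name: Union[str, DEFAULT] = DEFAULT.VALUE,
        multiplexed_model_id: Union[str, DEFAULT] = DEFAULT.VALUE,
        stream: Union[bool, DEFAULT] = DEFAULT.VALUE,
    ) -> "ToyHandle":
        passed = {
            "method_name": method_name,
            "multiplexed_model_id": multiplexed_model_id,
            "stream": stream,
        }
        # Keep only the options the caller actually supplied.
        overrides = {k: v for k, v in passed.items() if v is not DEFAULT.VALUE}
        # Build a new handle; the frozen original cannot be mutated.
        return replace(self, **overrides)


base = ToyHandle()
streaming = base.options(stream=True)
other = streaming.options(method_name="other_method")
```

Because each call returns a new object, option changes can be chained or stored under separate names without affecting other users of the original handle. The sentinel is what lets a caller pass an explicit `False` or empty string and still have it treated as a deliberate override.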

remote(*args, **kwargs) → DeploymentResponse[Any] | DeploymentResponseGenerator[Any]

Issue a remote call to a method of the deployment.

By default, the result is a DeploymentResponse that can be awaited to fetch the result of the call or passed to another .remote() call to compose multiple deployments.

If handle.options(stream=True) is set and a generator method is called, this returns a DeploymentResponseGenerator instead.

Example:

# Fetch the result directly.
response = handle.remote()
result = await response

# Pass the result to another handle call.
composed_response = handle2.remote(handle1.remote())
composed_result = await composed_response
Parameters:
  • *args – Positional arguments to be serialized and passed to the remote method call.

  • **kwargs – Keyword arguments to be serialized and passed to the remote method call.
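
The composition pattern described for remote(), where an un-awaited response is passed straight into another call, can be modeled with plain asyncio. The sketch below is a toy analogy only: `ToyHandle`, `ToyResponse`, `add_one`, and `double` are hypothetical names, nothing here uses Ray, and the real routing and serialization are not represented. It shows the idea that the downstream call waits for the upstream result before running.

```python
import asyncio
from typing import Any, Awaitable, Callable


class ToyResponse:
    """Hypothetical awaitable stand-in for a response object."""

    def __init__(self, coro: Awaitable[Any]):
        # Schedule the call right away; awaiting later only fetches the result.
        self._task = asyncio.ensure_future(coro)

    def __await__(self):
        return self._task.__await__()


class ToyHandle:
    """Hypothetical handle wrapping an async function; not Ray's implementation."""

    def __init__(self, func: Callable[..., Awaitable[Any]]):
        self._func = func

    def remote(self, *args: Any) -> ToyResponse:
        async def call() -> Any:
            # Resolve any upstream responses passed as arguments first.
            resolved = [
                (await arg) if isinstance(arg, ToyResponse) else arg
                for arg in args
            ]
            return await self._func(*resolved)

        return ToyResponse(call())


async def add_one(x: int) -> int:
    return x + 1


async def double(x: int) -> int:
    return x * 2


async def main() -> int:
    handle1, handle2 = ToyHandle(add_one), ToyHandle(double)
    # Pass the un-awaited upstream response directly into the next call.
    composed_response = handle2.remote(handle1.remote(3))
    return await composed_response


composed_result = asyncio.run(main())
print(composed_result)  # (3 + 1) * 2 = 8
```

The caller never awaits the intermediate value itself, which is the point of the pattern: only the final response needs to be awaited.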