ray.serve.handle.DeploymentHandle
- class ray.serve.handle.DeploymentHandle
A handle used to make requests to a deployment at runtime.
This is primarily used to compose multiple deployments within a single application. It can also be used to make calls to the ingress deployment of an application (e.g., for programmatic testing).
Example:
    import ray
    from ray import serve
    from ray.serve.handle import DeploymentHandle, DeploymentResponse

    @serve.deployment
    class Downstream:
        def say_hi(self, message: str):
            return f"Hello {message}!"

    @serve.deployment
    class Ingress:
        def __init__(self, handle: DeploymentHandle):
            self._downstream_handle = handle

        async def __call__(self, name: str) -> str:
            # Call the downstream deployment and await its result.
            response = self._downstream_handle.say_hi.remote(name)
            return await response

    app = Ingress.bind(Downstream.bind())
    handle: DeploymentHandle = serve.run(app)
    response = handle.remote("world")
    assert response.result() == "Hello world!"
- options(*, method_name: str | DEFAULT = DEFAULT.VALUE, multiplexed_model_id: str | DEFAULT = DEFAULT.VALUE, stream: bool | DEFAULT = DEFAULT.VALUE, use_new_handle_api: bool | DEFAULT = DEFAULT.VALUE, _prefer_local_routing: bool | DEFAULT = DEFAULT.VALUE, _by_reference: bool | DEFAULT = DEFAULT.VALUE, request_serialization: str | DEFAULT = DEFAULT.VALUE, response_serialization: str | DEFAULT = DEFAULT.VALUE) -> DeploymentHandle[T]
Set options for this handle and return an updated copy of it.
- Parameters:
method_name – The method name to call on the deployment.
multiplexed_model_id – The model ID to use for multiplexed model requests.
stream – Whether to use streaming for the request.
use_new_handle_api – Whether to use the new handle API.
_prefer_local_routing – Whether to prefer local routing.
_by_reference – Whether to use by reference.
request_serialization – Serialization method for RPC requests. Available options: “cloudpickle”, “pickle”, “msgpack”, “orjson”. Defaults to “cloudpickle”.
response_serialization – Serialization method for RPC responses. Available options: “cloudpickle”, “pickle”, “msgpack”, “orjson”. Defaults to “cloudpickle”.
Example:
    response: DeploymentResponse = handle.options(
        method_name="other_method",
        multiplexed_model_id="model:v1",
    ).remote()
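The signature above defaults every parameter to `DEFAULT.VALUE`, and the method returns an updated copy rather than mutating the handle. The following is a minimal, Ray-free sketch of that pattern, not Ray's implementation: `HandleOptions`, its `updated()` method, the field defaults, and the local `DEFAULT` enum are all hypothetical stand-ins used only for illustration.

```python
from dataclasses import dataclass, replace
from enum import Enum


class DEFAULT(Enum):
    """Local sentinel meaning "argument not provided" (stand-in, not Ray's class)."""

    VALUE = 1


@dataclass(frozen=True)
class HandleOptions:
    # Hypothetical option container; field names follow the documented
    # parameters, and the default values here are assumptions.
    method_name: str = "__call__"
    multiplexed_model_id: str = ""
    stream: bool = False

    def updated(self, **overrides):
        # Drop every override still set to the sentinel, so options the
        # caller did not pass keep their current value.
        provided = {k: v for k, v in overrides.items() if v is not DEFAULT.VALUE}
        # replace() builds a new object; the original stays unchanged.
        return replace(self, **provided)


base = HandleOptions()
streaming = base.updated(stream=True, method_name=DEFAULT.VALUE)
```

Because a sentinel marks "not provided", an explicit `stream=False` can still be told apart from an omitted `stream`, and the original options object can safely be shared between callers.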
- remote(*args, **kwargs) -> DeploymentResponse[Any] | DeploymentResponseGenerator[Any]
Issue a remote call to a method of the deployment.
By default, the result is a DeploymentResponse that can be awaited to fetch the result of the call or passed to another .remote() call to compose multiple deployments.
If handle.options(stream=True) is set and a generator method is called, this returns a DeploymentResponseGenerator instead.
Example:
    # These calls use `await`, so they must run inside an async function
    # (for example, a deployment method).

    # Fetch the result directly.
    response = handle.remote()
    result = await response

    # Pass the result to another handle call.
    composed_response = handle2.remote(handle1.remote())
    composed_result = await composed_response
- Parameters:
*args – Positional arguments to be serialized and passed to the remote method call.
**kwargs – Keyword arguments to be serialized and passed to the remote method call.
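The composition behavior described above (passing an un-awaited response into another call) can be modeled with plain `asyncio`. This is a toy sketch under stated assumptions, not Ray Serve code: `remote`, `_call`, `double`, and `add_one` are hypothetical helpers, and an `asyncio.Task` stands in for a `DeploymentResponse`.

```python
import asyncio
import inspect


async def _call(func, *args):
    # Resolve any argument that is itself a pending result, then invoke func.
    resolved = [(await a) if inspect.isawaitable(a) else a for a in args]
    return await func(*resolved)


def remote(func, *args):
    # Schedule the call right away and return an awaitable handle to its result.
    # Must be called while an event loop is running.
    return asyncio.create_task(_call(func, *args))


async def double(x: int) -> int:
    return x * 2


async def add_one(x: int) -> int:
    return x + 1


async def main() -> int:
    inner = remote(double, 20)      # not awaited by the caller
    outer = remote(add_one, inner)  # receives the pending result as an argument
    return await outer              # only the final result is awaited


composed_result = asyncio.run(main())
```

The caller never touches the intermediate value: the downstream call receives the resolved result of the upstream one, which is the same shape as `handle2.remote(handle1.remote())` in the example above.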
- broadcast(method_name: str, *args, **kwargs) -> DeploymentBroadcastResponse
Call a method on all replicas of this deployment in parallel.
Unlike remote(), which routes the request to a single replica via load balancing, broadcast() fans the call out to every running replica concurrently.
This is useful for coordinated operations such as cache resets, configuration updates, or state synchronization across replicas.
Warning
broadcast() bypasses per-replica backpressure (max_queued_requests is not enforced). It is intended for infrequent control-plane operations such as cache invalidation, configuration reload, or state synchronization across replicas. Do not call it on the hot request path: doing so sends one request per replica on every call, with no rate limiting.
Example:
    handle = serve.get_deployment_handle("MyDeployment", "app")

    # Call reset_cache on every replica and collect results.
    response = handle.broadcast("reset_cache")
    results = response.results()

    # Pass arguments to the broadcast call.
    response = handle.broadcast("update_config", new_value=42)
    results = response.results()
- Parameters:
method_name – The name of the method to call on each replica.
*args – Positional arguments passed to the method.
**kwargs – Keyword arguments passed to the method.
- Returns:
A DeploymentBroadcastResponse that can be used to collect results from all replicas.
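To make the difference between single-replica routing and fan-out concrete, here is a small `asyncio` model. It is only a sketch and does not use Ray: `Replica` and `ToyHandle` are hypothetical classes, and round-robin selection stands in for whatever load balancing the real router performs.

```python
import asyncio
import itertools


class Replica:
    """Hypothetical replica holding some per-replica state."""

    def __init__(self, name: str):
        self.name = name
        self.cache = {"stale": True}

    async def reset_cache(self) -> str:
        self.cache.clear()
        return self.name


class ToyHandle:
    """Toy stand-in for a handle; not the Ray Serve implementation."""

    def __init__(self, replicas):
        self._replicas = list(replicas)
        self._next = itertools.cycle(self._replicas)

    async def remote(self, method_name: str, *args, **kwargs):
        # One replica per call (round-robin as a placeholder for load balancing).
        replica = next(self._next)
        return await getattr(replica, method_name)(*args, **kwargs)

    async def broadcast(self, method_name: str, *args, **kwargs):
        # One call per replica, all issued concurrently.
        calls = [getattr(r, method_name)(*args, **kwargs) for r in self._replicas]
        return await asyncio.gather(*calls)


async def main():
    replicas = [Replica(f"replica-{i}") for i in range(3)]
    handle = ToyHandle(replicas)

    single = await handle.remote("reset_cache")
    cleared_after_remote = sum(not r.cache for r in replicas)

    everyone = await handle.broadcast("reset_cache")
    cleared_after_broadcast = sum(not r.cache for r in replicas)
    return single, cleared_after_remote, everyone, cleared_after_broadcast


single, cleared_after_remote, everyone, cleared_after_broadcast = asyncio.run(main())
```

In this model the single call clears one cache while the broadcast clears all three. The cost of a broadcast grows with the number of replicas, which is why the warning above reserves it for infrequent control-plane operations.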