ray.serve.config.RequestRouterConfig
- pydantic model ray.serve.config.RequestRouterConfig
Config for the Serve request router.
This class configures how Ray Serve routes requests to deployment replicas. The router is responsible for selecting which replica should handle each incoming request based on the configured routing policy. You can customize the routing behavior by specifying a custom request router class and providing configuration parameters.
The router also manages periodic health checks and scheduling statistics collection from replicas to make informed routing decisions.
Example
    from ray.serve.config import RequestRouterConfig, DeploymentConfig
    from ray import serve

    # Use default router with custom stats collection interval
    request_router_config = RequestRouterConfig(
        request_routing_stats_period_s=5.0,
        request_routing_stats_timeout_s=15.0
    )

    # Use custom router class
    request_router_config = RequestRouterConfig(
        request_router_class="ray.serve.llm.request_router.PrefixCacheAffinityRouter",
        request_router_kwargs={"imbalanced_threshold": 20}
    )

    deployment_config = DeploymentConfig(
        request_router_config=request_router_config
    )

    deployment = serve.deploy(
        "my_deployment",
        deployment_config=deployment_config
    )
PublicAPI (alpha): This API is in alpha and may change before becoming stable.
- field backoff_multiplier: float = 2
Multiplier applied to the backoff time after each retry. Defaults to 2.
- Constraints:
gt = 0
- field initial_backoff_s: float = 0.025
Initial backoff time (in seconds) before retrying to route a request to a replica. Defaults to 0.025.
- Constraints:
gt = 0
- field max_backoff_s: float = 0.5
Maximum backoff time (in seconds) between retries. Defaults to 0.5.
- Constraints:
gt = 0
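The three backoff fields above interact as a capped exponential schedule. The following is a minimal sketch of that interaction, assuming the delay before retry n is min(initial_backoff_s * backoff_multiplier ** n, max_backoff_s); the helper name backoff_schedule is hypothetical, and the exact formula that Serve applies internally is not stated in this reference.

```python
from typing import List


def backoff_schedule(
    initial_backoff_s: float = 0.025,
    backoff_multiplier: float = 2.0,
    max_backoff_s: float = 0.5,
    attempts: int = 6,
) -> List[float]:
    """Return the assumed delay (in seconds) before each retry.

    Hypothetical helper for illustration only; it is not part of Ray Serve.
    Assumes a capped exponential schedule built from the documented defaults.
    """
    return [
        min(initial_backoff_s * backoff_multiplier ** attempt, max_backoff_s)
        for attempt in range(attempts)
    ]


if __name__ == "__main__":
    # Delays grow geometrically until they reach max_backoff_s.
    for attempt, delay in enumerate(backoff_schedule()):
        print(f"retry {attempt}: wait {delay:.3f}s")
```

Under this assumption and the default values, the delay doubles on each retry and stops growing once it reaches max_backoff_s.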
- field request_router_class: str | Callable = 'ray.serve._private.request_router:PowerOfTwoChoicesRequestRouter'
The class of the request router that Ray Serve uses for this deployment. This value can be a string (an import path) or a class. All the deployment handles that you create for this deployment use the routing policy defined by the request router. Defaults to Serve’s PowerOfTwoChoicesRequestRouter.
- field request_router_kwargs: Dict[str, Any] [Optional]
Keyword arguments that Ray Serve passes to the initialize_state method of the request router class.
- field request_routing_stats_period_s: float = 10
Interval, in seconds, between record scheduling stats calls to the replica. Defaults to 10s. By default, the stats call is a no-op actor call to the replica, but you can define your own request scheduling stats using the ‘record_scheduling_stats’ method in your deployment.
- Constraints:
gt = 0
- field request_routing_stats_timeout_s: float = 30
Duration, in seconds, that replicas wait for a request scheduling stats method to return before considering it failed. Defaults to 30s.
- Constraints:
gt = 0
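The relationship between a stats call and its timeout can be sketched with plain asyncio. This is an illustration of the documented behavior (a call that does not return within the timeout counts as failed), not Serve's implementation; poll_stats_once, fast_stats, and slow_stats are hypothetical names, and returning None on failure is an assumption made for the sketch.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, Optional


async def poll_stats_once(
    record_stats: Callable[[], Awaitable[Dict[str, Any]]],
    timeout_s: float,
) -> Optional[Dict[str, Any]]:
    """Call a stats coroutine once; treat a timeout as a failed call.

    Hypothetical helper: returns the stats dict, or None if the call
    did not finish within timeout_s.
    """
    try:
        return await asyncio.wait_for(record_stats(), timeout=timeout_s)
    except asyncio.TimeoutError:
        return None


async def fast_stats() -> Dict[str, Any]:
    # Stand-in for a stats method that returns promptly.
    return {"queue_len": 3}


async def slow_stats() -> Dict[str, Any]:
    # Stand-in for a stats method that exceeds the timeout.
    await asyncio.sleep(0.2)
    return {"queue_len": 99}


if __name__ == "__main__":
    print(asyncio.run(poll_stats_once(fast_stats, timeout_s=0.05)))
    print(asyncio.run(poll_stats_once(slow_stats, timeout_s=0.05)))
```

In a real deployment, request_routing_stats_period_s would set how often such a call is issued and request_routing_stats_timeout_s would play the role of timeout_s.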
- classmethod from_serialized_request_router_cls(request_router_config: dict, serialized_request_router_cls: bytes) -> RequestRouterConfig
- get_request_router_class() -> Callable
Deserialize the request router from cloudpickled bytes.
- validator request_router_kwargs_json_serializable » request_router_kwargs
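The validator name suggests that request_router_kwargs must be JSON serializable. A quick local check before building the config might look like the sketch below. It assumes the validator rejects any value that json.dumps cannot encode; the helper kwargs_are_json_serializable is hypothetical and is not part of Ray Serve.

```python
import json
from typing import Any, Dict


def kwargs_are_json_serializable(kwargs: Dict[str, Any]) -> bool:
    """Return True if kwargs can be encoded with the standard json module.

    Hypothetical pre-check for illustration; the actual validator's
    behavior is not described in this reference.
    """
    try:
        json.dumps(kwargs)
        return True
    except (TypeError, ValueError):
        # TypeError: unsupported type (e.g. a set or a function).
        # ValueError: e.g. a circular reference.
        return False


if __name__ == "__main__":
    print(kwargs_are_json_serializable({"imbalanced_threshold": 20}))
    print(kwargs_are_json_serializable({"callback": print}))
```

Plain numbers, strings, booleans, lists, and dicts pass this check; objects such as functions or sets do not.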