ray.serve.llm.LLMRouter.get_deployment_options#
- classmethod LLMRouter.get_deployment_options(llm_configs: List[LLMConfig] | None = None) → Dict[str, Any]#
Get the deployment options for the ingress deployment.
If every model is configured with min_replicas=0 (scale-to-zero), the ingress is also configured with min_replicas=0, so that idle worker nodes/GPU instances can be fully released.
- Parameters:
llm_configs – The LLM configs to infer the number of ingress replicas from.
- Returns:
A dictionary containing the deployment options for the ingress deployment.
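The scale-to-zero rule above can be sketched in plain Python. This is a hypothetical illustration, not Ray's implementation: `FakeLLMConfig` and the returned dictionary shape (`autoscaling_config` with `min_replicas`) are assumptions standing in for the real `LLMConfig` and deployment options.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class FakeLLMConfig:
    """Hypothetical stand-in for ray.serve.llm.LLMConfig, keeping only
    the autoscaling field relevant to this rule."""
    min_replicas: int


def get_deployment_options(
    llm_configs: Optional[List[FakeLLMConfig]] = None,
) -> Dict[str, Any]:
    """Sketch of the rule: if every model scales to zero, let the
    ingress scale to zero too so idle nodes can be released."""
    options: Dict[str, Any] = {}
    if llm_configs and all(cfg.min_replicas == 0 for cfg in llm_configs):
        # All models are scale-to-zero, so the ingress may be as well.
        options["autoscaling_config"] = {"min_replicas": 0}
    return options


# All models scale to zero -> ingress scales to zero.
print(get_deployment_options([FakeLLMConfig(0), FakeLLMConfig(0)]))
# One model keeps a warm replica -> ingress keeps default options.
print(get_deployment_options([FakeLLMConfig(1), FakeLLMConfig(0)]))
```

The key point is that a single model with min_replicas > 0 is enough to keep the ingress from scaling to zero, since requests for that model must always be routable.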