ray.serve.llm.builders.build_openai_app
- ray.serve.llm.builders.build_openai_app(llm_serving_args: LLMServingArgs) → Application
Helper to build an OpenAI-compatible app with the LLM deployments set up from the given LLM serving args. This is the main entry point for users to create a Serve application that serves LLMs.
Examples
from ray import serve
from ray.serve.llm.configs import LLMConfig
from ray.serve.llm.deployments import VLLMService, LLMRouter

llm_config1 = LLMConfig(
    model_loading_config=dict(
        model_id="qwen-0.5b",
        model_source="Qwen/Qwen2.5-0.5B-Instruct",
    ),
    deployment_config=dict(
        autoscaling_config=dict(
            min_replicas=1,
            max_replicas=2,
        )
    ),
    accelerator_type="A10G",
)

llm_config2 = LLMConfig(
    model_loading_config=dict(
        model_id="qwen-1.5b",
        model_source="Qwen/Qwen2.5-1.5B-Instruct",
    ),
    deployment_config=dict(
        autoscaling_config=dict(
            min_replicas=1,
            max_replicas=2,
        )
    ),
    accelerator_type="A10G",
)

# Deploy the application
deployment1 = VLLMService.as_deployment().bind(llm_config1)
deployment2 = VLLMService.as_deployment().bind(llm_config2)
llm_app = LLMRouter.as_deployment().bind([deployment1, deployment2])
serve.run(llm_app)

# Querying the model via openai client
from openai import OpenAI

# Initialize client
client = OpenAI(base_url="http://localhost:8000/v1", api_key="fake-key")

# Basic completion
response = client.chat.completions.create(
    model="qwen-0.5b",
    messages=[{"role": "user", "content": "Hello!"}],
)
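The example above wires the router and model deployments together by hand. For reference, a minimal sketch of the same setup going through build_openai_app itself is shown below; it assumes LLMServingArgs is importable from ray.serve.llm.configs and accepts the configs via an llm_configs field, as described in the Parameters section. Treat the import path and field name as assumptions for this alpha API.

from ray import serve
from ray.serve.llm.builders import build_openai_app
from ray.serve.llm.configs import LLMConfig, LLMServingArgs  # assumed import path

llm_config = LLMConfig(
    model_loading_config=dict(
        model_id="qwen-0.5b",
        model_source="Qwen/Qwen2.5-0.5B-Instruct",
    ),
    deployment_config=dict(
        autoscaling_config=dict(
            min_replicas=1,
            max_replicas=2,
        )
    ),
    accelerator_type="A10G",
)

# Build the OpenAI-compatible router application from the serving args
# (assumes LLMServingArgs takes the configs via `llm_configs`) and run it.
app = build_openai_app(LLMServingArgs(llm_configs=[llm_config]))
serve.run(app)

Once running, the app exposes the same OpenAI-compatible HTTP endpoint as in the example above, so the openai client query shown there works unchanged.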
- Parameters:
llm_serving_args – The list of LLM configs, or paths to LLM config files, used to build the app.
- Returns:
The configured Ray Serve Application router.
PublicAPI (alpha): This API is in alpha and may change before becoming stable.