ray.serve.llm.build_openai_app
- ray.serve.llm.build_openai_app(llm_serving_args: LLMServingArgs) → Application
Helper that builds an OpenAI-compatible Serve application, with the LLM deployments set up from the given LLM serving args. This is the main entry point for creating a Serve application that serves LLMs.
Examples
Example usage in code.

```python
from ray import serve
from ray.serve.llm import LLMConfig, LLMServingArgs, build_openai_app

llm_config1 = LLMConfig(
    model_loading_config=dict(
        model_id="qwen-0.5b",
        model_source="Qwen/Qwen2.5-0.5B-Instruct",
    ),
    deployment_config=dict(
        autoscaling_config=dict(
            min_replicas=1,
            max_replicas=2,
        )
    ),
    accelerator_type="A10G",
)

llm_config2 = LLMConfig(
    model_loading_config=dict(
        model_id="qwen-1.5b",
        model_source="Qwen/Qwen2.5-1.5B-Instruct",
    ),
    deployment_config=dict(
        autoscaling_config=dict(
            min_replicas=1,
            max_replicas=2,
        )
    ),
    accelerator_type="A10G",
)

# Deploy the application
llm_app = build_openai_app(
    LLMServingArgs(
        llm_configs=[
            llm_config1,
            llm_config2,
        ]
    )
)
serve.run(llm_app)

# Querying the model via openai client
from openai import OpenAI

# Initialize client
client = OpenAI(base_url="http://localhost:8000/v1", api_key="fake-key")

# Basic completion
response = client.chat.completions.create(
    model="qwen-0.5b",
    messages=[{"role": "user", "content": "Hello!"}]
)
```
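The client call in the example above amounts to an HTTP POST against the app's chat completions route. As a rough sketch of the same request without the `openai` package, the following builds it with the standard library only. It assumes the app is served at `http://localhost:8000/v1` as in the example and that the route follows the usual OpenAI path `/chat/completions`; the helper names `build_chat_request` and `send` are hypothetical and not part of Ray.

```python
import json
import urllib.request


def build_chat_request(base_url: str, model_id: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat completion request.

    Hypothetical helper for illustration; not part of ray.serve.llm.
    """
    payload = {
        # Should match one of the model_id values given in the LLM configs.
        "model": model_id,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Placeholder key, mirroring api_key="fake-key" in the example.
            "Authorization": "Bearer fake-key",
        },
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON body (needs a running app)."""
    with urllib.request.urlopen(request) as resp:
        return json.load(resp)


# Building the request does not contact the server.
req = build_chat_request("http://localhost:8000/v1", "qwen-0.5b", "Hello!")
print(req.get_method(), req.full_url)
```

Calling `send(req)` only makes sense once the application is running; the response would then be expected to follow the OpenAI chat completion schema.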
Example usage in YAML.

```yaml
# config.yaml
applications:
- args:
    llm_configs:
      - model_loading_config:
          model_id: qwen-0.5b
          model_source: Qwen/Qwen2.5-0.5B-Instruct
        accelerator_type: A10G
        deployment_config:
          autoscaling_config:
            min_replicas: 1
            max_replicas: 2
      - model_loading_config:
          model_id: qwen-1.5b
          model_source: Qwen/Qwen2.5-1.5B-Instruct
        accelerator_type: A10G
        deployment_config:
          autoscaling_config:
            min_replicas: 1
            max_replicas: 2
  import_path: ray.serve.llm:build_openai_app
  name: llm_app
  route_prefix: "/"
```
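The two `llm_configs` entries in the YAML example differ only in model ID and model source. When serving many models, it may be convenient to generate that structure instead of writing it by hand. The sketch below builds the same mapping from a list of `(model_id, model_source)` pairs using plain dictionaries. The helper names `make_llm_config` and `make_serve_config` are hypothetical, the defaults simply copy the values from the example, and the result is printed as JSON for inspection; writing it out as YAML would need a YAML library such as PyYAML.

```python
import json


def make_llm_config(model_id, model_source, accelerator_type="A10G",
                    min_replicas=1, max_replicas=2):
    """Return one llm_configs entry shaped like the YAML example.

    Hypothetical helper; defaults copy the values used in the example.
    """
    return {
        "model_loading_config": {
            "model_id": model_id,
            "model_source": model_source,
        },
        "accelerator_type": accelerator_type,
        "deployment_config": {
            "autoscaling_config": {
                "min_replicas": min_replicas,
                "max_replicas": max_replicas,
            }
        },
    }


def make_serve_config(models):
    """Wrap the per-model entries in the application block from the example."""
    return {
        "applications": [
            {
                "args": {
                    "llm_configs": [
                        make_llm_config(model_id, source)
                        for model_id, source in models
                    ]
                },
                "import_path": "ray.serve.llm:build_openai_app",
                "name": "llm_app",
                "route_prefix": "/",
            }
        ]
    }


config = make_serve_config([
    ("qwen-0.5b", "Qwen/Qwen2.5-0.5B-Instruct"),
    ("qwen-1.5b", "Qwen/Qwen2.5-1.5B-Instruct"),
])
print(json.dumps(config, indent=2))
```

Keeping the per-model entry in one function means a change to the shared settings, such as the autoscaling bounds, is made in a single place.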
- Parameters:
llm_serving_args – The serving arguments used to build the app: a list of LLM configs, or paths to LLM config files.
- Returns:
The configured Ray Serve Application router.
PublicAPI (alpha): This API is in alpha and may change before becoming stable.