ray.serve.llm.builders.build_openai_app

ray.serve.llm.builders.build_openai_app(llm_serving_args: LLMServingArgs) -> Application

Helper that builds an OpenAI-compatible app with the LLM deployments set up from the given LLM serving args. This is the main entry point for users to create a Serve application that serves LLMs.

Examples

from ray import serve
from ray.serve.llm.configs import LLMConfig
from ray.serve.llm.deployments import VLLMService, LLMRouter

llm_config1 = LLMConfig(
    model_loading_config=dict(
        model_id="qwen-0.5b",
        model_source="Qwen/Qwen2.5-0.5B-Instruct",
    ),
    deployment_config=dict(
        autoscaling_config=dict(
            min_replicas=1, max_replicas=2,
        )
    ),
    accelerator_type="A10G",
)

llm_config2 = LLMConfig(
    model_loading_config=dict(
        model_id="qwen-1.5b",
        model_source="Qwen/Qwen2.5-1.5B-Instruct",
    ),
    deployment_config=dict(
        autoscaling_config=dict(
            min_replicas=1, max_replicas=2,
        )
    ),
    accelerator_type="A10G",
)

# Deploy the application
deployment1 = VLLMService.as_deployment().bind(llm_config1)
deployment2 = VLLMService.as_deployment().bind(llm_config2)
llm_app = LLMRouter.as_deployment().bind([deployment1, deployment2])
serve.run(llm_app)
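
Equivalently, the same two-model router can be built with build_openai_app itself. A minimal sketch, assuming LLMServingArgs is importable from ray.serve.llm.configs and collects the configs in an llm_configs field (verify both against your Ray version):

# Equivalent construction via build_openai_app (sketch; the import path for
# LLMServingArgs and the `llm_configs` field name are assumptions).
from ray.serve.llm.builders import build_openai_app
from ray.serve.llm.configs import LLMServingArgs

llm_app = build_openai_app(
    LLMServingArgs(llm_configs=[llm_config1, llm_config2])
)
serve.run(llm_app)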


# Query the model via the OpenAI client
from openai import OpenAI

# Initialize client
client = OpenAI(base_url="http://localhost:8000/v1", api_key="fake-key")

# Basic chat completion
response = client.chat.completions.create(
    model="qwen-0.5b",
    messages=[{"role": "user", "content": "Hello!"}]
)
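
The Parameters section below notes that paths to LLM config files are also accepted. A hedged sketch, assuming each path points to a YAML file that parses into an LLMConfig and that LLMServingArgs accepts such path strings (the file names here are hypothetical):

# Sketch: build the app from config file paths instead of in-code configs.
# The YAML file names and the `llm_configs` field are illustrative assumptions.
from ray import serve
from ray.serve.llm.builders import build_openai_app
from ray.serve.llm.configs import LLMServingArgs

llm_app = build_openai_app(
    LLMServingArgs(
        llm_configs=[
            "./configs/qwen_0_5b.yaml",
            "./configs/qwen_1_5b.yaml",
        ]
    )
)
serve.run(llm_app)
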
Parameters:

llm_serving_args – The list of LLM configs, or paths to LLM config files, used to build the app.

Returns:

The configured Ray Serve Application router.

PublicAPI (alpha): This API is in alpha and may change before becoming stable.