ray.serve.llm.LLMServer.completions
- async LLMServer.completions(request: CompletionRequest) → AsyncGenerator[CompletionStreamResponse | CompletionResponse | ErrorResponse, None]
Runs a completion request against the vLLM engine and returns the response.
- Parameters:
request – A CompletionRequest object.
- Returns:
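An LLMCompletionsResponse object. Per the signature, this is an async generator that yields CompletionStreamResponse, CompletionResponse, or ErrorResponse items.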
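Because the method is declared as an async generator, callers consume it with `async for` rather than awaiting a single value. The sketch below illustrates that pattern without importing Ray. Everything in it is a hypothetical stand-in: the dataclasses only mimic the names in the signature (their fields are assumptions, not the real schema), and `FakeLLMServer` and `collect` are illustrative helpers, not part of `ray.serve.llm`.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncGenerator, List, Union


# Hypothetical stand-ins for the real request/response types.
# Field names are assumptions made for this sketch only.
@dataclass
class CompletionRequest:
    model: str
    prompt: str
    stream: bool = False


@dataclass
class CompletionResponse:
    text: str


@dataclass
class CompletionStreamResponse:
    text: str


@dataclass
class ErrorResponse:
    message: str


Item = Union[CompletionStreamResponse, CompletionResponse, ErrorResponse]


class FakeLLMServer:
    """Stand-in that mirrors only the shape of LLMServer.completions."""

    async def completions(
        self, request: CompletionRequest
    ) -> AsyncGenerator[Item, None]:
        if not request.prompt:
            # Errors are yielded as items, matching the union in the signature.
            yield ErrorResponse(message="empty prompt")
            return
        if request.stream:
            # Streaming: one chunk per whitespace-separated word.
            for word in request.prompt.split():
                yield CompletionStreamResponse(text=word)
        else:
            # Non-streaming: a single full response.
            yield CompletionResponse(text=request.prompt.upper())


async def collect(server: FakeLLMServer, request: CompletionRequest) -> List[str]:
    """Drain the generator, raising if an ErrorResponse item appears."""
    chunks: List[str] = []
    async for item in server.completions(request):
        if isinstance(item, ErrorResponse):
            raise RuntimeError(item.message)
        chunks.append(item.text)
    return chunks


if __name__ == "__main__":
    server = FakeLLMServer()
    print(asyncio.run(collect(server, CompletionRequest("demo", "a b", stream=True))))
    print(asyncio.run(collect(server, CompletionRequest("demo", "hi"))))
```

Checking each yielded item with `isinstance` is one way to handle the three types in the declared union. With the real class, the same `async for` loop should apply, but the actual request and response fields should be taken from the Ray Serve LLM reference rather than from this sketch.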