ray.serve.llm.deployments.VLLMService.completions
- async VLLMService.completions(request: CompletionRequest) → AsyncGenerator[CompletionStreamResponse | CompletionResponse | ErrorResponse, None]
Runs a completion request against the vLLM engine and returns the response.
- Parameters:
request – A CompletionRequest object.
- Returns:
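An LLMCompletionsResponse object. Per the signature, this is an async generator that yields CompletionStreamResponse, CompletionResponse, or ErrorResponse items.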
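
Because the signature returns an async generator, callers iterate over the result with `async for` rather than awaiting a single value. The following is a minimal sketch of that consumption pattern, not Ray's implementation: `FakeService`, `collect_text`, and the four dataclasses are hypothetical stand-ins that only mirror the names in the signature, and the real Ray Serve LLM classes carry different and more numerous fields.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncGenerator, List, Union


# Hypothetical stand-ins for the types named in the signature.
# Assumption: the real Ray Serve LLM classes have other fields.
@dataclass
class CompletionRequest:
    model: str
    prompt: str
    stream: bool = False


@dataclass
class CompletionResponse:
    text: str


@dataclass
class CompletionStreamResponse:
    text: str


@dataclass
class ErrorResponse:
    message: str


class FakeService:
    """Stand-in with the same call shape as VLLMService.completions."""

    async def completions(
        self, request: CompletionRequest
    ) -> AsyncGenerator[
        Union[CompletionStreamResponse, CompletionResponse, ErrorResponse], None
    ]:
        if not request.prompt:
            # Errors are yielded as items, matching the union in the signature.
            yield ErrorResponse(message="prompt must not be empty")
            return
        words = request.prompt.upper().split()
        if request.stream:
            # Streaming: one chunk per word.
            for word in words:
                yield CompletionStreamResponse(text=word)
        else:
            # Non-streaming: a single complete response.
            yield CompletionResponse(text=" ".join(words))


async def collect_text(service: FakeService, request: CompletionRequest) -> str:
    """Drain the generator, raising on an error item and joining the text."""
    parts: List[str] = []
    async for item in service.completions(request):
        if isinstance(item, ErrorResponse):
            raise RuntimeError(item.message)
        parts.append(item.text)
    return " ".join(parts)


if __name__ == "__main__":
    req = CompletionRequest(model="demo", prompt="hello world", stream=True)
    print(asyncio.run(collect_text(FakeService(), req)))
```

The same loop handles both streaming and non-streaming requests, since a non-streaming call simply yields one item. Checking for `ErrorResponse` inside the loop matters because, under this call shape, failures arrive as yielded items rather than as raised exceptions.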