ray.serve.llm.deployments.VLLMService.completions

async VLLMService.completions(request: CompletionRequest) → AsyncGenerator[CompletionStreamResponse | CompletionResponse | ErrorResponse, None]

Runs a completion request against the vLLM engine and returns the response.

Parameters:

request – A CompletionRequest object.

Returns:

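An LLMCompletionsResponse object. Per the signature, this is an async generator that yields CompletionStreamResponse, CompletionResponse, or ErrorResponse objects.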
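Because the method is an async generator, callers iterate over it with `async for` rather than awaiting a single value. The sketch below illustrates that consumption pattern only. It does not import Ray: `fake_completions`, `FakeCompletionResponse`, `FakeErrorResponse`, and their fields are hypothetical stand-ins, not the library's actual types or attributes.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncGenerator, List, Union


# Hypothetical stand-ins for the real response types; field names are assumptions.
@dataclass
class FakeCompletionResponse:
    text: str


@dataclass
class FakeErrorResponse:
    message: str


FakeResponse = Union[FakeCompletionResponse, FakeErrorResponse]


async def fake_completions(prompt: str) -> AsyncGenerator[FakeResponse, None]:
    """Stand-in for an async-generator method with the same return shape."""
    if not prompt:
        # An error is yielded as a value, mirroring the ErrorResponse arm of the union.
        yield FakeErrorResponse(message="empty prompt")
        return
    for word in prompt.split():
        yield FakeCompletionResponse(text=word.upper())


async def collect(prompt: str) -> List[str]:
    """Drain the generator, raising on the first error response."""
    chunks: List[str] = []
    async for response in fake_completions(prompt):
        if isinstance(response, FakeErrorResponse):
            raise RuntimeError(response.message)
        chunks.append(response.text)
    return chunks


result = asyncio.run(collect("hello world"))
print(result)
```

Checking each yielded item's type before using it is one way to handle the union in the return annotation; whether the real service yields one item or many for a given request depends on the request and is not specified here.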