ray.serve.llm.deployments.VLLMService.chat

async VLLMService.chat(request: ChatCompletionRequest) → AsyncGenerator[ChatCompletionStreamResponse | ChatCompletionResponse | ErrorResponse, None]

Runs a chat request against the vLLM engine and returns the response.

Parameters:

request – A ChatCompletionRequest object.

Returns:

An LLMChatResponse object: an async generator that yields ChatCompletionStreamResponse chunks when streaming, or a single ChatCompletionResponse (or an ErrorResponse on failure).
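
For reference, here is a minimal sketch of calling this method through a Ray Serve deployment handle. The deployment name "vllm_service", the app name "llm_app", the model id, and the import path for ChatCompletionRequest are illustrative assumptions, not values taken from this page; adjust them to match your deployment and Ray version.

    # Minimal sketch, not an official usage pattern. Names and import
    # paths below are assumptions; adapt them to your deployment.
    import asyncio

    from ray import serve
    # Assumed import path for the request model; it may live elsewhere
    # depending on your Ray version.
    from ray.serve.llm.openai_api_models import ChatCompletionRequest


    async def main():
        # Assumes a VLLMService deployment named "vllm_service" is
        # already running inside the Serve application "llm_app".
        handle = serve.get_deployment_handle("vllm_service", app_name="llm_app")

        request = ChatCompletionRequest(
            model="my-model",  # assumed model id
            messages=[{"role": "user", "content": "Hello!"}],
            stream=True,  # chat() then yields ChatCompletionStreamResponse chunks
        )

        # chat() returns an async generator, so the handle must be
        # configured for streaming; iterate to consume the chunks.
        async for chunk in handle.options(stream=True).chat.remote(request):
            print(chunk)


    if __name__ == "__main__":
        asyncio.run(main())

With stream=False in the request, the generator would instead produce a single ChatCompletionResponse, or an ErrorResponse if the request fails.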