ray.serve.llm.deployments.VLLMService.chat
- async VLLMService.chat(request: ChatCompletionRequest) → AsyncGenerator[ChatCompletionStreamResponse | ChatCompletionResponse | ErrorResponse, None]
Runs a chat request through the vLLM engine and returns the response.
- Parameters:
request – A ChatCompletionRequest object.
- Returns:
An LLMChatResponse object: an async generator that yields ChatCompletionStreamResponse chunks when the request streams, or a single ChatCompletionResponse (or an ErrorResponse on failure) otherwise.
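
A minimal usage sketch, not an official example: it assumes a running VLLMService deployment and uses hypothetical names (the "vllm_service" deployment, the "llm" application, and the "my-model" model id). The import path for ChatCompletionRequest is also an assumption and may differ across Ray versions.

```python
import asyncio

from ray import serve
# Import path assumed; adjust to your Ray version.
from ray.serve.llm.openai_api_models import ChatCompletionRequest


async def main():
    # Assumes a VLLMService deployment named "vllm_service" is already
    # running inside an application named "llm"; both names are hypothetical.
    handle = serve.get_deployment_handle("vllm_service", app_name="llm")

    request = ChatCompletionRequest(
        model="my-model",  # hypothetical model id served by this deployment
        messages=[{"role": "user", "content": "Hello!"}],
        stream=True,
    )

    # chat() returns an async generator. With stream=True it yields
    # ChatCompletionStreamResponse chunks; otherwise it yields a single
    # ChatCompletionResponse (or an ErrorResponse on failure).
    responses = handle.options(stream=True).chat.remote(request)
    async for response in responses:
        print(response)


if __name__ == "__main__":
    asyncio.run(main())
```

Because chat is an async generator method, the deployment handle must be configured with options(stream=True) so that each yielded chunk is forwarded to the caller as it is produced.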