ray.serve.llm.deployments.VLLMService.chat
- async VLLMService.chat(request: ChatCompletionRequest) → AsyncGenerator[ChatCompletionStreamResponse | ChatCompletionResponse | ErrorResponse, None]
Runs a chat request through the vLLM engine and returns the response.
- Parameters:
request – A ChatCompletionRequest object.
- Returns:
An LLMChatResponse object: an async generator that yields ChatCompletionStreamResponse chunks when the request streams, or a single ChatCompletionResponse (or an ErrorResponse on failure) otherwise.
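
A minimal usage sketch, not an official example: it assumes a running VLLMService deployment and uses hypothetical names (the "vllm_service" deployment, the "llm" application, and the "my-model" model id). The import path for ChatCompletionRequest is also an assumption and may differ across Ray versions.

```python
import asyncio

from ray import serve
# Import path assumed; adjust to your Ray version.
from ray.serve.llm.openai_api_models import ChatCompletionRequest


async def main():
    # Assumes a VLLMService deployment named "vllm_service" is already
    # running inside an application named "llm"; both names are hypothetical.
    handle = serve.get_deployment_handle("vllm_service", app_name="llm")

    request = ChatCompletionRequest(
        model="my-model",  # hypothetical model id served by this deployment
        messages=[{"role": "user", "content": "Hello!"}],
        stream=True,
    )

    # chat() returns an async generator. With stream=True it yields
    # ChatCompletionStreamResponse chunks; otherwise it yields a single
    # ChatCompletionResponse (or an ErrorResponse on failure).
    responses = handle.options(stream=True).chat.remote(request)
    async for response in responses:
        print(response)


if __name__ == "__main__":
    asyncio.run(main())
```

Because chat is an async generator method, the deployment handle must be configured with options(stream=True) so that each yielded chunk is forwarded to the caller as it is produced.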