ray.serve.llm.LLMRouter.chat#

async LLMRouter.chat(body: ChatCompletionRequest, request: fastapi.Request) → starlette.responses.Response#

Given a chat completion request containing a list of conversation messages, the model returns one or more predicted chat completions, and can also return the probabilities of alternative tokens at each position.

Parameters:
  • body – The chat completion request.

  • request – The raw FastAPI request object.

Returns:

A response object containing the generated chat completion(s), or a stream of completion chunks if streaming was requested.
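This route is typically not called directly in Python; it is exposed as an OpenAI-compatible HTTP endpoint by the deployed router. The sketch below builds a request body matching the `ChatCompletionRequest` schema and posts it to the served application. The base URL, model id (`my-llm`), and the `/v1/chat/completions` path are assumptions based on the OpenAI-compatible convention; adjust them to your deployment.

```python
import json
import urllib.request

# Hypothetical base URL where serve.run(...) exposed the LLM router.
BASE_URL = "http://localhost:8000"

# Request body mirroring the ChatCompletionRequest fields
# (OpenAI-compatible chat schema).
payload = {
    "model": "my-llm",  # placeholder: the model id from your LLMConfig
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is Ray Serve?"},
    ],
    "stream": False,  # set True to receive a stream of completion chunks
}

# Build (but do not send) the HTTP request; sending requires a live server.
req = urllib.request.Request(
    url=f"{BASE_URL}/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)

print(req.full_url)
```

With a router deployment running, `urllib.request.urlopen(req)` (or any OpenAI SDK pointed at `BASE_URL`) would return the chat completion response produced by this method.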