ray.serve.llm.LLMRouter.completions

async LLMRouter.completions(body: CompletionRequest, request: fastapi.Request) → starlette.responses.Response

Given a prompt, the model returns one or more predicted completions. It can also return the probabilities of alternative tokens at each position.
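
As a rough illustration of how a client might reach this handler over HTTP, the sketch below builds an OpenAI-style completions request body and posts it with the standard library. Several things here are assumptions not stated on this page: that the router is exposed at `/v1/completions`, that Serve listens on `http://localhost:8000`, that a model is deployed under the hypothetical id `my-model`, and that `CompletionRequest` accepts the `model`, `prompt`, `max_tokens`, `n`, and `logprobs` fields. Check them against your own deployment before relying on this.

```python
import json
import urllib.request

# Assumed values for illustration only; adjust to your deployment.
BASE_URL = "http://localhost:8000"  # assumed Serve HTTP address
MODEL_ID = "my-model"               # hypothetical model id


def build_completion_payload(prompt, model=MODEL_ID, max_tokens=32, n=1, logprobs=None):
    """Build an OpenAI-style completions request body (field names are assumed)."""
    payload = {"model": model, "prompt": prompt, "max_tokens": max_tokens, "n": n}
    # Only ask for alternative-token probabilities when explicitly requested.
    if logprobs is not None:
        payload["logprobs"] = logprobs
    return payload


def post_completion(payload, base_url=BASE_URL, timeout=30.0):
    """POST the payload to the assumed /v1/completions route and parse the JSON reply."""
    req = urllib.request.Request(
        url=f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With a deployment running, a call such as `post_completion(build_completion_payload("Hello", n=2, logprobs=3))` would request two completions along with per-position alternative-token probabilities. Building the payload separately from sending it keeps the request shape easy to inspect without a live server.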

Parameters: