ray.serve.llm.LLMServer.score

async LLMServer.score(request: ScoreRequest, raw_request_info: RawRequestInfo | None = None) → AsyncGenerator[ScoreResponse | ErrorResponse, None]

Sends a score request to the engine and returns the response.

Returns an AsyncGenerator that yields the ScoreResponse object or an ErrorResponse. Returning a generator keeps the interface consistent for callers across the chat, completions, embeddings, and score methods.
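Because the method is an async generator rather than a plain coroutine, callers iterate over it instead of awaiting it directly. The sketch below illustrates that consumption pattern only. It does not import Ray: FakeScoreResponse, FakeErrorResponse, fake_score, and first_result are hypothetical stand-ins invented for this example, and it assumes the generator yields a single item.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncGenerator, List, Union


# Hypothetical stand-ins for ScoreResponse and ErrorResponse.
# The real classes are not reproduced here and carry different fields.
@dataclass
class FakeScoreResponse:
    scores: List[float]


@dataclass
class FakeErrorResponse:
    message: str


async def fake_score(
    fail: bool = False,
) -> AsyncGenerator[Union[FakeScoreResponse, FakeErrorResponse], None]:
    """Mimics the return shape of LLMServer.score: an async generator."""
    if fail:
        yield FakeErrorResponse(message="engine unavailable")
    else:
        yield FakeScoreResponse(scores=[0.87])


async def first_result(gen):
    """Return the first item the generator yields, or None if it is empty."""
    try:
        async for item in gen:
            return item
        return None
    finally:
        # Close the generator explicitly so it is not left suspended.
        await gen.aclose()


ok = asyncio.run(first_result(fake_score()))
err = asyncio.run(first_result(fake_score(fail=True)))

# Callers can branch on the type of the yielded object.
for result in (ok, err):
    if isinstance(result, FakeErrorResponse):
        print("error:", result.message)
    else:
        print("scores:", result.scores)
```

The same first_result helper would work unchanged for any other method that returns an async generator, which is the consistency benefit described above.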