ray.serve.llm.LLMServer.embeddings#

async LLMServer.embeddings(request: EmbeddingRequest, raw_request_info: RawRequestInfo | None = None) → AsyncGenerator[List[ErrorResponse] | EmbeddingResponse, None][source]#

Runs an embedding request against the engine and returns the response.

Returns an AsyncGenerator over the EmbeddingResponse object. This gives the caller a consistent interface across the chat, completions, embeddings, and transcriptions methods.

Parameters:
  • request – An EmbeddingRequest object.

  • raw_request_info – Optional RawRequestInfo containing data from the original HTTP request.

Returns:

An AsyncGenerator over the EmbeddingResponse object.
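
The generator contract above can be sketched as follows. This is a minimal, self-contained illustration of how a caller consumes such a method; the `EmbeddingRequest` and `EmbeddingResponse` dataclasses and the `embeddings` function below are hypothetical stand-ins modeled on the documented signature, not the actual Ray Serve LLM classes.

```python
# Sketch of consuming an embeddings call that follows the documented
# AsyncGenerator interface. All classes here are illustrative stand-ins.
import asyncio
from dataclasses import dataclass
from typing import AsyncGenerator, List


@dataclass
class EmbeddingRequest:
    model: str
    input: List[str]


@dataclass
class EmbeddingResponse:
    embeddings: List[List[float]]


async def embeddings(
    request: EmbeddingRequest,
) -> AsyncGenerator[EmbeddingResponse, None]:
    # A real server would forward the request to the engine; here we
    # yield a single dummy response to demonstrate the generator contract.
    yield EmbeddingResponse(embeddings=[[0.0] * 4 for _ in request.input])


async def main() -> List[List[float]]:
    request = EmbeddingRequest(model="my-model", input=["hello", "world"])
    # Even though an embedding response is not streamed chunk by chunk,
    # the caller still iterates the generator, matching the interface
    # used for chat and completions.
    responses = [r async for r in embeddings(request)]
    return responses[0].embeddings


result = asyncio.run(main())
```

Iterating with `async for` (or an async comprehension) is what keeps the call site uniform with the streaming methods, even though the embeddings generator yields only once.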