ray.serve.llm.LLMServer.embeddings

async LLMServer.embeddings(request: EmbeddingCompletionRequest) → AsyncGenerator[EmbeddingResponse | ErrorResponse, None]

Runs an embeddings request against the vLLM engine and returns the response.

Parameters:

request – An EmbeddingCompletionRequest object.

Returns:

An async generator that yields EmbeddingResponse or ErrorResponse objects.
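
Example:

Because the method is an async generator rather than a plain coroutine, callers iterate over it with `async for` and check each item for an error. The sketch below illustrates only that consumption pattern. It does not import Ray: `FakeLLMServer`, the two dataclasses, their fields (`vectors`, `message`), the dict-shaped request, and the `collect_embeddings` helper are all hypothetical stand-ins, not the real Ray Serve LLM types.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncGenerator, List, Union


# Hypothetical stand-ins for the response types named in the signature.
# The real classes are not reproduced here; field names are assumptions.
@dataclass
class EmbeddingResponse:
    vectors: List[List[float]]


@dataclass
class ErrorResponse:
    message: str


class FakeLLMServer:
    """Stand-in exposing the same call shape as LLMServer.embeddings."""

    async def embeddings(
        self, request: dict
    ) -> AsyncGenerator[Union[EmbeddingResponse, ErrorResponse], None]:
        texts = request.get("input") or []
        if not texts:
            # Errors are yielded as values, not raised.
            yield ErrorResponse(message="empty input")
            return
        # Fake a two-dimensional vector per input string.
        yield EmbeddingResponse(vectors=[[float(len(t)), 0.0] for t in texts])


async def collect_embeddings(server, request: dict) -> List[List[float]]:
    """Drain the async generator, raising if an error item is yielded."""
    collected: List[List[float]] = []
    async for item in server.embeddings(request):
        if isinstance(item, ErrorResponse):
            raise RuntimeError(item.message)
        collected.extend(item.vectors)
    return collected


result = asyncio.run(
    collect_embeddings(FakeLLMServer(), {"input": ["hi", "ray"]})
)
```

With a real deployment, the same `async for` loop would wrap the call to `LLMServer.embeddings`. The request object and the response fields would have to follow the actual Ray Serve LLM models.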