ray.serve.llm.LLMServer.embeddings
- async LLMServer.embeddings(request: EmbeddingCompletionRequest) → AsyncGenerator[EmbeddingResponse | ErrorResponse, None]
Runs an embeddings request against the vLLM engine and returns the response.
- Parameters:
request – An EmbeddingCompletionRequest object describing the embeddings request.
- Returns:
An async generator that yields EmbeddingResponse or ErrorResponse objects.
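Because the signature returns an AsyncGenerator, callers would normally consume the result with `async for` and check each item for an error. The following is a minimal sketch of that pattern only. It does not import Ray: `FakeEmbeddingResponse`, `FakeErrorResponse`, `fake_embeddings`, and `collect` are hypothetical stand-ins invented for illustration, and their fields are assumptions rather than the real Ray Serve LLM types.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncGenerator, List, Tuple, Union


# Hypothetical stand-ins for EmbeddingResponse / ErrorResponse.
# The field names here are assumptions, not the real Ray Serve LLM schema.
@dataclass
class FakeEmbeddingResponse:
    embedding: List[float]


@dataclass
class FakeErrorResponse:
    message: str


async def fake_embeddings(
    text: str,
) -> AsyncGenerator[Union[FakeEmbeddingResponse, FakeErrorResponse], None]:
    """Mimics only the shape of the documented method: an async
    generator that yields either a response or an error."""
    if not text:
        yield FakeErrorResponse(message="empty input")
    else:
        # Dummy vector; a real engine would compute the embedding.
        yield FakeEmbeddingResponse(embedding=[float(len(text)), 0.0])


async def collect(text: str) -> Tuple[List[List[float]], List[str]]:
    """Iterate with `async for` and separate successes from errors."""
    results: List[List[float]] = []
    errors: List[str] = []
    async for item in fake_embeddings(text):
        if isinstance(item, FakeErrorResponse):
            errors.append(item.message)
        else:
            results.append(item.embedding)
    return results, errors


ok = asyncio.run(collect("hello"))
bad = asyncio.run(collect(""))
```

With a real deployment, the `async for` loop would iterate over the generator returned by `LLMServer.embeddings(request)` instead of `fake_embeddings`, and the `isinstance` check would target `ErrorResponse`.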