ray.serve.llm.LLMServer
class ray.serve.llm.LLMServer(**kwargs)
Bases: LLMServer
Methods
Convert the LLMServer to a Ray Serve deployment.
Runs a chat request to the LLM engine and returns the response.
Check the health of the replica.
Runs a completion request to the LLM engine and returns the response.
Runs an embeddings request to the engine and returns the response.
Start the underlying engine.
Synchronous constructor that returns an unstarted instance.
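The method summaries above suggest a two-phase lifecycle: a synchronous constructor returns an unstarted instance, a separate start step brings up the engine, and health checks and requests follow. The sketch below illustrates that general pattern only. It does not import or call Ray, and `ToyEngine`, `ToyServer`, `sync_create`, and `demo` are hypothetical names invented for illustration, not the `LLMServer` API.

```python
import asyncio


class ToyEngine:
    """Hypothetical stand-in for an LLM engine; not part of Ray."""

    def __init__(self) -> None:
        self.running = False

    async def start(self) -> None:
        # Yield to the event loop once to mimic asynchronous startup work.
        await asyncio.sleep(0)
        self.running = True

    async def generate(self, prompt: str) -> str:
        if not self.running:
            raise RuntimeError("engine not started")
        return f"echo: {prompt}"


class ToyServer:
    """Hypothetical two-phase server; illustrates the pattern only."""

    def __init__(self) -> None:
        # Construction does no async work, so the engine stays unstarted.
        self.engine = ToyEngine()

    @classmethod
    def sync_create(cls) -> "ToyServer":
        # Synchronous constructor: returns an instance that is not started.
        return cls()

    async def start(self) -> None:
        # Separate, awaitable step that starts the underlying engine.
        await self.engine.start()

    async def check_health(self) -> None:
        # Raise if the engine is down; return None when healthy.
        if not self.engine.running:
            raise RuntimeError("engine is not running")

    async def completions(self, prompt: str) -> str:
        # Forward a completion request to the engine.
        return await self.engine.generate(prompt)


async def demo() -> str:
    server = ToyServer.sync_create()  # unstarted instance
    await server.start()              # start the engine
    await server.check_health()       # passes once the engine is running
    return await server.completions("hello")


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Splitting construction from startup keeps the constructor free of awaitable work, so an instance can be created in synchronous code and started later inside an event loop.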