ray.serve.llm.LLMServer

class ray.serve.llm.LLMServer(**kwargs)

Bases: LLMServer

Methods

as_deployment

Convert the LLMServer to a Ray Serve deployment.

chat

Runs a chat request to the LLM engine and returns the response.

check_health

Check the health of the replica.

completions

Runs a completion request to the LLM engine and returns the response.

embeddings

Runs an embeddings request to the engine and returns the response.

start

Start the underlying engine.

sync_init

Synchronous constructor that returns an unstarted instance.
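
The method summaries above suggest a two-phase lifecycle: sync_init builds an unstarted instance, start brings up the underlying engine, and check_health reports on the replica afterwards. The sketch below illustrates that pattern in plain Python. It does not import Ray and is not the LLMServer implementation: ToyEngine and ToyServer are hypothetical names, and the classmethod shape of sync_init, the async signatures, and the raise-on-unhealthy behavior are all assumptions made for illustration.

```python
import asyncio


class ToyEngine:
    """Hypothetical stand-in for an LLM engine; not part of Ray."""

    def __init__(self) -> None:
        self.running = False

    async def start(self) -> None:
        # Placeholder for slow startup work such as loading model weights.
        await asyncio.sleep(0)
        self.running = True


class ToyServer:
    """Hypothetical class mimicking a sync_init / start / check_health split."""

    def __init__(self) -> None:
        self.engine = None

    @classmethod
    def sync_init(cls) -> "ToyServer":
        # Cheap, synchronous construction: the engine exists but is not started.
        server = cls()
        server.engine = ToyEngine()
        return server

    async def start(self) -> None:
        # Asynchronous startup of the underlying engine.
        await self.engine.start()

    async def check_health(self) -> None:
        # Assumed convention: raise when unhealthy, return None when healthy.
        if self.engine is None or not self.engine.running:
            raise RuntimeError("engine is not running")


async def demo() -> bool:
    server = ToyServer.sync_init()
    assert not server.engine.running  # still unstarted after sync_init
    await server.start()
    await server.check_health()  # does not raise once the engine is running
    return server.engine.running


result = asyncio.run(demo())
```

Separating construction from startup in this way keeps the constructor free of awaits, so an instance can be created in synchronous code and started later inside an event loop.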