ray.serve.llm.LLMServer
class ray.serve.llm.LLMServer(llm_config: LLMConfig, *, engine_cls: Type[VLLMEngine] | None = None, image_retriever_cls: Type[ImageRetriever] | None = None, model_downloader: LoraModelLoader | None = None)
Bases: LLMServer
The implementation of the vLLM engine deployment.
To build a Deployment object, use the build_llm_deployment function. We also expose a lower-level API for more control over the deployment class through the as_deployment method.

Examples
```python
import ray
from ray import serve
from ray.serve.llm import LLMConfig, LLMServer

# Configure the model
llm_config = LLMConfig(
    model_loading_config=dict(
        served_model_name="llama-3.1-8b",
        model_source="meta-llama/Llama-3.1-8b-instruct",
    ),
    deployment_config=dict(
        autoscaling_config=dict(
            min_replicas=1,
            max_replicas=8,
        )
    ),
)

# Build the deployment directly
LLMDeployment = LLMServer.as_deployment(llm_config.get_serve_options())
llm_app = LLMDeployment.bind(llm_config)
model_handle = serve.run(llm_app)

# Query the model via the `chat` API
from ray.serve.llm.openai_api_models import ChatCompletionRequest

request = ChatCompletionRequest(
    model="llama-3.1-8b",
    messages=[
        {
            "role": "user",
            "content": "Hello, world!",
        }
    ],
)
response = ray.get(model_handle.chat(request))
print(response)
```
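For the recommended higher-level path, a minimal sketch of building the same application with build_llm_deployment might look like the following. This is a sketch under the assumption that build_llm_deployment accepts an LLMConfig and returns a bound Serve application; consult the build_llm_deployment reference for the exact signature.

```python
from ray import serve
from ray.serve.llm import LLMConfig, build_llm_deployment

llm_config = LLMConfig(
    model_loading_config=dict(
        served_model_name="llama-3.1-8b",
        model_source="meta-llama/Llama-3.1-8b-instruct",
    ),
)

# Assumption: build_llm_deployment wraps LLMServer.as_deployment() and .bind(),
# returning a ready-to-run Serve application for this config.
llm_app = build_llm_deployment(llm_config)
model_handle = serve.run(llm_app)
```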
PublicAPI (alpha): This API is in alpha and may change before becoming stable.
Methods
__init__: Constructor of LLMServer.
as_deployment: Convert the LLMServer to a Ray Serve deployment.
chat: Run a chat request against the vLLM engine and return the response.
check_health: Check the health of the vLLM engine.
completions: Run a completion request against the vLLM engine and return the response (see the sketch after this list).
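To illustrate the completions method listed above, the sketch below mirrors the chat example. It assumes a CompletionRequest model exists alongside ChatCompletionRequest in ray.serve.llm.openai_api_models; the exact request class and its fields may differ between Ray versions.

```python
import ray
from ray.serve.llm.openai_api_models import CompletionRequest  # assumed location

# model_handle is the deployment handle returned by serve.run() in the example above.
request = CompletionRequest(
    model="llama-3.1-8b",
    prompt="Write a haiku about distributed systems.",
    max_tokens=64,
)

# completions() mirrors chat(): it forwards the request to the vLLM engine
# and returns an OpenAI-style completion response.
response = ray.get(model_handle.completions(request))
print(response)
```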