ray.serve.llm.LLMServer.pause

async LLMServer.pause(**kwargs: Any) → None

Pause generation on the engine.

This halts in-flight generation requests while keeping the model weights in GPU memory. New requests are blocked until resume() is called.

Parameters:

**kwargs – Engine-specific pause options, passed through to the underlying engine.
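
A minimal usage sketch follows. It assumes a pre-built LLMServer instance (shown here as server); constructing the server and any engine-specific keyword options are outside the scope of this page, and resume() is taken to be the counterpart method referenced in the description above.

```python
import asyncio


async def with_generation_paused(server) -> None:
    # Halt generation; model weights remain resident in GPU memory.
    await server.pause()
    try:
        # Perform whatever work requires generation to be stopped,
        # e.g. inspecting engine state (placeholder here).
        await asyncio.sleep(0)
    finally:
        # Unblock new requests again; they are rejected/held until
        # resume() is called.
        await server.resume()
```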