ray.serve.llm.LLMServer.pause

async LLMServer.pause(**kwargs: Any) → None

Pause generation on the engine. This halts generation requests while keeping the model weights in GPU memory. New requests are blocked until `resume` is called.

Parameters:
    **kwargs – Engine-specific pause options. Passed through to the engine.
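
The gating behavior described above (new requests block while paused, and proceed once `resume` is called, with state kept in memory) can be sketched with a minimal, self-contained asyncio illustration. `ToyEngine` and its methods are hypothetical stand-ins for exposition, not the actual `LLMServer` implementation:

```python
import asyncio


class ToyEngine:
    """Illustrates pause/resume gating: paused requests wait, state stays in memory."""

    def __init__(self) -> None:
        self._running = asyncio.Event()
        self._running.set()  # start in the unpaused state

    async def pause(self, **kwargs) -> None:
        # Halt new generation; in-memory state (here, just the Event) is kept.
        self._running.clear()

    async def resume(self, **kwargs) -> None:
        # Unblock any requests waiting in generate().
        self._running.set()

    async def generate(self, prompt: str) -> str:
        # New requests block at this point while the engine is paused.
        await self._running.wait()
        return f"echo: {prompt}"


async def main() -> None:
    engine = ToyEngine()
    await engine.pause()
    task = asyncio.create_task(engine.generate("hi"))
    await asyncio.sleep(0.01)
    assert not task.done()  # the request is blocked while paused
    await engine.resume()
    print(await task)  # the blocked request now completes


asyncio.run(main())
```

In the real deployment, `pause` would be invoked on the serve replica rather than on a toy object, but the request-gating contract is the same.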