ray.serve.request_router.RequestRouter.decrement_queue_len_cache

RequestRouter.decrement_queue_len_cache(replica_id: ReplicaID)

Decrement the queue length cache for a replica.

Called via add_done_callback when a request finishes on a replica, regardless of outcome (success, failure, or cancellation). Any request that was actually sent occupies a queue slot, and that slot is freed when the request completes for any reason.

Should NOT be called for rejected requests, because on_new_queue_len_info already corrects the cache with the replica’s actual queue length.
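
The bookkeeping described above can be illustrated with a minimal standalone sketch. It does not use Ray; `ToyQueueLenCache`, `send_request`, and `handle_rejection` are hypothetical names invented for illustration, and a plain `concurrent.futures.Future` stands in for the in-flight request. Clamping the count at zero is a choice made in this sketch, not something the documentation above states.

```python
from concurrent.futures import Future


class ToyQueueLenCache:
    """Hypothetical stand-in for a per-replica queue length cache."""

    def __init__(self):
        self._lens = {}

    def get(self, replica_id):
        return self._lens.get(replica_id, 0)

    def increment(self, replica_id):
        self._lens[replica_id] = self.get(replica_id) + 1

    def decrement(self, replica_id):
        # Sketch-only choice: clamp at zero so the count never goes negative.
        self._lens[replica_id] = max(0, self.get(replica_id) - 1)

    def overwrite(self, replica_id, queue_len):
        self._lens[replica_id] = queue_len


def send_request(cache, replica_id):
    """Count the request as in flight and free its slot when it finishes."""
    cache.increment(replica_id)
    fut = Future()
    # The callback fires on success, failure, and cancellation alike.
    fut.add_done_callback(lambda _f: cache.decrement(replica_id))
    return fut


def handle_rejection(cache, replica_id, reported_queue_len):
    """On rejection, trust the replica's reported length; do not decrement."""
    cache.overwrite(replica_id, reported_queue_len)


cache = ToyQueueLenCache()
ok = send_request(cache, "replica-a")
failed = send_request(cache, "replica-a")
cancelled = send_request(cache, "replica-a")
in_flight_count = cache.get("replica-a")

ok.set_result("done")                        # success
failed.set_exception(RuntimeError("boom"))   # failure
cancelled.cancel()                           # cancellation
after_completion = cache.get("replica-a")

# A rejected request carries the replica's actual queue length instead.
handle_rejection(cache, "replica-b", 5)
after_rejection = cache.get("replica-b")
```

In this sketch all three outcomes release exactly one slot each, while the rejection path only overwrites the cached value; decrementing there as well would undercount the replica's load.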