ray.serve.request_router.RunningReplica.reserve_slot
- async RunningReplica.reserve_slot(request_metadata: RequestMetadata) → Tuple[str, ReplicaQueueLengthInfo]
Reserve a slot on this replica for an upcoming request.
Returns a unique token that can be used to release the slot later, together with a ReplicaQueueLengthInfo for this replica. The token is used in the choose_replica/dispatch pattern to track reservations that have not been dispatched yet.
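The reserve-then-release flow can be illustrated with a small stand-in that does not import Ray. This is only a sketch under stated assumptions: `FakeReplica`, `FakeQueueLengthInfo`, its fields `accepted` and `num_ongoing_requests`, the `release_slot` method, and the capacity check are all hypothetical names and behavior chosen for illustration, not Ray Serve's actual implementation.

```python
import asyncio
import uuid
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class FakeQueueLengthInfo:
    # Hypothetical stand-in for ReplicaQueueLengthInfo; real fields may differ.
    accepted: bool
    num_ongoing_requests: int


@dataclass
class FakeReplica:
    # Hypothetical stand-in for RunningReplica, not Ray Serve's implementation.
    max_ongoing_requests: int
    _reserved: Dict[str, str] = field(default_factory=dict)

    async def reserve_slot(self, request_id: str) -> Tuple[str, FakeQueueLengthInfo]:
        # Assumed behavior: accept the reservation only while capacity remains.
        accepted = len(self._reserved) < self.max_ongoing_requests
        token = uuid.uuid4().hex  # unique token identifying this reservation
        if accepted:
            self._reserved[token] = request_id
        return token, FakeQueueLengthInfo(accepted, len(self._reserved))

    def release_slot(self, token: str) -> None:
        # Hypothetical release call: drop a reservation that was never dispatched.
        self._reserved.pop(token, None)


async def demo() -> Tuple[bool, bool, int]:
    replica = FakeReplica(max_ongoing_requests=1)
    token, first = await replica.reserve_slot("req-1")
    _, second = await replica.reserve_slot("req-2")  # no capacity left
    replica.release_slot(token)  # give the slot back without dispatching
    return first.accepted, second.accepted, len(replica._reserved)


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The point of the token is that a router can hold a reservation between choosing a replica and dispatching to it, and can return the slot if the request is never sent.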