ray.serve.request_router.RequestRouter
- class ray.serve.request_router.RequestRouter(deployment_id: DeploymentID, handle_source: DeploymentHandleSource, self_actor_id: str | None = None, self_actor_handle: ActorHandle | None = None, use_replica_queue_len_cache: bool = False, get_curr_time_s: Callable[[], float] | None = None, create_replica_wrapper_func: Callable[[RunningReplicaInfo], RunningReplica] | None = None, *args, **kwargs)
Bases: ABC
Abstract interface for a request router, defining how the router calls it.
PublicAPI (alpha): This API is in alpha and may change before becoming stable.
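Because `RequestRouter` is an abstract base class, a concrete router supplies its routing policy by subclassing it. The sketch below illustrates that pattern in plain Python, without Ray. `Replica`, `MiniRouter`, `PowerOfTwoRouter`, and their methods are hypothetical stand-ins, not the `RequestRouter` API, and the "power of two choices" policy is only one illustrative choice.

```python
import random
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Replica:
    """Hypothetical stand-in for a running replica (not a Ray type)."""
    replica_id: str
    queue_len: int = 0


class MiniRouter(ABC):
    """Hypothetical, Ray-free stand-in for an abstract request router."""

    def __init__(self) -> None:
        self._replicas: List[Replica] = []

    def set_replicas(self, replicas: List[Replica]) -> None:
        # Replace the set of replicas considered for future routing decisions.
        self._replicas = list(replicas)

    @abstractmethod
    def choose(self, candidates: List[Replica]) -> List[Replica]:
        """Return a subset of candidates, most preferred first."""

    def route(self) -> Optional[Replica]:
        # Delegate the policy to the subclass and take its top choice.
        ranked = self.choose(self._replicas)
        return ranked[0] if ranked else None


class PowerOfTwoRouter(MiniRouter):
    """Sample two candidates at random and prefer the shorter queue."""

    def __init__(self, seed: Optional[int] = None) -> None:
        super().__init__()
        self._rng = random.Random(seed)

    def choose(self, candidates: List[Replica]) -> List[Replica]:
        # With two or fewer candidates, consider all of them.
        if len(candidates) <= 2:
            sampled = list(candidates)
        else:
            sampled = self._rng.sample(candidates, 2)
        return sorted(sampled, key=lambda r: r.queue_len)


if __name__ == "__main__":
    router = PowerOfTwoRouter(seed=0)
    router.set_replicas([Replica("a", queue_len=3), Replica("b", queue_len=1)])
    print(router.route().replica_id)  # "b" has the shorter queue
```

Instantiating `MiniRouter` directly raises `TypeError`, which is the point of the ABC: every concrete router must supply its own selection policy.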
Methods
- Chooses a subset of candidate replicas from available replicas.
- Update queue length cache with new info received from replica.
- Drop replica from replica set so it's not considered for future requests.
- Invalidate cache entry so active probing is required for the next request.
- Called when a request is routed to a replica.
- Select available replicas from the list of candidates.
- Update the set of available replicas to be considered for routing.
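Several of these methods revolve around a per-replica queue length cache: new info updates an entry, and invalidation forces active probing on the next request. The sketch below models such a cache with an injectable clock, in the spirit of the constructor's `get_curr_time_s` parameter, so staleness can be tested deterministically. `QueueLenCache`, its methods, and the staleness timeout are hypothetical assumptions for illustration; Ray's actual cache may behave differently.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class QueueLenCache:
    """Hypothetical replica-ID -> queue-length cache with entry staleness."""

    def __init__(
        self,
        staleness_timeout_s: float = 10.0,  # assumed value, not a Ray default
        get_curr_time_s: Optional[Callable[[], float]] = None,
    ) -> None:
        self._timeout_s = staleness_timeout_s
        # Injectable clock: tests pass a fake, production uses a monotonic clock.
        self._now = get_curr_time_s or time.monotonic
        self._entries: Dict[str, Tuple[int, float]] = {}

    def update(self, replica_id: str, queue_len: int) -> None:
        # Record the latest queue length reported by a replica, with a timestamp.
        self._entries[replica_id] = (queue_len, self._now())

    def get(self, replica_id: str) -> Optional[int]:
        # Return the cached value only while fresh; None means "probe actively".
        entry = self._entries.get(replica_id)
        if entry is None:
            return None
        queue_len, recorded_at = entry
        if self._now() - recorded_at > self._timeout_s:
            return None
        return queue_len

    def invalidate(self, replica_id: str) -> None:
        # Forget the entry so the next request must actively probe the replica.
        self._entries.pop(replica_id, None)
```

A fake clock such as `lambda: clock[0]`, where `clock` is a one-element list the test mutates, makes it easy to check that entries expire after the timeout and that invalidating an unknown replica is a no-op.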
Attributes
- Name of the app this router is serving.
- The sequence of backoff timeouts to use when all replicas' queues are full.
- Current number of routing tasks running.
- Current replicas available to be routed.
- Max number of routing tasks to run at any time.
- Hard limit on the maximum number of routing tasks to run.
- Maximum deadline for receiving queue length info from replicas.
- Current number of requests pending assignment.
- Deadline for receiving queue length info from replicas.
- Get the replica queue length cache.
- Target number of routing tasks to be running based on pending requests.
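The attributes above describe three related knobs: a backoff sequence used when every replica's queue is full, a target number of routing tasks bounded by a maximum and a hard limit, and a deadline (with a maximum) for queue length responses. The sketch below models each one under stated assumptions: that the last backoff value repeats once the sequence is exhausted, that the target is the pending-request count clipped by both limits, and that the probe deadline doubles on timeout up to its maximum. All function and parameter names are hypothetical, not Ray's internals.

```python
import asyncio
import itertools
from typing import Awaitable, Callable, Iterator, Sequence


def backoff_timeouts(sequence_s: Sequence[float]) -> Iterator[float]:
    """Yield each timeout in order, then repeat the last one (assumed policy)."""
    if not sequence_s:
        raise ValueError("backoff sequence must not be empty")
    yield from sequence_s
    yield from itertools.repeat(sequence_s[-1])


def target_routing_tasks(num_pending: int, max_tasks: int, hard_cap: int) -> int:
    """Hypothetical: one task per pending request, bounded by both limits."""
    return min(num_pending, max_tasks, hard_cap)


async def probe_with_deadline(
    probe: Callable[[], Awaitable[int]],
    deadline_s: float,
    max_deadline_s: float,
    attempts: int = 3,
) -> int:
    """Retry a queue-length probe, doubling the deadline (up to a max) on timeout."""
    for _ in range(attempts):
        try:
            return await asyncio.wait_for(probe(), timeout=deadline_s)
        except asyncio.TimeoutError:
            # Assumed backoff: give slow replicas more time on the next attempt.
            deadline_s = min(deadline_s * 2, max_deadline_s)
    raise asyncio.TimeoutError(f"no queue length response after {attempts} attempts")
```

Repeating the last backoff value keeps retries bounded without ever giving up, and capping the probe deadline keeps one unresponsive replica from stalling routing indefinitely.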