ray.serve.request_router.RequestRouter

class ray.serve.request_router.RequestRouter(deployment_id: DeploymentID, handle_source: DeploymentHandleSource, self_actor_id: str | None = None, self_actor_handle: ActorHandle | None = None, use_replica_queue_len_cache: bool = False, get_curr_time_s: Callable[[], float] | None = None, create_replica_wrapper_func: Callable[[RunningReplicaInfo], RunningReplica] | None = None, *args, **kwargs)

Bases: ABC

Abstract interface for a request router. It defines the methods and attributes through which the Serve router calls into a routing implementation.

PublicAPI (alpha): This API is in alpha and may change before becoming stable.

Methods

`choose_replicas`	Choose a subset of candidate replicas from the available replicas.
`on_new_queue_len_info`	Update queue length cache with new info received from replica.
`on_replica_actor_died`	Drop replica from replica set so it's not considered for future requests.
`on_replica_actor_unavailable`	Invalidate cache entry so active probing is required for the next request.
`on_request_routed`	Called when a request is routed to a replica.
`select_available_replicas`	Select available replicas from the list of candidates.
`update_replicas`	Update the set of available replicas to be considered for routing.
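
To make the lifecycle of these methods concrete, the following is a minimal, self-contained sketch. It does not import Ray: `SketchRouter`, `LeastRoutedRouter`, `route_once`, and `demo` are hypothetical stand-ins that borrow only the method names from the table above, and every signature and return type here is an assumption rather than the real `RequestRouter` API. A real implementation would subclass `ray.serve.request_router.RequestRouter` and follow its documented signatures.

```python
import asyncio
from abc import ABC, abstractmethod
from typing import Dict, List


class SketchRouter(ABC):
    """Hypothetical stand-in for the interface above; NOT Ray's RequestRouter."""

    def __init__(self) -> None:
        # Assumed bookkeeping: replica id -> number of requests routed to it.
        self._routed: Dict[str, int] = {}

    def update_replicas(self, replica_ids: List[str]) -> None:
        # Replace the replica set, keeping counts for replicas that remain.
        self._routed = {rid: self._routed.get(rid, 0) for rid in replica_ids}

    def on_replica_actor_died(self, replica_id: str) -> None:
        # Drop the replica so it is not considered for future requests.
        self._routed.pop(replica_id, None)

    def on_request_routed(self, replica_id: str) -> None:
        # Callback fired after a request has been assigned to a replica.
        self._routed[replica_id] += 1

    @property
    def curr_replicas(self) -> List[str]:
        return list(self._routed)

    @abstractmethod
    async def choose_replicas(self, candidates: List[str]) -> List[str]:
        """Return the subset of candidates that should receive the request."""


class LeastRoutedRouter(SketchRouter):
    async def choose_replicas(self, candidates: List[str]) -> List[str]:
        # Pick the replica with the fewest routed requests; break ties by id.
        best = min(candidates, key=lambda rid: (self._routed[rid], rid))
        return [best]


async def route_once(router: SketchRouter) -> str:
    # Assumed call order: choose a replica, then notify the router.
    chosen = await router.choose_replicas(router.curr_replicas)
    router.on_request_routed(chosen[0])
    return chosen[0]


async def demo() -> List[str]:
    router = LeastRoutedRouter()
    router.update_replicas(["replica-a", "replica-b"])
    picks = [await route_once(router) for _ in range(3)]
    router.on_replica_actor_died("replica-a")
    picks.append(await route_once(router))
    return picks


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The sketch keeps the routing decision (`choose_replicas`) separate from the bookkeeping callbacks, which is the division of responsibility the method table suggests. The least-routed policy is only an illustration and was chosen because it is deterministic.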

Attributes

`app_name`	Name of the app this router is serving.
`backoff_sequence_s`	The sequence of backoff timeouts to use when all replicas' queues are full.
`curr_num_routing_tasks`	Current number of routing tasks running.
`curr_replicas`	Current replicas available for routing.
`max_num_routing_tasks`	Max number of routing tasks to run at any time.
`max_num_routing_tasks_cap`	Hard limit on the maximum number of routing tasks to run.
`max_queue_len_response_deadline_s`	Maximum deadline for receiving queue length info from replicas.
`num_pending_requests`	Current number of requests pending assignment.
`queue_len_response_deadline_s`	Deadline for receiving queue length info from replicas.
`replica_queue_len_cache`	The replica queue length cache.
`target_num_routing_tasks`	Target number of routing tasks to be running based on pending requests.
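
How some of these attributes might relate to each other can be sketched as follows. This is an illustration under stated assumptions, not Ray's implementation: `backoff_delays` and `target_routing_tasks` are hypothetical helpers, the policy of repeating the last backoff value once the sequence is exhausted is assumed, and so is the rule that the target is the number of pending requests capped by `max_num_routing_tasks`.

```python
from itertools import chain, islice, repeat
from typing import Iterator, Sequence


def backoff_delays(backoff_sequence_s: Sequence[float]) -> Iterator[float]:
    """Yield each backoff timeout in order, then repeat the last one.

    Assumption: the final value is reused once the sequence runs out.
    """
    if not backoff_sequence_s:
        raise ValueError("backoff_sequence_s must not be empty")
    return chain(backoff_sequence_s, repeat(backoff_sequence_s[-1]))


def target_routing_tasks(num_pending_requests: int, max_num_routing_tasks: int) -> int:
    """Assumed rule: one routing task per pending request, up to the maximum."""
    return min(num_pending_requests, max_num_routing_tasks)


if __name__ == "__main__":
    # First five waits when every replica queue stays full.
    print(list(islice(backoff_delays([0.0, 0.05, 0.1]), 5)))
    # Ten pending requests with a maximum of four routing tasks.
    print(target_routing_tasks(10, 4))
```

Treat the printed values as a property of this sketch only. Check the attribute values on a live router before relying on any particular backoff schedule or task limit.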