ray.serve.request_router.RequestRouter

class ray.serve.request_router.RequestRouter(deployment_id: DeploymentID, handle_source: DeploymentHandleSource, self_actor_id: str | None = None, self_actor_handle: ActorHandle | None = None, use_replica_queue_len_cache: bool = False, get_curr_time_s: Callable[[], float] | None = None, create_replica_wrapper_func: Callable[[RunningReplicaInfo], RunningReplica] | None = None, *args, **kwargs)

Bases: ABC

Abstract interface for a request router, defining the methods through which the router calls it.

PublicAPI (alpha): This API is in alpha and may change before becoming stable.

Methods

choose_replicas

Chooses a subset of candidate replicas from available replicas.

on_new_queue_len_info

Update queue length cache with new info received from replica.

on_replica_actor_died

Drop replica from replica set so it's not considered for future requests.

on_replica_actor_unavailable

Invalidate cache entry so active probing is required for the next request.

on_request_routed

Called when a request is routed to a replica.

select_available_replicas

Select available replicas from the list of candidates.

update_replicas

Update the set of available replicas to be considered for routing.
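To make the roles of these hooks concrete, here is a minimal standalone sketch in plain Python. It does not import Ray and is not Ray's implementation: `ToyReplica`, `ToyRequestRouter`, `ShortestQueueRouter`, and `route_once` are hypothetical names, and because this summary does not show the real methods' parameters or return types, the signatures below are assumptions chosen for illustration only.

```python
import asyncio
from abc import ABC, abstractmethod
from typing import Dict, List, Optional


class ToyReplica:
    """Hypothetical stand-in for a running replica (not Ray's RunningReplica)."""

    def __init__(self, replica_id: str, queue_len: int = 0) -> None:
        self.replica_id = replica_id
        self.queue_len = queue_len


class ToyRequestRouter(ABC):
    """Hypothetical base class mirroring a few of the hooks listed above."""

    def __init__(self) -> None:
        self._replicas: Dict[str, ToyReplica] = {}

    def update_replicas(self, replicas: List[ToyReplica]) -> None:
        # Replace the set of replicas considered for routing.
        self._replicas = {r.replica_id: r for r in replicas}

    def on_replica_actor_died(self, replica_id: str) -> None:
        # Drop the replica so it is not considered for future requests.
        self._replicas.pop(replica_id, None)

    @abstractmethod
    async def choose_replicas(
        self, candidate_replicas: List[ToyReplica]
    ) -> List[ToyReplica]:
        """Return the subset of candidates to try (assumed signature)."""


class ShortestQueueRouter(ToyRequestRouter):
    async def choose_replicas(
        self, candidate_replicas: List[ToyReplica]
    ) -> List[ToyReplica]:
        # Pick the single candidate with the shortest known queue.
        if not candidate_replicas:
            return []
        return [min(candidate_replicas, key=lambda r: r.queue_len)]


def route_once(router: ToyRequestRouter) -> Optional[str]:
    # Drive one routing decision; returns the chosen replica ID, if any.
    candidates = list(router._replicas.values())
    chosen = asyncio.run(router.choose_replicas(candidates))
    return chosen[0].replica_id if chosen else None
```

In this toy model the subclass only decides which replica to pick, while the base class keeps the replica set current. A real subclass of `RequestRouter` should follow the method signatures in the per-method reference rather than the ones assumed here.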

Attributes

app_name

Name of the app this router is serving.

backoff_sequence_s

The sequence of backoff timeouts to use when all replicas' queues are full.

curr_num_routing_tasks

Current number of routing tasks running.

curr_replicas

Current replicas available for routing.

max_num_routing_tasks

Max number of routing tasks to run at any time.

max_num_routing_tasks_cap

Hard limit on the maximum number of routing tasks to run.

max_queue_len_response_deadline_s

Maximum deadline for receiving queue length info from replicas.

num_pending_requests

Current number of requests pending assignment.

queue_len_response_deadline_s

Deadline for receiving queue length info from replicas.

replica_queue_len_cache

The replica queue length cache.

target_num_routing_tasks

Target number of routing tasks to be running based on pending requests.
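The following sketch shows one plausible reading of how some of these attributes could relate to each other. It is an assumption for illustration, not Ray's actual logic: the helper names `backoff_for_attempt` and `target_routing_tasks` are hypothetical, the example backoff values are made up, and the rules "repeat the last backoff value" and "target is the minimum of the three counts" are guesses based only on the one-line descriptions above.

```python
from typing import Sequence


def backoff_for_attempt(backoff_sequence_s: Sequence[float], attempt: int) -> float:
    """Return the backoff (in seconds) for a zero-based retry attempt.

    Assumption: once the sequence is exhausted, the last value repeats.
    """
    if not backoff_sequence_s:
        raise ValueError("backoff_sequence_s must not be empty")
    if attempt < 0:
        raise ValueError("attempt must be non-negative")
    index = min(attempt, len(backoff_sequence_s) - 1)
    return backoff_sequence_s[index]


def target_routing_tasks(
    num_pending_requests: int,
    max_num_routing_tasks: int,
    max_num_routing_tasks_cap: int,
) -> int:
    """Assumed rule: never run more tasks than pending requests or either limit."""
    return min(num_pending_requests, max_num_routing_tasks, max_num_routing_tasks_cap)


# Hypothetical values, chosen only to exercise the helpers.
example_backoffs = [0.0, 0.05, 0.1]
first_wait = backoff_for_attempt(example_backoffs, 0)
later_wait = backoff_for_attempt(example_backoffs, 7)
tasks = target_routing_tasks(
    num_pending_requests=3, max_num_routing_tasks=10, max_num_routing_tasks_cap=50
)
```

Read the attribute values from a live router instance to see the real numbers; the helpers above only model the relationships that the descriptions suggest.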