ray.serve.request_router.MultiplexMixin.apply_multiplex_routing
- MultiplexMixin.apply_multiplex_routing(pending_request: PendingRequest | None = None) → Set[ReplicaID]
Apply multiplex routing to the pending request.
When the request is None, return all replicas. Otherwise, each call tries to route the request through a hierarchy of candidates: first the replicas that already have the multiplexed model ID, then the replicas with the fewest multiplexed models, and finally all replicas.
- Parameters:
pending_request – The pending request to be routed based on multiplexed model policy.
- Returns:
A set of replica IDs that are candidates for the current routing call.
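
The fallback hierarchy described above can be illustrated with a small standalone sketch. This is not Ray Serve's implementation and does not import Ray: the function `candidate_tiers`, the `replica_models` mapping, and the use of plain strings as replica IDs are all hypothetical names chosen for illustration. The sketch only shows the order in which candidate sets would be considered; how the real method decides when to move from one tier to the next is not covered here.

```python
from typing import Dict, List, Optional, Set


def candidate_tiers(
    replica_models: Dict[str, Set[str]],
    model_id: Optional[str],
) -> List[Set[str]]:
    """Return candidate replica sets in fallback order (illustrative only).

    replica_models maps a hypothetical replica ID to the set of
    multiplexed model IDs currently loaded on that replica.
    """
    all_replicas = set(replica_models)

    # No request or no model ID: every replica is a candidate.
    if not model_id:
        return [all_replicas]

    tiers: List[Set[str]] = []

    # Tier 1: replicas that already have the requested model loaded.
    with_model = {r for r, models in replica_models.items() if model_id in models}
    if with_model:
        tiers.append(with_model)

    # Tier 2: replicas holding the fewest multiplexed models.
    if replica_models:
        fewest = min(len(models) for models in replica_models.values())
        tiers.append({r for r, models in replica_models.items() if len(models) == fewest})

    # Tier 3: fall back to all replicas.
    tiers.append(all_replicas)
    return tiers


# Example data: three replicas with different loaded models.
replicas = {"r1": {"a", "b"}, "r2": {"c"}, "r3": set()}
tiers_for_a = candidate_tiers(replicas, "a")
tiers_for_none = candidate_tiers(replicas, None)
```

In this example, a request for model `"a"` would consider `r1` first, then the least-loaded replica `r3`, and finally every replica, while a missing request yields all replicas in a single tier.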