ray.serve.autoscaling_policy.replica_queue_length_autoscaling_policy#
- ray.serve.autoscaling_policy.replica_queue_length_autoscaling_policy(ctx: AutoscalingContext) Tuple[int | float, Dict[str, Any]][source]#
The default autoscaling policy based on basic thresholds for scaling. There is a minimum threshold for the average queue length in the cluster to scale up and a maximum threshold to scale down. Each period, a ‘scale up’ or ‘scale down’ decision is made. This decision must be made for a specified number of periods in a row before the number of replicas is actually scaled. See config options for more details. Assumes
get_decision_num_replicasis called once every CONTROL_LOOP_PERIOD_S seconds.PublicAPI (alpha): This API is in alpha and may change before becoming stable.