ray.serve.autoscaling_policy.replica_queue_length_autoscaling_policy#

ray.serve.autoscaling_policy.replica_queue_length_autoscaling_policy(ctx: AutoscalingContext) Tuple[int | float, Dict[str, Any]][source]#

The default autoscaling policy based on basic thresholds for scaling. There is a minimum threshold for the average queue length in the cluster to scale up and a maximum threshold to scale down. Each period, a ‘scale up’ or ‘scale down’ decision is made. This decision must be made for a specified number of periods in a row before the number of replicas is actually scaled. See config options for more details. Assumes get_decision_num_replicas is called once every CONTROL_LOOP_PERIOD_S seconds.

PublicAPI (alpha): This API is in alpha and may change before becoming stable.