ray.serve.config.AutoscalingConfig#
- pydantic model ray.serve.config.AutoscalingConfig[source]#
Config for the Serve Autoscaler.
This class configures how Ray Serve scales a deployment’s replicas up and down in response to traffic. The autoscaler periodically aggregates request metrics over a look-back window, compares them to target_ongoing_requests, and adjusts the replica count between min_replicas and max_replicas. upscale_delay_s and downscale_delay_s control how quickly the autoscaler reacts to traffic changes, while upscaling_factor and downscaling_factor dampen the magnitude of each scaling decision.
For an end-to-end guide, see the Serve autoscaling guide and the advanced autoscaling guide.
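As a minimal sketch, the fields in this reference can be collected in a plain dictionary, which Ray Serve accepts for the autoscaling_config option of @serve.deployment. The values below are illustrative assumptions rather than recommendations, and MyModel is a hypothetical deployment class shown only in comments so that the snippet runs without Ray installed.

```python
# Illustrative values only; the field names come from this reference.
autoscaling_config = {
    "min_replicas": 1,
    "max_replicas": 10,
    "target_ongoing_requests": 2,
    "upscale_delay_s": 30.0,
    "downscale_delay_s": 600.0,
}

# With Ray Serve installed, the dictionary would typically be passed as:
#
#   from ray import serve
#
#   @serve.deployment(autoscaling_config=autoscaling_config)
#   class MyModel:  # hypothetical deployment class
#       def __call__(self, request):
#           return "ok"

# Sanity checks mirroring constraints documented in this reference.
assert autoscaling_config["min_replicas"] >= 0
assert autoscaling_config["max_replicas"] >= autoscaling_config["min_replicas"]
```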
- field aggregation_function: str | AggregationFunction = AggregationFunction.MEAN#
Function used to aggregate metrics across a time window.
- field downscale_delay_s: float = 600.0#
How long to wait before scaling down replicas to a value greater than 0.
- Constraints:
ge = 0
- field downscale_smoothing_factor: float | None = None#
[DEPRECATED] Please use downscaling_factor instead.
- field downscale_to_zero_delay_s: float | None = None#
How long to wait before scaling down replicas from 1 to 0. If not set, the value of downscale_delay_s will be used.
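The fallback between the two downscale delays can be sketched as follows. This models only the rule stated in the field descriptions; effective_downscale_delay is a hypothetical helper and not part of the Ray Serve API.

```python
from typing import Optional


def effective_downscale_delay(
    target_replicas: int,
    downscale_delay_s: float = 600.0,
    downscale_to_zero_delay_s: Optional[float] = None,
) -> float:
    """Return the delay (in seconds) to wait before applying a downscale.

    Hypothetical helper: the scale-to-zero delay applies only when the
    target is 0 replicas and the field is set; otherwise the general
    downscale delay is used.
    """
    if target_replicas == 0 and downscale_to_zero_delay_s is not None:
        return downscale_to_zero_delay_s
    return downscale_delay_s


# Scaling 3 -> 2 replicas uses the general delay.
print(effective_downscale_delay(2, 600.0, 1800.0))
# Scaling 1 -> 0 replicas uses the dedicated delay when it is set.
print(effective_downscale_delay(0, 600.0, 1800.0))
# Scaling 1 -> 0 falls back to downscale_delay_s when it is not set.
print(effective_downscale_delay(0, 600.0, None))
```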
- field downscaling_factor: float | None = None#
Multiplicative “gain” factor to limit downscaling decisions.
- field initial_replicas: int | None = None#
The number of replicas started when the deployment is first deployed. If not set, defaults to the value of min_replicas.
- field look_back_period_s: float = 30.0#
Time window to average over for metrics.
- Constraints:
gt = 0
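To illustrate how a look-back window and an aggregation function interact, the sketch below aggregates timestamped metric samples that fall inside the window. It is a simplified model, not Ray Serve's implementation: aggregate_window, the AGGREGATORS table, its string keys, and the sample data are all assumptions made for illustration.

```python
from statistics import mean
from typing import Callable, Dict, List, Tuple

# Hypothetical lookup table of aggregation functions (assumed names).
AGGREGATORS: Dict[str, Callable[[List[float]], float]] = {
    "mean": mean,
    "max": max,
    "min": min,
}


def aggregate_window(
    samples: List[Tuple[float, float]],
    now: float,
    look_back_period_s: float = 30.0,
    aggregation_function: str = "mean",
) -> float:
    """Aggregate (timestamp, value) samples inside the look-back window."""
    if look_back_period_s <= 0:
        raise ValueError("look_back_period_s must be greater than 0")
    window_start = now - look_back_period_s
    recent = [value for ts, value in samples if ts >= window_start]
    if not recent:
        # No samples in the window: treat the metric as zero in this sketch.
        return 0.0
    return float(AGGREGATORS[aggregation_function](recent))


# Samples are (timestamp in seconds, ongoing requests); the first is too old.
samples = [(0.0, 10.0), (40.0, 4.0), (50.0, 6.0), (60.0, 8.0)]
print(aggregate_window(samples, now=60.0))  # mean of 4, 6, 8
print(aggregate_window(samples, now=60.0, aggregation_function="max"))
```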
- field max_replicas: int = 1#
The maximum number of replicas for the deployment. Must be greater than or equal to min_replicas. Ray Serve relies on the Ray Autoscaler to add cluster nodes when existing nodes lack the resources (CPUs, GPUs, etc.) needed to schedule additional replicas.
- Constraints:
gt = 0
- field metrics_interval_s: float = 10.0#
[DEPRECATED] How often to scrape for metrics. Will be replaced by the environment variables RAY_SERVE_REPLICA_AUTOSCALING_METRIC_PUSH_INTERVAL_S and RAY_SERVE_HANDLE_AUTOSCALING_METRIC_PUSH_INTERVAL_S in a future release.
- Constraints:
gt = 0
- field min_replicas: int = 1#
The minimum number of replicas for the deployment. Set this to a positive value to keep replicas ready for traffic at all times, or set it to 0 to allow scaling to zero when there is no traffic. Scaling to zero reduces cost but introduces cold-start latency when traffic resumes.
- Constraints:
ge = 0
- field policy: AutoscalingPolicy [Optional]#
The autoscaling policy for the deployment.
- field smoothing_factor: float = 1.0#
[DEPRECATED] Smoothing factor for autoscaling decisions.
- Constraints:
gt = 0
- field target_ongoing_requests: float | None = 2#
The target number of requests being processed and queued per replica. Serve scales the replica count up or down to keep each replica close to this value. Lower values reduce per-replica load and tail latency at the cost of running more replicas; higher values pack more traffic onto each replica. Defaults to 2.
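The relationship between target_ongoing_requests and the replica count can be illustrated with a small calculation. The sketch below is a simplified model, not Ray Serve's implementation: it assumes the total number of ongoing requests is divided by target_ongoing_requests, rounded up, and clamped to the range from min_replicas to max_replicas. The function name desired_replicas and the sample numbers are hypothetical.

```python
import math


def desired_replicas(
    total_ongoing_requests: float,
    target_ongoing_requests: float = 2.0,
    min_replicas: int = 1,
    max_replicas: int = 1,
) -> int:
    """Estimate a replica count that keeps each replica near the target."""
    if target_ongoing_requests <= 0:
        raise ValueError("target_ongoing_requests must be positive")
    # Replicas needed so that each one handles about the target load.
    raw = math.ceil(total_ongoing_requests / target_ongoing_requests)
    # Clamp to the configured bounds.
    return max(min_replicas, min(max_replicas, raw))


# 7 ongoing requests at a target of 2 per replica -> ceil(3.5) = 4 replicas.
print(desired_replicas(7, 2.0, min_replicas=1, max_replicas=10))
# A burst of 100 requests is capped at max_replicas.
print(desired_replicas(100, 2.0, min_replicas=1, max_replicas=10))
# With no traffic, min_replicas=0 allows scaling to zero.
print(desired_replicas(0, 2.0, min_replicas=0, max_replicas=10))
```

A lower target in this model yields more replicas for the same traffic, which matches the trade-off described above.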
- field upscale_delay_s: float = 30.0#
How long to wait before scaling up replicas.
- Constraints:
ge = 0
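One way to picture a scaling delay is as a requirement that the need to scale persists for the whole delay before any action is taken. The sketch below models that idea under stated assumptions; DelayedDecision is a hypothetical helper, and Ray Serve's actual mechanism may differ.

```python
from typing import Optional


class DelayedDecision:
    """Hypothetical helper: approve a scale-up only after it has been
    wanted continuously for `delay_s` seconds."""

    def __init__(self, delay_s: float = 30.0) -> None:
        if delay_s < 0:
            raise ValueError("delay_s must be >= 0")
        self.delay_s = delay_s
        self._wanted_since: Optional[float] = None

    def should_scale(self, wants_scale: bool, now: float) -> bool:
        if not wants_scale:
            # The condition lapsed, so the waiting period starts over.
            self._wanted_since = None
            return False
        if self._wanted_since is None:
            self._wanted_since = now
        return now - self._wanted_since >= self.delay_s


decision = DelayedDecision(delay_s=30.0)
print(decision.should_scale(True, now=0.0))    # just started waiting
print(decision.should_scale(True, now=20.0))   # still inside the delay
print(decision.should_scale(False, now=25.0))  # traffic dropped; reset
print(decision.should_scale(True, now=30.0))   # waiting starts again
print(decision.should_scale(True, now=60.0))   # sustained for 30 s
```

In this model a delay of 0 approves the decision immediately, and short traffic spikes that end before the delay elapses never trigger scaling.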
- field upscale_smoothing_factor: float | None = None#
[DEPRECATED] Please use upscaling_factor instead.
- field upscaling_factor: float | None = None#
Multiplicative “gain” factor to limit upscaling decisions.
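The effect of a multiplicative gain factor can be sketched as scaling the gap between the current and the ideal replica count. This is one plausible reading of the field descriptions, not Ray Serve's exact formula: apply_gain is a hypothetical helper, and treating an unset factor as 1.0 is an assumption.

```python
import math
from typing import Optional


def apply_gain(
    current_replicas: int,
    ideal_replicas: int,
    upscaling_factor: Optional[float] = None,
    downscaling_factor: Optional[float] = None,
) -> int:
    """Move from the current toward the ideal replica count, scaled by a gain."""
    delta = ideal_replicas - current_replicas
    # Pick the factor that matches the direction of the change.
    factor = upscaling_factor if delta >= 0 else downscaling_factor
    gain = 1.0 if factor is None else factor  # assumption: unset means no damping
    return math.ceil(current_replicas + gain * delta)


# Upscaling from 4 toward 12 with a gain of 0.5 covers half the gap.
print(apply_gain(4, 12, upscaling_factor=0.5))
# Downscaling from 10 toward 2 with a gain of 0.5 also covers half the gap.
print(apply_gain(10, 2, downscaling_factor=0.5))
# Without a factor, the ideal count is used directly.
print(apply_gain(4, 12))
```

In this model, factors below 1 make each step more conservative, while factors above 1 would overshoot the ideal count.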
- validator aggregation_function_valid » aggregation_function[source]#
- validator look_back_period_s_valid » look_back_period_s[source]#
- validator metrics_interval_s_deprecation_warning » metrics_interval_s[source]#
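The constraints listed for the fields above can be summarized in a small stand-in class. This is a sketch under stated assumptions rather than the pydantic model itself: AutoscalingConfigSketch is a hypothetical name, it covers only a subset of the fields, and it reproduces only the bounds and defaults stated in this reference.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AutoscalingConfigSketch:
    """Hypothetical stand-in mirroring a subset of the documented fields."""

    min_replicas: int = 1
    max_replicas: int = 1
    initial_replicas: Optional[int] = None
    look_back_period_s: float = 30.0
    upscale_delay_s: float = 30.0
    downscale_delay_s: float = 600.0

    def __post_init__(self) -> None:
        if self.min_replicas < 0:
            raise ValueError("min_replicas must be >= 0")
        if self.max_replicas <= 0:
            raise ValueError("max_replicas must be > 0")
        if self.max_replicas < self.min_replicas:
            raise ValueError("max_replicas must be >= min_replicas")
        if self.look_back_period_s <= 0:
            raise ValueError("look_back_period_s must be > 0")
        if self.upscale_delay_s < 0 or self.downscale_delay_s < 0:
            raise ValueError("scaling delays must be >= 0")
        # Documented default: initial_replicas falls back to min_replicas.
        if self.initial_replicas is None:
            self.initial_replicas = self.min_replicas


config = AutoscalingConfigSketch(min_replicas=2, max_replicas=5)
print(config.initial_replicas)  # falls back to min_replicas

try:
    AutoscalingConfigSketch(min_replicas=3, max_replicas=2)
except ValueError as err:
    print(f"rejected: {err}")
```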