ray.serve.config.AutoscalingContext
- class ray.serve.config.AutoscalingContext(deployment_id: DeploymentID, deployment_name: str, app_name: str | None, current_num_replicas: int, target_num_replicas: int, running_replicas: List[ReplicaID], total_num_requests: float, total_queued_requests: float | None, total_running_requests: float | None, aggregated_metrics: Dict[str, Dict[ReplicaID, float]], raw_metrics: Dict[str, Dict[ReplicaID, List[TimeStampedValue]]], capacity_adjusted_min_replicas: int, capacity_adjusted_max_replicas: int, policy_state: Dict[str, Any], last_scale_up_time: float | None, last_scale_down_time: float | None, current_time: float | None, config: Any | None)
Rich context provided to custom autoscaling policies.
This class exposes a deployment’s current state, metrics, and configuration so that a custom autoscaling policy can base its scaling decisions on them.
The context includes deployment metadata, current replica state, built-in and custom metrics, capacity bounds, policy state, and timing information.
PublicAPI (alpha): This API is in alpha and may change before becoming stable.
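To make the role of the context concrete, here is a minimal sketch of a policy function that reads a few of its fields. It deliberately does not import Ray: `FakeContext` is a hypothetical stand-in that mirrors only a subset of the fields in the signature above, and the two constants are illustrative tuning knobs rather than Ray settings. The return shape (a replica count plus an updated state dictionary) and the assumption that the timestamps are in seconds are both guesses for illustration; check the Ray Serve autoscaling guide for the contract a real policy must follow.

```python
import math
from dataclasses import dataclass, field
from typing import Any, Dict, Optional, Tuple


@dataclass
class FakeContext:
    """Hypothetical stand-in with a subset of AutoscalingContext fields."""

    current_num_replicas: int
    total_num_requests: float
    capacity_adjusted_min_replicas: int
    capacity_adjusted_max_replicas: int
    policy_state: Dict[str, Any] = field(default_factory=dict)
    last_scale_down_time: Optional[float] = None
    current_time: Optional[float] = None


# Illustrative tuning knobs (assumptions, not part of the Ray API).
TARGET_REQUESTS_PER_REPLICA = 2.0
SCALE_DOWN_COOLDOWN_S = 60.0


def requests_based_policy(ctx) -> Tuple[int, Dict[str, Any]]:
    """Return (desired replica count, updated policy state)."""
    # Size the deployment so each replica handles roughly the target load.
    desired = math.ceil(ctx.total_num_requests / TARGET_REQUESTS_PER_REPLICA)

    # Stay within the capacity-adjusted bounds carried by the context.
    desired = max(
        ctx.capacity_adjusted_min_replicas,
        min(ctx.capacity_adjusted_max_replicas, desired),
    )

    # Hold the current size if a scale-down happened too recently.
    in_cooldown = (
        ctx.last_scale_down_time is not None
        and ctx.current_time is not None
        and ctx.current_time - ctx.last_scale_down_time < SCALE_DOWN_COOLDOWN_S
    )
    if desired < ctx.current_num_replicas and in_cooldown:
        desired = ctx.current_num_replicas

    # Copy the state instead of mutating the dictionary on the context.
    state = dict(ctx.policy_state)
    state["decisions"] = state.get("decisions", 0) + 1
    return desired, state


ctx = FakeContext(
    current_num_replicas=2,
    total_num_requests=9.0,
    capacity_adjusted_min_replicas=1,
    capacity_adjusted_max_replicas=4,
)
print(requests_based_policy(ctx))
```

The function is written against attribute names only, so the same body should work with any object that exposes those fields.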
Attributes
deployment_id: Unique identifier for the deployment.
deployment_name: Name of the deployment.
app_name: Name of the application containing this deployment.
current_num_replicas: Current number of running replicas.
target_num_replicas: Target number of replicas set by the autoscaler.
running_replicas: List of currently running replica IDs.
total_num_requests: Total number of requests across all replicas.
total_queued_requests: Number of requests currently queued.
total_running_requests: Total number of requests currently running.
aggregated_metrics: Time-weighted averages of custom metrics per replica.
raw_metrics: Raw custom metric timeseries per replica.
capacity_adjusted_min_replicas: Minimum replicas adjusted for cluster capacity.
capacity_adjusted_max_replicas: Maximum replicas adjusted for cluster capacity.
policy_state: Persistent state dictionary for the autoscaling policy.
last_scale_up_time: Timestamp of last scale-up action.
last_scale_down_time: Timestamp of last scale-down action.
current_time: Current timestamp.
config: Autoscaling configuration for this deployment.
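According to the class signature, aggregated_metrics is a nested mapping from metric name to a per-replica value (Dict[str, Dict[ReplicaID, float]]). The sketch below shows one way a policy might collapse that mapping into a single deployment-wide number. The helper `mean_metric`, the metric name `gpu_util`, and the plain strings used in place of ReplicaID keys are all hypothetical and chosen only for illustration.

```python
from typing import Dict, Hashable, Optional


def mean_metric(
    aggregated_metrics: Dict[str, Dict[Hashable, float]], name: str
) -> Optional[float]:
    """Average one metric across replicas; None if no replica reported it."""
    per_replica = aggregated_metrics.get(name, {})
    if not per_replica:
        return None
    return sum(per_replica.values()) / len(per_replica)


# Strings stand in for ReplicaID keys in this sketch.
metrics = {"gpu_util": {"replica-a": 0.5, "replica-b": 1.0}}
print(mean_metric(metrics, "gpu_util"))   # 0.75
print(mean_metric(metrics, "not_there"))  # None
```

Returning None for a missing metric, rather than 0.0, lets the caller tell "no data yet" apart from "idle", which matters when the result drives a scale-down decision.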