ray.serve.config.AutoscalingContext
- class ray.serve.config.AutoscalingContext(deployment_id: DeploymentID, deployment_name: str, app_name: str | None, current_num_replicas: int, target_num_replicas: int, running_replicas: List[ReplicaID], total_num_requests: float | Callable[[], float], total_queued_requests: float | Callable[[], float] | None, aggregated_metrics: Dict[str, Dict[ReplicaID, float]] | Callable[[], Dict[str, Dict[ReplicaID, float]]] | None, raw_metrics: Dict[str, Dict[ReplicaID, List[TimeStampedValue]]] | Callable[[], Dict[str, Dict[ReplicaID, List[TimeStampedValue]]]] | None, capacity_adjusted_min_replicas: int, capacity_adjusted_max_replicas: int, policy_state: Dict[str, Any], last_scale_up_time: float | None, last_scale_down_time: float | None, current_time: float | None, config: Any | None, total_pending_async_requests: int)
Rich context provided to custom autoscaling policies.
This class exposes a deployment’s current state, metrics, and configuration so that a custom autoscaling policy can base its scaling decisions on them.
The context includes deployment metadata, current replica state, built-in and custom metrics, capacity bounds, policy state, and timing information.
Note: The aggregated_metrics and raw_metrics fields support lazy evaluation. Each accepts either a value or a callable; a callable is evaluated only when the field is first accessed, and the result is cached for subsequent accesses.
PublicAPI (alpha): This API is in alpha and may change before becoming stable.
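The lazy-evaluation behavior described in the note can be illustrated with a small stand-alone sketch. The `LazyMetrics` class and `expensive_loader` function below are hypothetical names invented for this illustration; they are not part of Ray and do not reflect how AutoscalingContext is implemented internally. The sketch only shows the general pattern: accept either a value or a callable, resolve it on first access, and reuse the cached result afterwards.

```python
from typing import Callable, Dict, List, Optional, Union

# Either an already-computed mapping or a zero-argument loader for one.
MetricsOrLoader = Union[Dict[str, float], Callable[[], Dict[str, float]]]


class LazyMetrics:
    """Hypothetical stand-in for a lazily evaluated field (not Ray's code)."""

    def __init__(self, source: MetricsOrLoader) -> None:
        self._source = source
        self._cached: Optional[Dict[str, float]] = None
        self._resolved = False

    @property
    def value(self) -> Dict[str, float]:
        if not self._resolved:
            # Call the loader only on first access; plain values pass through.
            self._cached = self._source() if callable(self._source) else self._source
            self._resolved = True
        return self._cached


calls: List[int] = []  # records how often the loader actually runs


def expensive_loader() -> Dict[str, float]:
    calls.append(1)
    return {"replica-1": 0.75}


lazy = LazyMetrics(expensive_loader)
calls_before_access = len(calls)  # nothing has been computed yet
first = lazy.value                # triggers the loader
second = lazy.value               # served from the cache
```

With this pattern, a policy that never reads an expensive metric never pays for computing it.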
- deployment_id
Unique identifier for the deployment.
- deployment_name
Name of the deployment.
- app_name
Name of the application containing this deployment.
- capacity_adjusted_min_replicas
Minimum replicas adjusted for cluster capacity.
- capacity_adjusted_max_replicas
Maximum replicas adjusted for cluster capacity.
- current_time
Current timestamp.
- config
Autoscaling configuration for this deployment.
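As a rough sketch of how a policy might use these fields, the example below computes a desired replica count from total_num_requests and clamps it to the capacity-adjusted bounds. Several things here are assumptions made for illustration: the function name `clamped_policy`, the constant `TARGET_REQUESTS_PER_REPLICA`, the return shape (a replica count plus the policy state), and the use of `SimpleNamespace` as a stand-in context so the snippet runs without Ray installed. Check the Ray Serve documentation for the signature that custom policies are actually expected to have.

```python
import math
from types import SimpleNamespace
from typing import Any, Dict, Tuple

# Hypothetical tuning value chosen for this example; not a Ray default.
TARGET_REQUESTS_PER_REPLICA = 10.0


def clamped_policy(ctx: Any) -> Tuple[int, Dict[str, Any]]:
    """Illustrative policy: scale on request volume, clamp to capacity bounds."""
    total = ctx.total_num_requests
    if callable(total):
        # The constructor signature allows a callable here, so resolve it.
        total = total()

    desired = math.ceil(total / TARGET_REQUESTS_PER_REPLICA)

    # Never go below the minimum or above the maximum adjusted for capacity.
    desired = max(
        ctx.capacity_adjusted_min_replicas,
        min(ctx.capacity_adjusted_max_replicas, desired),
    )
    return desired, ctx.policy_state


# Stand-in context carrying only the fields this sketch reads (assumption:
# a real AutoscalingContext exposes attributes with the same names).
ctx = SimpleNamespace(
    total_num_requests=lambda: 95.0,
    capacity_adjusted_min_replicas=1,
    capacity_adjusted_max_replicas=8,
    policy_state={},
)

decision, state = clamped_policy(ctx)
```

Clamping inside the policy keeps the decision within the bounds the context reports, even when the raw request-based estimate falls outside them.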