ray.serve.config.AutoscalingContext
- class ray.serve.config.AutoscalingContext(deployment_id: DeploymentID, deployment_name: str, app_name: str | None, current_num_replicas: int, target_num_replicas: int, running_replicas: List[ReplicaID], total_num_requests: float | Callable[[], float], total_queued_requests: float | Callable[[], float] | None, aggregated_metrics: Dict[str, Dict[ReplicaID, float]] | Callable[[], Dict[str, Dict[ReplicaID, float]]] | None, raw_metrics: Dict[str, Dict[ReplicaID, List[TimeStampedValue]]] | Callable[[], Dict[str, Dict[ReplicaID, List[TimeStampedValue]]]] | None, capacity_adjusted_min_replicas: int, capacity_adjusted_max_replicas: int, policy_state: Dict[str, Any], last_scale_up_time: float | None, last_scale_down_time: float | None, current_time: float | None, config: Any | None)
Rich context provided to custom autoscaling policies.
This class exposes a deployment’s current state, metrics, and configuration so that custom autoscaling policies can base their scaling decisions on them.
The context includes deployment metadata, current replica state, built-in and custom metrics, capacity bounds, policy state, and timing information.
Note: The aggregated_metrics and raw_metrics fields support lazy evaluation. You can pass callables that will be evaluated only when accessed, with results cached for subsequent accesses.
PublicAPI (alpha): This API is in alpha and may change before becoming stable.
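The lazy evaluation described in the note above can be pictured with a minimal sketch. This is not Ray's implementation: `LazyMetricsHolder` and `expensive_aggregation` are hypothetical names, and replica IDs are plain strings here instead of `ReplicaID` objects. It only illustrates the pattern of accepting either a value or a callable, resolving it on first access, and caching the result.

```python
from typing import Callable, Dict, Optional, Union

# Simplified metrics shape: metric name -> replica id (a string here) -> value.
MetricsDict = Dict[str, Dict[str, float]]


class LazyMetricsHolder:
    """Hypothetical stand-in that mimics a lazily evaluated field."""

    def __init__(
        self,
        aggregated_metrics: Union[MetricsDict, Callable[[], MetricsDict], None],
    ) -> None:
        self._source = aggregated_metrics
        self._cached: Optional[MetricsDict] = None
        self._resolved = False

    @property
    def aggregated_metrics(self) -> Optional[MetricsDict]:
        # Resolve the callable (if any) on first access only, then reuse it.
        if not self._resolved:
            self._cached = self._source() if callable(self._source) else self._source
            self._resolved = True
        return self._cached


calls = []  # records how often the aggregation actually runs


def expensive_aggregation() -> MetricsDict:
    calls.append(1)
    return {"queue_len": {"replica-1": 3.0, "replica-2": 5.0}}


holder = LazyMetricsHolder(expensive_aggregation)
first = holder.aggregated_metrics   # triggers the callable
second = holder.aggregated_metrics  # served from the cache
```

A policy that never reads the metrics pays nothing for them, and one that reads them several times pays only once.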
Attributes
deployment_id: Unique identifier for the deployment.
deployment_name: Name of the deployment.
app_name: Name of the application containing this deployment.
capacity_adjusted_min_replicas: Minimum replicas adjusted for cluster capacity.
capacity_adjusted_max_replicas: Maximum replicas adjusted for cluster capacity.
current_time: Current timestamp.
config: Autoscaling configuration for this deployment.
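As a rough illustration of how a custom policy might consume these fields, the sketch below uses a stand-in dataclass rather than the real `AutoscalingContext`, so it runs without Ray installed. `StandInContext`, `request_based_policy`, and `TARGET_REQUESTS_PER_REPLICA` are hypothetical names, and the `(replica count, policy state)` return shape is an assumption about the policy contract that should be checked against the Ray Serve custom autoscaling documentation.

```python
import math
from dataclasses import dataclass, field
from typing import Any, Dict, Tuple


@dataclass
class StandInContext:
    """Hypothetical stand-in carrying a subset of the fields in the signature above."""

    current_num_replicas: int
    total_num_requests: float
    capacity_adjusted_min_replicas: int
    capacity_adjusted_max_replicas: int
    policy_state: Dict[str, Any] = field(default_factory=dict)


# Assumed tuning knob for this sketch, not a Ray setting.
TARGET_REQUESTS_PER_REPLICA = 10.0


def request_based_policy(ctx: StandInContext) -> Tuple[int, Dict[str, Any]]:
    # Replicas needed to keep each one at or below the target load.
    desired = math.ceil(ctx.total_num_requests / TARGET_REQUESTS_PER_REPLICA)
    # Respect the capacity-adjusted bounds carried by the context.
    clamped = max(
        ctx.capacity_adjusted_min_replicas,
        min(desired, ctx.capacity_adjusted_max_replicas),
    )
    # Copy the state instead of mutating the context in place.
    new_state = dict(ctx.policy_state)
    new_state["last_desired"] = desired
    return clamped, new_state


ctx = StandInContext(
    current_num_replicas=2,
    total_num_requests=95.0,
    capacity_adjusted_min_replicas=1,
    capacity_adjusted_max_replicas=8,
)
decision, state = request_based_policy(ctx)
```

Here 95 in-flight requests would call for 10 replicas, but the decision is capped at the capacity-adjusted maximum of 8, while the unclamped value is kept in the returned state for later inspection.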