ray.serve.config.AutoscalingContext

class ray.serve.config.AutoscalingContext(deployment_id: DeploymentID, deployment_name: str, app_name: str | None, current_num_replicas: int, target_num_replicas: int, running_replicas: List[ReplicaID], total_num_requests: float | Callable[[], float], total_queued_requests: float | Callable[[], float] | None, aggregated_metrics: Dict[str, Dict[ReplicaID, float]] | Callable[[], Dict[str, Dict[ReplicaID, float]]] | None, raw_metrics: Dict[str, Dict[ReplicaID, List[TimeStampedValue]]] | Callable[[], Dict[str, Dict[ReplicaID, List[TimeStampedValue]]]] | None, capacity_adjusted_min_replicas: int, capacity_adjusted_max_replicas: int, policy_state: Dict[str, Any], last_scale_up_time: float | None, last_scale_down_time: float | None, current_time: float | None, config: Any | None)

Rich context provided to custom autoscaling policies.

This class provides comprehensive information about a deployment’s current state, metrics, and configuration that can be used by custom autoscaling policies to make intelligent scaling decisions.

The context includes deployment metadata, current replica state, built-in and custom metrics, capacity bounds, policy state, and timing information.
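
As an illustration of how a policy might consume this context, the sketch below derives a replica target from the built-in request metric and clamps it to the capacity-adjusted bounds. The function name, its return type, and the TARGET_REQUESTS_PER_REPLICA constant are assumptions made for the sketch; the exact interface Serve expects from a custom autoscaling policy is documented separately.

```python
import math

from ray.serve.config import AutoscalingContext

# Hypothetical tuning knob for this sketch; not a Serve setting.
TARGET_REQUESTS_PER_REPLICA = 5.0


def request_based_target(ctx: AutoscalingContext) -> int:
    """Sketch: derive a bounded replica target from total_num_requests."""
    # Per the constructor signature, total_num_requests may be a float or a
    # zero-argument callable; resolve it either way.
    total = ctx.total_num_requests
    if callable(total):
        total = total()

    desired = math.ceil(total / TARGET_REQUESTS_PER_REPLICA)

    # Clamp to the capacity-adjusted bounds supplied in the context.
    return max(
        ctx.capacity_adjusted_min_replicas,
        min(desired, ctx.capacity_adjusted_max_replicas),
    )
```

A real policy would likely combine such a target with the queued-request and custom-metric fields described below.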

Note: The aggregated_metrics and raw_metrics fields support lazy evaluation. You can pass callables that will be evaluated only when accessed, with results cached for subsequent accesses.
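
For example, a policy-style helper might read aggregated_metrics as sketched below. Per the note, a callable supplied for the field is evaluated only on first access and then cached, so the explicit callable check is purely defensive. The helper name and the idea of averaging a per-replica metric are illustrative assumptions, not part of the API.

```python
from typing import Optional

from ray.serve.config import AutoscalingContext


def mean_aggregated_metric(ctx: AutoscalingContext, metric_name: str) -> Optional[float]:
    """Sketch: average a per-replica metric from aggregated_metrics."""
    metrics = ctx.aggregated_metrics
    # Defensive: resolve manually if the field still holds a callable.
    if callable(metrics):
        metrics = metrics()
    if not metrics or metric_name not in metrics:
        return None

    per_replica = metrics[metric_name]  # Dict[ReplicaID, float]
    if not per_replica:
        return None
    return sum(per_replica.values()) / len(per_replica)
```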

PublicAPI (alpha): This API is in alpha and may change before becoming stable.

Attributes

aggregated_metrics

Aggregated metric values keyed by metric name and replica; may be supplied as a callable for lazy evaluation.

raw_metrics

Raw time-stamped metric values keyed by metric name and replica; may be supplied as a callable for lazy evaluation.

total_num_requests

Total number of requests across the deployment; may be supplied as a callable for lazy evaluation.

total_queued_requests

Total number of queued requests across the deployment; may be supplied as a callable for lazy evaluation.

total_running_requests

Total number of requests currently running on the deployment's replicas.

deployment_id

Unique identifier for the deployment.

deployment_name

Name of the deployment.

app_name

Name of the application containing this deployment.

capacity_adjusted_min_replicas

Minimum replicas adjusted for cluster capacity.

capacity_adjusted_max_replicas

Maximum replicas adjusted for cluster capacity.

current_time

Current timestamp.

config

Autoscaling configuration for this deployment.
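
The constructor signature above also carries policy_state, last_scale_up_time, last_scale_down_time, and current_time, which together let a policy implement stateful behavior. The sketch below uses only the timing fields to enforce a cooldown between scale-ups; the 300-second window and the helper name are hypothetical choices for the sketch.

```python
import time

from ray.serve.config import AutoscalingContext

# Hypothetical cooldown window for this sketch; not a Serve setting.
SCALE_UP_COOLDOWN_S = 300.0


def scale_up_allowed(ctx: AutoscalingContext) -> bool:
    """Sketch: gate scale-ups on a cooldown since the last scale-up.

    last_scale_up_time and current_time come from the constructor
    signature above; both may be None.
    """
    now = ctx.current_time if ctx.current_time is not None else time.time()
    if ctx.last_scale_up_time is None:
        return True
    return (now - ctx.last_scale_up_time) >= SCALE_UP_COOLDOWN_S
```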