ray.serve.config.AutoscalingContext

class ray.serve.config.AutoscalingContext(deployment_id: DeploymentID, deployment_name: str, app_name: str | None, current_num_replicas: int, target_num_replicas: int, running_replicas: List[ReplicaID], total_num_requests: float, total_queued_requests: float | None, total_running_requests: float | None, aggregated_metrics: Dict[str, Dict[ReplicaID, float]], raw_metrics: Dict[str, Dict[ReplicaID, List[TimeStampedValue]]], capacity_adjusted_min_replicas: int, capacity_adjusted_max_replicas: int, policy_state: Dict[str, Any], last_scale_up_time: float | None, last_scale_down_time: float | None, current_time: float | None, config: Any | None)

Rich context provided to custom autoscaling policies.

This class exposes a deployment’s current state, metrics, and configuration so that custom autoscaling policies can base their scaling decisions on them.

The context includes deployment metadata, current replica state, built-in and custom metrics, capacity bounds, policy state, and timing information.

PublicAPI (alpha): This API is in alpha and may change before becoming stable.
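As an illustration of how a policy might consume this context, the following minimal sketch sizes a deployment from `total_num_requests` and clamps the result to the capacity-adjusted bounds. `FakeContext` is a hypothetical stand-in that mirrors only a few of the fields in the signature above, so the snippet runs without Ray. The `target_per_replica` parameter and the `(replicas, policy_state)` return shape are assumptions of this sketch, not something this page confirms.

```python
import math
from dataclasses import dataclass, field
from typing import Any, Dict, Tuple


@dataclass
class FakeContext:
    """Hypothetical stand-in mirroring a subset of AutoscalingContext fields."""
    total_num_requests: float
    capacity_adjusted_min_replicas: int
    capacity_adjusted_max_replicas: int
    policy_state: Dict[str, Any] = field(default_factory=dict)


def request_based_policy(ctx, target_per_replica: float = 2.0) -> Tuple[int, Dict[str, Any]]:
    # Replicas needed so that each one handles at most target_per_replica requests.
    # target_per_replica is an illustrative knob, not a documented setting.
    desired = math.ceil(ctx.total_num_requests / target_per_replica)
    # Stay inside the bounds the context reports as feasible.
    desired = max(ctx.capacity_adjusted_min_replicas,
                  min(ctx.capacity_adjusted_max_replicas, desired))
    # Return a copy of the state so the caller's dictionary is not mutated.
    return desired, dict(ctx.policy_state)


ctx = FakeContext(total_num_requests=9,
                  capacity_adjusted_min_replicas=1,
                  capacity_adjusted_max_replicas=4)
replicas, state = request_based_policy(ctx)
```

Here nine requests at two per replica would call for five replicas, but the capacity-adjusted maximum of four caps the result.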

Attributes

deployment_id

Unique identifier for the deployment.

deployment_name

Name of the deployment.

app_name

Name of the application containing this deployment.

current_num_replicas

Current number of running replicas.

target_num_replicas

Target number of replicas set by the autoscaler.

running_replicas

List of currently running replica IDs.

total_num_requests

Total number of requests across all replicas.

total_queued_requests

Number of requests currently queued.

total_running_requests

Total number of requests currently running.

aggregated_metrics

Time-weighted averages of custom metrics per replica.

raw_metrics

Raw custom metric timeseries per replica.

capacity_adjusted_min_replicas

Minimum replicas adjusted for cluster capacity.

capacity_adjusted_max_replicas

Maximum replicas adjusted for cluster capacity.

policy_state

Persistent state dictionary for the autoscaling policy.

last_scale_up_time

Timestamp of the last scale-up action; may be None.

last_scale_down_time

Timestamp of the last scale-down action; may be None.

current_time

Current timestamp.

config

Autoscaling configuration for this deployment.
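The nested shape of `aggregated_metrics` and the optional timing fields can be easier to follow with a small example. The sketch below uses two hypothetical helpers, `mean_across_replicas` and `in_cooldown`, which are not part of Ray. It assumes replica IDs can be represented as plain strings for illustration, that `gpu_util` is a made-up metric name, and that `current_time` and the last-scale timestamps are seconds on the same clock.

```python
from typing import Dict, Optional


def mean_across_replicas(aggregated_metrics: Dict[str, Dict[str, float]],
                         metric_name: str) -> Optional[float]:
    """Average one metric's per-replica values; None if nothing was reported."""
    per_replica = aggregated_metrics.get(metric_name, {})
    if not per_replica:
        return None
    return sum(per_replica.values()) / len(per_replica)


def in_cooldown(last_scale_time: Optional[float],
                current_time: Optional[float],
                cooldown_s: float) -> bool:
    """True if a scaling action happened less than cooldown_s seconds ago."""
    if last_scale_time is None or current_time is None:
        # No recorded action, or no clock value: nothing to wait for.
        return False
    return (current_time - last_scale_time) < cooldown_s


# Shape follows aggregated_metrics: metric name -> replica ID -> value.
metrics = {"gpu_util": {"replica-a": 0.5, "replica-b": 1.0}}
avg = mean_across_replicas(metrics, "gpu_util")

# Thirty seconds have passed since the last action; the cooldown is sixty.
cooling = in_cooldown(last_scale_time=100.0, current_time=130.0, cooldown_s=60.0)
```

Both helpers treat missing data explicitly, since the signature marks the timestamps as `float | None` and a metric may have no reporting replicas.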