ray.serve.config.AutoscalingContext

class ray.serve.config.AutoscalingContext(deployment_id: DeploymentID, deployment_name: str, app_name: str | None, current_num_replicas: int, target_num_replicas: int, running_replicas: List[ReplicaID], total_num_requests: float | Callable[[], float], total_queued_requests: float | Callable[[], float] | None, aggregated_metrics: Dict[str, Dict[ReplicaID, float]] | Callable[[], Dict[str, Dict[ReplicaID, float]]] | None, raw_metrics: Dict[str, Dict[ReplicaID, List[TimeStampedValue]]] | Callable[[], Dict[str, Dict[ReplicaID, List[TimeStampedValue]]]] | None, capacity_adjusted_min_replicas: int, capacity_adjusted_max_replicas: int, policy_state: Dict[str, Any], last_scale_up_time: float | None, last_scale_down_time: float | None, current_time: float | None, config: Any | None)

Rich context provided to custom autoscaling policies.

This class provides comprehensive information about a deployment’s current state, metrics, and configuration that can be used by custom autoscaling policies to make intelligent scaling decisions.

The context includes deployment metadata, current replica state, built-in and custom metrics, capacity bounds, policy state, and timing information.
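
As an illustration of how a policy might consume this context, the sketch below derives a replica target from the built-in request metric and clamps it to the capacity-adjusted bounds. The function name, its return type, and the TARGET_REQUESTS_PER_REPLICA constant are assumptions made for the sketch; the exact interface Serve expects from a custom autoscaling policy is documented separately.

```python
import math

from ray.serve.config import AutoscalingContext

# Hypothetical tuning knob for this sketch; not a Serve setting.
TARGET_REQUESTS_PER_REPLICA = 5.0


def request_based_target(ctx: AutoscalingContext) -> int:
    """Sketch: derive a bounded replica target from total_num_requests."""
    # Per the constructor signature, total_num_requests may be a float or a
    # zero-argument callable; resolve it either way.
    total = ctx.total_num_requests
    if callable(total):
        total = total()

    desired = math.ceil(total / TARGET_REQUESTS_PER_REPLICA)

    # Clamp to the capacity-adjusted bounds supplied in the context.
    return max(
        ctx.capacity_adjusted_min_replicas,
        min(desired, ctx.capacity_adjusted_max_replicas),
    )
```

A real policy would likely combine such a target with the queued-request and custom-metric fields described below.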

Note: The aggregated_metrics and raw_metrics fields support lazy evaluation. You can pass callables that will be evaluated only when accessed, with results cached for subsequent accesses.
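
For example, a policy-style helper might read aggregated_metrics as sketched below. Per the note, a callable supplied for the field is evaluated only on first access and then cached, so the explicit callable check is purely defensive. The helper name and the idea of averaging a per-replica metric are illustrative assumptions, not part of the API.

```python
from typing import Optional

from ray.serve.config import AutoscalingContext


def mean_aggregated_metric(ctx: AutoscalingContext, metric_name: str) -> Optional[float]:
    """Sketch: average a per-replica metric from aggregated_metrics."""
    metrics = ctx.aggregated_metrics
    # Defensive: resolve manually if the field still holds a callable.
    if callable(metrics):
        metrics = metrics()
    if not metrics or metric_name not in metrics:
        return None

    per_replica = metrics[metric_name]  # Dict[ReplicaID, float]
    if not per_replica:
        return None
    return sum(per_replica.values()) / len(per_replica)
```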

PublicAPI (alpha): This API is in alpha and may change before becoming stable.

Attributes

aggregated_metrics

Aggregated metric values keyed by metric name and replica; may be supplied as a callable for lazy evaluation.

raw_metrics

Raw time-stamped metric values keyed by metric name and replica; may be supplied as a callable for lazy evaluation.

total_num_requests

Total number of requests across the deployment; may be supplied as a callable for lazy evaluation.

total_queued_requests

Total number of queued requests across the deployment; may be supplied as a callable for lazy evaluation.

total_running_requests

Total number of requests currently running on the deployment's replicas.

deployment_id

Unique identifier for the deployment.

deployment_name

Name of the deployment.

app_name

Name of the application containing this deployment.

capacity_adjusted_min_replicas

Minimum replicas adjusted for cluster capacity.

capacity_adjusted_max_replicas

Maximum replicas adjusted for cluster capacity.

current_time

Current timestamp.

config

Autoscaling configuration for this deployment.
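
The constructor signature above also carries policy_state, last_scale_up_time, last_scale_down_time, and current_time, which together let a policy implement stateful behavior. The sketch below uses only the timing fields to enforce a cooldown between scale-ups; the 300-second window and the helper name are hypothetical choices for the sketch.

```python
import time

from ray.serve.config import AutoscalingContext

# Hypothetical cooldown window for this sketch; not a Serve setting.
SCALE_UP_COOLDOWN_S = 300.0


def scale_up_allowed(ctx: AutoscalingContext) -> bool:
    """Sketch: gate scale-ups on a cooldown since the last scale-up.

    last_scale_up_time and current_time come from the constructor
    signature above; both may be None.
    """
    now = ctx.current_time if ctx.current_time is not None else time.time()
    if ctx.last_scale_up_time is None:
        return True
    return (now - ctx.last_scale_up_time) >= SCALE_UP_COOLDOWN_S
```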