ray.serve.schema.DeploymentSchema#
- pydantic model ray.serve.schema.DeploymentSchema[source]#
- field autoscaling_config: Dict | AutoscalingConfig | None = DEFAULT.VALUE#
Config specifying autoscaling parameters for the deployment’s number of replicas. If null, the deployment won’t autoscale its number of replicas; the number of replicas will be fixed at num_replicas.
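The relationship between autoscaling_config and num_replicas can be sketched with plain dictionaries that mirror the schema's field names. This is only an illustration: the deployment names and the replica_mode helper are hypothetical, and the min_replicas and max_replicas keys are assumed autoscaling parameters that this page does not list.

```python
# Hypothetical deployment entries written as plain dicts that mirror the
# field names of DeploymentSchema. The deployment names are made up.
fixed = {"name": "Summarizer", "num_replicas": 2, "autoscaling_config": None}
autoscaled = {
    "name": "Translator",
    # Assumed autoscaling parameters; not listed on this page.
    "autoscaling_config": {"min_replicas": 1, "max_replicas": 4},
}


def replica_mode(deployment: dict) -> str:
    """Return "autoscaling" if autoscaling_config is set, otherwise "fixed"."""
    if deployment.get("autoscaling_config") is not None:
        return "autoscaling"
    return "fixed"


print(replica_mode(fixed))
print(replica_mode(autoscaled))
```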
- field deployment_actors: List[Dict | DeploymentActorConfig] | None = DEFAULT.VALUE#
Deployment-scoped actors managed by the controller. Each actor is shared by all replicas and cleaned up on deployment deletion. Each item has name, actor_class (import path), init_kwargs, and actor_options.
- field gang_scheduling_config: Dict | GangSchedulingConfig | None = DEFAULT.VALUE#
Configuration for gang scheduling of deployment replicas. Gang scheduling ensures that groups of replicas are scheduled together atomically. Specify gang_size (required), and optionally gang_placement_strategy and runtime_failure_policy.
- field graceful_shutdown_timeout_s: float = DEFAULT.VALUE#
The Serve controller waits for this duration before forcefully killing the replica during shutdown. Uses a default if null.
- Constraints:
ge = 0
- field graceful_shutdown_wait_loop_s: float = DEFAULT.VALUE#
Duration that deployment replicas will wait until there is no more work to be done before shutting down. Uses a default if null.
- Constraints:
ge = 0
- field health_check_period_s: float = DEFAULT.VALUE#
Frequency at which the controller will health check replicas. Uses a default if null.
- Constraints:
gt = 0
- field health_check_timeout_s: float = DEFAULT.VALUE#
Timeout that the controller will wait for a response from the replica’s health check before marking it unhealthy. Uses a default if null.
- Constraints:
gt = 0
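The four timing fields above differ in one detail: the graceful shutdown fields accept 0 (ge = 0), while the health check fields must be strictly positive (gt = 0). A minimal sketch of that difference follows; the violations helper is hypothetical and is not how the schema itself performs validation.

```python
import operator

# Constraint table copied from the field listings above:
# "ge" means value >= bound, "gt" means value > bound.
CONSTRAINTS = {
    "graceful_shutdown_timeout_s": ("ge", 0),
    "graceful_shutdown_wait_loop_s": ("ge", 0),
    "health_check_period_s": ("gt", 0),
    "health_check_timeout_s": ("gt", 0),
}
_OPS = {"ge": operator.ge, "gt": operator.gt}


def violations(config: dict) -> list:
    """Return the names of timing fields whose values break their constraint."""
    bad = []
    for field, (op, bound) in CONSTRAINTS.items():
        # Fields missing from the config are skipped (they use defaults).
        if field in config and not _OPS[op](config[field], bound):
            bad.append(field)
    return bad


# A zero shutdown timeout is allowed; a zero health check period is not.
print(violations({"graceful_shutdown_timeout_s": 0, "health_check_period_s": 0}))
```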
- field logging_config: LoggingConfig = DEFAULT.VALUE#
Logging configuration for this deployment's Serve logs.
- field max_ongoing_requests: int = DEFAULT.VALUE#
Maximum number of requests that are sent in parallel to each replica of this deployment. The limit is enforced across all callers (HTTP requests or DeploymentHandles). Defaults to 5.
- Constraints:
gt = 0
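To illustrate what a per-replica cap on in-flight requests means, here is a toy counter. It is a simplified model under stated assumptions, not Ray Serve's routing code; the ReplicaSlots class and its methods are hypothetical.

```python
class ReplicaSlots:
    """Toy in-flight counter for one replica; not Ray Serve's implementation."""

    def __init__(self, max_ongoing_requests: int = 5):
        # Mirrors the documented constraint gt = 0.
        if max_ongoing_requests <= 0:
            raise ValueError("max_ongoing_requests must be greater than 0")
        self.limit = max_ongoing_requests
        self.ongoing = 0

    def try_start(self) -> bool:
        """Admit a request if the replica is below its limit."""
        if self.ongoing >= self.limit:
            return False
        self.ongoing += 1
        return True

    def finish(self) -> None:
        """Mark one request as done, freeing a slot."""
        self.ongoing = max(0, self.ongoing - 1)


slots = ReplicaSlots(max_ongoing_requests=2)
admitted = [slots.try_start() for _ in range(3)]
print(admitted)
```

In this model the third request is refused because two are already in flight, regardless of which caller sent them.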
- field max_queued_requests: int = DEFAULT.VALUE#
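Maximum number of requests to this deployment that will be queued at each caller (proxy or DeploymentHandle). Once this limit is reached, subsequent requests are rejected: handles raise a BackPressureError and HTTP requests receive a 503 status code. Defaults to -1 (no limit).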
- Constraints:
strict = True
- field max_replicas_per_node: int = DEFAULT.VALUE#
The max number of replicas of this deployment that can run on a single node. Valid values are None (default, no limit) or an integer in the range [1, 100].
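One consequence of a per-node cap is a lower bound on cluster size: with at most k replicas per node, n replicas need at least ceil(n / k) nodes. The sketch below works through that arithmetic; min_nodes_needed is a hypothetical helper that ignores CPU, GPU, and memory availability.

```python
import math


def min_nodes_needed(num_replicas: int, max_replicas_per_node=None) -> int:
    """Lower bound on node count implied by the per-node cap alone."""
    if num_replicas <= 0:
        return 0
    if max_replicas_per_node is None:
        # No cap: this setting alone does not force more than one node.
        return 1
    if not 1 <= max_replicas_per_node <= 100:
        raise ValueError("max_replicas_per_node must be None or in [1, 100]")
    return math.ceil(num_replicas / max_replicas_per_node)


print(min_nodes_needed(10, 3))
print(min_nodes_needed(10))
```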
- field num_replicas: int | str | None = DEFAULT.VALUE#
The number of processes that handle requests to this deployment. Uses a default if null. Can also be set to "auto" for a default autoscaling configuration (experimental).
- field placement_group_bundle_label_selector: List[Dict[str, str]] = DEFAULT.VALUE#
A list of label selectors to apply to the placement group on a per-bundle level.
- field placement_group_bundles: List[Dict[str, float]] = DEFAULT.VALUE#
Define a set of placement group bundles to be scheduled for each replica of this deployment. The replica actor will be scheduled in the first bundle provided, so the resources specified in ray_actor_options must be a subset of the first bundle's resources. All actors and tasks created by the replica actor will be scheduled in the placement group by default (placement_group_capture_child_tasks is set to True).
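The subset requirement on the first bundle can be checked mechanically. The following sketch assumes the replica actor's resources have already been expressed with the same keys as the bundles (for example "CPU" and "GPU"); the fits_first_bundle helper and the sample values are hypothetical.

```python
def fits_first_bundle(actor_resources: dict, bundles: list) -> bool:
    """Check that every actor resource fits inside the first bundle."""
    if not bundles:
        return False
    first = bundles[0]
    # A resource missing from the first bundle counts as 0 available.
    return all(
        first.get(name, 0) >= amount for name, amount in actor_resources.items()
    )


# Two bundles per replica; the replica actor lands in the first one.
bundles = [{"CPU": 1, "GPU": 1}, {"CPU": 2}]

print(fits_first_bundle({"CPU": 1}, bundles))
print(fits_first_bundle({"CPU": 2}, bundles))
```

The second call fails even though the second bundle has two CPUs, because only the first bundle is considered for the replica actor.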
- field placement_group_strategy: str = DEFAULT.VALUE#
Strategy to use for the replica placement group specified via placement_group_bundles. Defaults to PACK.
- field ray_actor_options: RayActorOptionsSchema = DEFAULT.VALUE#
Options set for each replica actor.
- field request_router_config: Dict | RequestRouterConfig = DEFAULT.VALUE#
Config for the request router used for this deployment.