ray.serve.deployment

Decorator that converts a Python class to a Deployment.

Example:

from ray import serve

@serve.deployment(num_replicas=2)
class MyDeployment:
    pass

app = MyDeployment.bind()

Parameters:

_func_or_class – The class or function to be decorated.
name – Name uniquely identifying this deployment within the application. If not provided, the name of the class or function is used.
version – Removed. Specifying this argument raises a ValueError.
num_replicas – Number of replicas that handle requests to this deployment. Defaults to 1.
ray_actor_options – Options to pass to the Ray Actor decorator, such as resource requirements. Valid options are: accelerator_type, memory, num_cpus, num_gpus, resources, runtime_env, and label_selector.
placement_group_bundles – Defines a set of placement group bundles to be scheduled for each replica of this deployment. The replica actor will be scheduled in the first bundle provided, so the resources specified in ray_actor_options must be a subset of the first bundle’s resources. All actors and tasks created by the replica actor will be scheduled in the placement group by default (placement_group_capture_child_tasks is set to True). This cannot be set together with max_replicas_per_node.
placement_group_strategy – Strategy to use for the replica placement group specified via placement_group_bundles. Defaults to PACK.
placement_group_bundle_label_selector – A list of label selectors to apply to the placement group on a per-bundle level. If a single label selector is provided, it is applied to all bundles. Otherwise, the length must match placement_group_bundles.
max_replicas_per_node – The maximum number of replicas of this deployment that can run on a single node. Valid values are None (default, no limit) or an integer in the range [1, 100]. This cannot be set together with placement_group_bundles.
user_config – Config to pass to the reconfigure method of the deployment. This can be updated dynamically without restarting the replicas of the deployment. The user_config must be fully JSON-serializable.
max_ongoing_requests – Maximum number of requests that can be sent to a single replica of this deployment without a response having been received, that is, the per-replica limit on in-flight requests. Defaults to 5.
max_queued_requests – Maximum number of requests to this deployment that will be queued at each caller (proxy or DeploymentHandle). Once this limit is reached, subsequent requests will raise a BackPressureError (for handles) or return an HTTP 503 status code (for HTTP requests). Defaults to -1 (no limit).
autoscaling_config – Parameters to configure autoscaling behavior. If this is set, num_replicas should be “auto” or not set.
graceful_shutdown_wait_loop_s – Duration, in seconds, that a stopping replica waits between checks for remaining work; the replica shuts down once no work is left. Defaults to 2s.
graceful_shutdown_timeout_s – Duration to wait for a replica to gracefully shut down before being forcefully killed. Defaults to 20s.
health_check_period_s – Duration between health check calls for the replica. Defaults to 10s. By default, the health check is a no-op actor call to the replica; to define a custom health check, implement a check_health method on your deployment that raises an exception when the replica is unhealthy.
health_check_timeout_s – Duration, in seconds, that replicas wait for a health check method to return before considering it failed. Defaults to 30s.
logging_config – Logging config options for the deployment. If provided, the config will be used to set up the Serve logger on the deployment.
request_router_config – Config for the request router used for this deployment.
max_constructor_retry_count – Maximum number of times to retry the deployment constructor. Defaults to 20.
gang_scheduling_config – Configuration for gang scheduling of deployment replicas. Gang scheduling ensures that groups of replicas are scheduled together atomically, which is essential for distributed workloads that require coordination between replicas. See GangSchedulingConfig for options.
deployment_actors – List of deployment-scoped Ray actors managed by the controller. Each actor is shared across all replicas of this deployment. Use serve.get_deployment_actor(actor_name) from within a replica to get the actor handle. See DeploymentActorConfig for options.
rolling_update_percentage – The fraction of replicas to update at a time during a rolling update. Must be in (0.0, 1.0]. Defaults to 0.2 (20%).

Returns:

Deployment