ray.serve.deployment

ray.serve.deployment(_func_or_class: Callable | None = None, name: DEFAULT | str = DEFAULT.VALUE, version: DEFAULT | str = DEFAULT.VALUE, num_replicas: DEFAULT | int | str | None = DEFAULT.VALUE, route_prefix: DEFAULT | str | None = DEFAULT.VALUE, ray_actor_options: DEFAULT | Dict = DEFAULT.VALUE, placement_group_bundles: DEFAULT | List[Dict[str, float]] = DEFAULT.VALUE, placement_group_strategy: DEFAULT | str = DEFAULT.VALUE, max_replicas_per_node: DEFAULT | int = DEFAULT.VALUE, user_config: DEFAULT | Any | None = DEFAULT.VALUE, max_concurrent_queries: DEFAULT | int = DEFAULT.VALUE, max_ongoing_requests: DEFAULT | int = DEFAULT.VALUE, max_queued_requests: DEFAULT | int = DEFAULT.VALUE, autoscaling_config: DEFAULT | Dict | AutoscalingConfig | None = DEFAULT.VALUE, graceful_shutdown_wait_loop_s: DEFAULT | float = DEFAULT.VALUE, graceful_shutdown_timeout_s: DEFAULT | float = DEFAULT.VALUE, health_check_period_s: DEFAULT | float = DEFAULT.VALUE, health_check_timeout_s: DEFAULT | float = DEFAULT.VALUE, logging_config: DEFAULT | Dict | LoggingConfig | None = DEFAULT.VALUE) → Callable[[Callable], Deployment]

Decorator that converts a Python class or function into a Deployment.

Example:

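from ray import serve

# Run two replicas of this deployment.
@serve.deployment(num_replicas=2)
class MyDeployment:
    pass

# Bind the deployment into an application that serve.run() can deploy.
app = MyDeployment.bind()

Parameters: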
  • name – Name uniquely identifying this deployment within the application. If not provided, the name of the class or function is used.

  • num_replicas – Number of replicas to run that handle requests to this deployment. Defaults to 1.

  • autoscaling_config – Parameters to configure autoscaling behavior. If this is set, num_replicas cannot be set.

  • route_prefix – [DEPRECATED] Set the route prefix per application through serve.run() or the config file instead.

  • ray_actor_options – Options to pass to the Ray Actor decorator, such as resource requirements. Valid options are: accelerator_type, memory, num_cpus, num_gpus, object_store_memory, resources, and runtime_env.

  • placement_group_bundles – Defines a set of placement group bundles to be scheduled for each replica of this deployment. The replica actor will be scheduled in the first bundle provided, so the resources specified in ray_actor_options must be a subset of the first bundle’s resources. All actors and tasks created by the replica actor will be scheduled in the placement group by default (placement_group_capture_child_tasks is set to True). This cannot be set together with max_replicas_per_node.

  • placement_group_strategy – Strategy to use for the replica placement group specified via placement_group_bundles. Defaults to PACK.

  • user_config – Config to pass to the reconfigure method of the deployment. This can be updated dynamically without restarting the replicas of the deployment. The user_config must be fully JSON-serializable.

  • max_concurrent_queries – [DEPRECATED] Use max_ongoing_requests instead. Maximum number of queries that are sent to a replica of this deployment without receiving a response. Defaults to 100.

  • max_ongoing_requests – Maximum number of requests that are sent to a replica of this deployment without receiving a response. Defaults to 100.

  • max_queued_requests – [EXPERIMENTAL] Maximum number of requests to this deployment that will be queued at each caller (proxy or DeploymentHandle). Once this limit is reached, subsequent requests will raise a BackPressureError (for handles) or return an HTTP 503 status code (for HTTP requests). Defaults to -1 (no limit).

  • health_check_period_s – Duration, in seconds, between health check calls for the replica. Defaults to 10s. By default the health check is a no-op actor call to the replica, but you can define your own by adding a check_health method to your deployment that raises an exception when the replica is unhealthy.

  • health_check_timeout_s – Duration, in seconds, that replicas wait for a health check method to return before considering it failed. Defaults to 30s.

  • graceful_shutdown_wait_loop_s – Duration that replicas wait until there is no more work to be done before shutting down. Defaults to 2s.

  • graceful_shutdown_timeout_s – Duration to wait for a replica to gracefully shut down before being forcefully killed. Defaults to 20s.

  • max_replicas_per_node – Maximum number of replicas of this deployment that can run on a single node. Valid values are None (default, no limit) or an integer in the range [1, 100]. This cannot be set together with placement_group_bundles.
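
Several parameters above constrain one another. The sketch below collects those rules in one place as a plain Python check. `check_deployment_options` is a hypothetical helper, not part of Ray, and it uses `None` to mean "not set" where the real signature uses `DEFAULT.VALUE`. The mapping from `num_cpus`, `num_gpus`, and `memory` to the bundle keys `CPU`, `GPU`, and `memory` is an assumption made for illustration, and only explicitly requested resources are compared.

```python
import json
from typing import Any, Dict, List, Optional


def check_deployment_options(
    num_replicas: Optional[int] = None,
    autoscaling_config: Optional[Dict[str, Any]] = None,
    ray_actor_options: Optional[Dict[str, Any]] = None,
    placement_group_bundles: Optional[List[Dict[str, float]]] = None,
    max_replicas_per_node: Optional[int] = None,
    user_config: Any = None,
) -> List[str]:
    """Hypothetical helper (not part of Ray): list the documented rules that are violated."""
    problems: List[str] = []

    if autoscaling_config is not None and num_replicas is not None:
        problems.append("num_replicas cannot be set together with autoscaling_config")

    if placement_group_bundles is not None and max_replicas_per_node is not None:
        problems.append(
            "placement_group_bundles cannot be set together with max_replicas_per_node"
        )

    if max_replicas_per_node is not None and not 1 <= max_replicas_per_node <= 100:
        problems.append("max_replicas_per_node must be in the range [1, 100]")

    if placement_group_bundles and ray_actor_options:
        # Assumption: num_cpus / num_gpus / memory correspond to the bundle keys
        # "CPU" / "GPU" / "memory", and custom resources keep their own names.
        requested = dict(ray_actor_options.get("resources") or {})
        for option, bundle_key in (
            ("num_cpus", "CPU"),
            ("num_gpus", "GPU"),
            ("memory", "memory"),
        ):
            if ray_actor_options.get(option) is not None:
                requested[bundle_key] = ray_actor_options[option]
        # The replica actor is scheduled in the first bundle, so it must fit there.
        first_bundle = placement_group_bundles[0]
        for key, amount in requested.items():
            if amount > first_bundle.get(key, 0):
                problems.append(f"{key}={amount} does not fit in the first bundle")

    if user_config is not None:
        try:
            json.dumps(user_config)
        except (TypeError, ValueError):
            problems.append("user_config is not JSON-serializable")

    return problems


if __name__ == "__main__":
    # Two rules are broken here: the replica needs more CPU than the first
    # bundle offers, and bundles are combined with max_replicas_per_node.
    for problem in check_deployment_options(
        ray_actor_options={"num_cpus": 2},
        placement_group_bundles=[{"CPU": 1}, {"CPU": 1}],
        max_replicas_per_node=3,
    ):
        print(problem)
```

Ray performs its own validation when the decorator is applied or the application is deployed. A check like this is only useful as a readable summary of the rules, or as a pre-flight test for configuration generated by other tooling.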

Returns:

Deployment
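
The user_config and health_check_period_s entries above refer to two optional methods on the deployment class, reconfigure and check_health. The following is a minimal sketch of how they might look. The class `Thresholder`, its `is_above` method, its `healthy` flag, and the function `build_app` are hypothetical names chosen for illustration. The class is kept as plain Python and the decorator is applied by hand inside `build_app`, so the two hooks can be exercised without a running cluster.

```python
from typing import Any, Dict


class Thresholder:
    """Plain Python class; it becomes a deployment only inside build_app()."""

    def __init__(self) -> None:
        self.threshold = 0.5
        self.healthy = True

    def reconfigure(self, config: Dict[str, Any]) -> None:
        # Receives the deployment's user_config.
        self.threshold = float(config["threshold"])

    def check_health(self) -> None:
        # Custom health check: raising an exception reports the replica as unhealthy.
        if not self.healthy:
            raise RuntimeError("replica marked unhealthy")

    def is_above(self, score: float) -> bool:
        return score >= self.threshold


def build_app():
    # Imported lazily so the class above can be tested without Ray installed.
    from ray import serve

    # Equivalent to decorating Thresholder with @serve.deployment(...).
    deployment = serve.deployment(
        num_replicas=2,
        user_config={"threshold": 0.8},  # must be JSON-serializable
        health_check_period_s=10,
        health_check_timeout_s=30,
    )(Thresholder)
    return deployment.bind()
```

Passing the result of `build_app()` to serve.run() would deploy it. In the more common form, the decorator sits directly on the class, as in the example at the top of this page.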