ray.serve.deployment
- ray.serve.deployment(_func_or_class: Optional[Callable] = None, name: Union[ray.serve._private.utils.DEFAULT, str] = DEFAULT.VALUE, version: Union[ray.serve._private.utils.DEFAULT, str] = DEFAULT.VALUE, num_replicas: Optional[Union[ray.serve._private.utils.DEFAULT, int]] = DEFAULT.VALUE, init_args: Union[ray.serve._private.utils.DEFAULT, Tuple[Any]] = DEFAULT.VALUE, init_kwargs: Union[ray.serve._private.utils.DEFAULT, Dict[Any, Any]] = DEFAULT.VALUE, route_prefix: Optional[Union[ray.serve._private.utils.DEFAULT, str]] = DEFAULT.VALUE, ray_actor_options: Union[ray.serve._private.utils.DEFAULT, Dict] = DEFAULT.VALUE, placement_group_bundles: Optional[List[Dict[str, float]]] = DEFAULT.VALUE, placement_group_strategy: Optional[str] = DEFAULT.VALUE, max_replicas_per_node: Union[ray.serve._private.utils.DEFAULT, int] = DEFAULT.VALUE, user_config: Optional[Union[ray.serve._private.utils.DEFAULT, Any]] = DEFAULT.VALUE, max_concurrent_queries: Union[ray.serve._private.utils.DEFAULT, int] = DEFAULT.VALUE, autoscaling_config: Optional[Union[ray.serve._private.utils.DEFAULT, Dict, ray.serve.config.AutoscalingConfig]] = DEFAULT.VALUE, graceful_shutdown_wait_loop_s: Union[ray.serve._private.utils.DEFAULT, float] = DEFAULT.VALUE, graceful_shutdown_timeout_s: Union[ray.serve._private.utils.DEFAULT, float] = DEFAULT.VALUE, health_check_period_s: Union[ray.serve._private.utils.DEFAULT, float] = DEFAULT.VALUE, health_check_timeout_s: Union[ray.serve._private.utils.DEFAULT, float] = DEFAULT.VALUE) -> Callable[[Callable], ray.serve.deployment.Deployment]
Decorator that converts a Python class to a Deployment.
Example:
    from ray import serve

    @serve.deployment(num_replicas=2)
    class MyDeployment:
        pass

    app = MyDeployment.bind()
- Parameters
name – Name uniquely identifying this deployment within the application. If not provided, the name of the class or function is used.
num_replicas – Number of replicas to run that handle requests to this deployment. Defaults to 1.
autoscaling_config – Parameters to configure autoscaling behavior. If this is set, num_replicas cannot be set.
init_args – [DEPRECATED] These should be passed to bind() instead.
init_kwargs – [DEPRECATED] These should be passed to bind() instead.
route_prefix – [DEPRECATED] Route prefix should be set per-application through serve.run().
ray_actor_options – Options to pass to the Ray Actor decorator, such as resource requirements. Valid options are: accelerator_type, memory, num_cpus, num_gpus, object_store_memory, resources, and runtime_env.
placement_group_bundles – Defines a set of placement group bundles to be scheduled for each replica of this deployment. The replica actor will be scheduled in the first bundle provided, so the resources specified in ray_actor_options must be a subset of the first bundle’s resources. All actors and tasks created by the replica actor will be scheduled in the placement group by default (placement_group_capture_child_tasks is set to True).
placement_group_strategy – Strategy to use for the replica placement group specified via placement_group_bundles. Defaults to PACK.
user_config – Config to pass to the reconfigure method of the deployment. This can be updated dynamically without restarting the replicas of the deployment. The user_config must be fully JSON-serializable.
max_concurrent_queries – Maximum number of queries that are sent to a replica of this deployment without receiving a response. Defaults to 100.
health_check_period_s – Duration between health check calls for the replica. Defaults to 10s. The health check is by default a no-op Actor call to the replica, but you can define your own health check using the “check_health” method in your deployment that raises an exception when unhealthy.
health_check_timeout_s – Duration, in seconds, that replicas wait for a health check method to return before considering it failed. Defaults to 30s.
graceful_shutdown_wait_loop_s – Duration that replicas wait until there is no more work to be done before shutting down. Defaults to 2s.
graceful_shutdown_timeout_s – Duration to wait for a replica to gracefully shut down before being forcefully killed. Defaults to 20s.
max_replicas_per_node – [EXPERIMENTAL] The maximum number of replicas of this deployment that can run on a single node. Valid values are None (no limitation) or an integer in the range of [1, 100]. Defaults to no limitation.
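Several of the parameters above constrain one another. As a rough illustration, the sketch below collects those rules into a single pre-flight check. The helper check_deployment_options is hypothetical and not part of Ray. It also assumes that placement group bundles name CPUs and GPUs with the keys "CPU" and "GPU", so that num_cpus and num_gpus can be compared against the first bundle.

```python
import json
from typing import Any, Dict, List, Optional

# Keys listed as valid for ray_actor_options in the parameter list above.
VALID_ACTOR_OPTIONS = {
    "accelerator_type", "memory", "num_cpus", "num_gpus",
    "object_store_memory", "resources", "runtime_env",
}


def check_deployment_options(
    num_replicas: Optional[int] = None,
    autoscaling_config: Optional[Dict[str, Any]] = None,
    ray_actor_options: Optional[Dict[str, Any]] = None,
    placement_group_bundles: Optional[List[Dict[str, float]]] = None,
    user_config: Any = None,
    max_replicas_per_node: Optional[int] = None,
) -> List[str]:
    """Hypothetical pre-flight check (not a Ray API): list rule violations."""
    problems: List[str] = []

    # num_replicas and autoscaling_config are mutually exclusive.
    if num_replicas is not None and autoscaling_config is not None:
        problems.append("num_replicas cannot be set with autoscaling_config")

    # Only the documented actor option keys are accepted.
    options = ray_actor_options or {}
    unknown = sorted(set(options) - VALID_ACTOR_OPTIONS)
    if unknown:
        problems.append(f"unknown ray_actor_options keys: {unknown}")

    # The replica's resources must be a subset of the first bundle.
    if placement_group_bundles:
        # Assumption: bundles name CPUs and GPUs "CPU" and "GPU".
        requested = dict(options.get("resources") or {})
        if options.get("num_cpus"):
            requested["CPU"] = options["num_cpus"]
        if options.get("num_gpus"):
            requested["GPU"] = options["num_gpus"]
        first_bundle = placement_group_bundles[0]
        for resource, amount in requested.items():
            if amount > first_bundle.get(resource, 0):
                problems.append(f"{resource}={amount} does not fit in the first bundle")

    # user_config must be fully JSON-serializable.
    try:
        json.dumps(user_config)
    except (TypeError, ValueError):
        problems.append("user_config is not JSON-serializable")

    # max_replicas_per_node is None or an integer in [1, 100].
    if max_replicas_per_node is not None and not 1 <= max_replicas_per_node <= 100:
        problems.append("max_replicas_per_node must be None or in [1, 100]")

    return problems
```

An empty list means that none of these rules were violated. Ray Serve performs its own validation, which may differ in detail from this sketch.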
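The user_config and health_check_period_s entries refer to two methods a deployment class can define: reconfigure and check_health. A minimal sketch of such a class follows. The class name ThresholdModel, its threshold field, and the healthy flag are illustrative assumptions. The class is left undecorated so it runs without a Ray cluster. In a real application it would be wrapped with @serve.deployment, with a user_config such as {"threshold": 0.8}.

```python
from typing import Any, Dict


class ThresholdModel:
    """Illustrative deployment body; the names here are assumptions."""

    def __init__(self) -> None:
        self.threshold = 0.5
        self.healthy = True

    def reconfigure(self, config: Dict[str, Any]) -> None:
        # Receives the deployment's user_config. Because user_config can be
        # updated without restarting replicas, this may be called repeatedly.
        self.threshold = float(config["threshold"])

    def check_health(self) -> None:
        # A custom health check signals an unhealthy replica by raising.
        if not self.healthy:
            raise RuntimeError("model is not ready")

    def __call__(self, score: float) -> bool:
        return score >= self.threshold


# Simulate locally what happens when a new user_config is applied.
model = ThresholdModel()
model.reconfigure({"threshold": 0.8})
accepted = model(0.9)
```

Keeping all tunable state behind reconfigure means a config change only needs to touch user_config, which must stay JSON-serializable.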
- Returns