Ray Serve API

Core APIs

ray.serve.run(target: Union[ray.dag.class_node.ClassNode, ray.dag.function_node.FunctionNode], _blocking: bool = True, host: str = '127.0.0.1', port: int = 8000) Optional[ray.serve.handle.RayServeHandle][source]

Run a Serve application and return a ServeHandle to the ingress.

Either a ClassNode, FunctionNode, or a pre-built application can be passed in. If a node is passed in, all of the deployments it depends on will be deployed. If there is an ingress, its handle will be returned.

Parameters
  • target (Union[ClassNode, FunctionNode, Application]) – A user-built Serve application or a ClassNode that acts as the root node of the DAG. By default, the ClassNode is the driver deployment unless the user provides a customized one.

  • host – Host for HTTP servers to listen on. Defaults to “127.0.0.1”. To expose Serve publicly, you probably want to set this to “0.0.0.0”.

  • port – Port for HTTP server. Defaults to 8000.

Returns

A regular Ray Serve handle that can be called by the user to execute the Serve DAG.

Return type

RayServeHandle

PublicAPI (beta): This API is in beta and may change before becoming stable.

ray.serve.start(detached: bool = False, http_options: Optional[Union[dict, ray.serve.config.HTTPOptions]] = None, dedicated_cpu: bool = False, **kwargs) ray.serve._private.client.ServeControllerClient[source]

Initialize a serve instance.

By default, the instance will be scoped to the lifetime of the returned Client object (or when the script exits). If detached is set to True, the instance will instead persist until serve.shutdown() is called. This is only relevant if connecting to a long-running Ray cluster (e.g., with ray.init(address="auto") or ray.init("ray://<remote_addr>")).

Parameters
  • detached – Whether or not the instance should be detached from this script. If set, the instance will live on the Ray cluster until it is explicitly stopped with serve.shutdown().

  • http_options (Optional[Union[dict, serve.HTTPOptions]]) –

    Configuration options for HTTP proxy. You can pass in a dictionary or HTTPOptions object with fields:

    • host: Host for HTTP servers to listen on. Defaults to “127.0.0.1”. To expose Serve publicly, you probably want to set this to “0.0.0.0”.

    • port: Port for HTTP server. Defaults to 8000.

    • root_path: Root path to mount the serve application (for example, “/serve”). All deployment routes will be prefixed with this path. Defaults to “”.

    • middlewares: A list of Starlette middlewares that will be applied to the HTTP servers in the cluster. Defaults to [].

    • location(str, serve.config.DeploymentMode): The deployment location of HTTP servers:

      • "HeadOnly": start one HTTP server on the head node. Serve assumes the head node is the node you executed serve.start on. This is the default.

      • "EveryNode": start one HTTP server per node.

      • "NoServer" or None: disable HTTP server.

    • num_cpus: The number of CPU cores to reserve for each internal Serve HTTP proxy actor. Defaults to 0.

  • dedicated_cpu – Whether to reserve a CPU core for the internal Serve controller actor. Defaults to False.
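For example, http_options can be passed as a plain dict with the fields documented above (a sketch; the field values are illustrative, and starting Serve itself requires a running Ray cluster, so that call is shown commented out):

```python
# Illustrative http_options dict using the fields documented above.
http_options = {
    "host": "0.0.0.0",      # expose Serve publicly
    "port": 8000,
    "root_path": "/serve",  # all deployment routes get this prefix
    "location": "EveryNode",
    "num_cpus": 0,
}

# Requires a running Ray cluster:
# from ray import serve
# client = serve.start(detached=True, http_options=http_options)
```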

Warning

DEPRECATED: This API is deprecated and may be removed in a future Ray release. See https://docs.ray.io/en/latest/serve/index.html for more information.

ray.serve.deployment(func_or_class: Callable) ray.serve.deployment.Deployment[source]
ray.serve.deployment(name: Union[ray.serve._private.utils.DEFAULT, str] = DEFAULT.VALUE, version: Union[ray.serve._private.utils.DEFAULT, str] = DEFAULT.VALUE, num_replicas: Union[ray.serve._private.utils.DEFAULT, int] = DEFAULT.VALUE, init_args: Union[ray.serve._private.utils.DEFAULT, Tuple[Any]] = DEFAULT.VALUE, init_kwargs: Union[ray.serve._private.utils.DEFAULT, Dict[Any, Any]] = DEFAULT.VALUE, route_prefix: Optional[Union[ray.serve._private.utils.DEFAULT, str]] = DEFAULT.VALUE, ray_actor_options: Union[ray.serve._private.utils.DEFAULT, Dict] = DEFAULT.VALUE, user_config: Union[ray.serve._private.utils.DEFAULT, Any] = DEFAULT.VALUE, max_concurrent_queries: Union[ray.serve._private.utils.DEFAULT, int] = DEFAULT.VALUE, autoscaling_config: Union[ray.serve._private.utils.DEFAULT, Dict, ray.serve.config.AutoscalingConfig] = DEFAULT.VALUE, graceful_shutdown_wait_loop_s: Union[ray.serve._private.utils.DEFAULT, float] = DEFAULT.VALUE, graceful_shutdown_timeout_s: Union[ray.serve._private.utils.DEFAULT, float] = DEFAULT.VALUE, health_check_period_s: Union[ray.serve._private.utils.DEFAULT, float] = DEFAULT.VALUE, health_check_timeout_s: Union[ray.serve._private.utils.DEFAULT, float] = DEFAULT.VALUE, is_driver_deployment: Optional[bool] = DEFAULT.VALUE) Callable[[Callable], ray.serve.deployment.Deployment]

Define a Serve deployment.

Parameters
  • name (Default[str]) – Globally-unique name identifying this deployment. If not provided, the name of the class or function will be used.

  • version [DEPRECATED] – Version of the deployment. This is used to indicate a code change for the deployment; when it is re-deployed with a version change, a rolling update of the replicas will be performed. If not provided, every deployment will be treated as a new version.

  • num_replicas (Default[Optional[int]]) – The number of processes to start up that will handle requests to this deployment. Defaults to 1.

  • init_args (Default[Tuple[Any]]) – Positional args to be passed to the class constructor when starting up deployment replicas. These can also be passed when you call deploy() on the returned Deployment.

  • init_kwargs (Default[Dict[Any, Any]]) – Keyword args to be passed to the class constructor when starting up deployment replicas. These can also be passed when you call deploy() on the returned Deployment.

  • route_prefix (Default[Union[str, None]]) – Requests to paths under this HTTP path prefix will be routed to this deployment. Defaults to ‘/{name}’. When set to ‘None’, no HTTP endpoint will be created. Routing is done based on longest-prefix match, so if you have deployment A with a prefix of ‘/a’ and deployment B with a prefix of ‘/a/b’, requests to ‘/a’, ‘/a/’, and ‘/a/c’ go to A and requests to ‘/a/b’, ‘/a/b/’, and ‘/a/b/c’ go to B. Routes must not end with a ‘/’ unless they’re the root (just ‘/’), which acts as a catch-all.

  • ray_actor_options (Default[Dict]) – Options to be passed to the Ray actor constructor such as resource requirements. Valid options are accelerator_type, memory, num_cpus, num_gpus, object_store_memory, resources, and runtime_env.

  • user_config (Default[Optional[Any]]) – Config to pass to the reconfigure method of the deployment. This can be updated dynamically without changing the version of the deployment and restarting its replicas. The user_config must be json-serializable to keep track of updates, so it must only contain json-serializable types, or json-serializable types nested in lists and dictionaries.

  • max_concurrent_queries (Default[int]) – The maximum number of queries that will be sent to a replica of this deployment without receiving a response. Defaults to 100.

  • is_driver_deployment (Optional[bool]) – [Experimental] When set to True, Serve will deploy exactly one replica of this deployment on every node.

Example:

>>> from ray import serve
>>> @serve.deployment(name="deployment1")
... class MyDeployment:
...     pass

>>> MyDeployment.bind(*init_args)
>>> MyDeployment.options(
...     num_replicas=2, init_args=init_args).bind()
Returns

Deployment

PublicAPI (beta): This API is in beta and may change before becoming stable.
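The longest-prefix matching described for route_prefix above can be sketched in plain Python (match_route is a hypothetical helper for illustration, not part of the Serve API):

```python
def match_route(route_prefixes, path):
    """Return the deployment whose route_prefix is the longest prefix of path."""
    best = None
    for name, prefix in route_prefixes.items():
        # A prefix matches its own path exactly, or any subpath under it.
        # (A root prefix "/" acts as a catch-all.)
        if path == prefix or path.startswith(prefix.rstrip("/") + "/"):
            if best is None or len(prefix) > len(route_prefixes[best]):
                best = name
    return best

# Mirrors the example in the route_prefix docs: A at "/a", B at "/a/b".
routes = {"A": "/a", "B": "/a/b"}
print(match_route(routes, "/a/c"))    # -> A
print(match_route(routes, "/a/b/c"))  # -> B
```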

ray.serve.shutdown() None[source]

Completely shut down the connected Serve instance.

Shuts down all processes and deletes all state associated with the instance.

PublicAPI: This API is stable across Ray releases.

Deployment API

class ray.serve.deployment.Deployment(func_or_class: Union[Callable, str], name: str, config: ray.serve.config.DeploymentConfig, version: Optional[str] = None, init_args: Optional[Tuple[Any]] = None, init_kwargs: Optional[Tuple[Any]] = None, route_prefix: Union[str, None, ray.serve._private.utils.DEFAULT] = DEFAULT.VALUE, ray_actor_options: Optional[Dict] = None, _internal=False, is_driver_deployment: Optional[bool] = False)[source]

PublicAPI: This API is stable across Ray releases.

bind(*args, **kwargs) Union[ray.dag.class_node.ClassNode, ray.dag.function_node.FunctionNode][source]

Bind the provided arguments and return a class or function node.

The returned bound deployment can be deployed or bound to other deployments to create a deployment graph.

PublicAPI (beta): This API is in beta and may change before becoming stable.

deploy(*init_args, _blocking=True, **init_kwargs)[source]

Deploy or update this deployment.

Parameters
  • init_args – args to pass to the class __init__ method. Not valid if this deployment wraps a function.

  • init_kwargs – kwargs to pass to the class __init__ method. Not valid if this deployment wraps a function.

Warning

DEPRECATED: This API is deprecated and may be removed in a future Ray release. See https://docs.ray.io/en/latest/serve/index.html for more information.

delete()[source]

Delete this deployment.

Warning

DEPRECATED: This API is deprecated and may be removed in a future Ray release. See https://docs.ray.io/en/latest/serve/index.html for more information.

get_handle(sync: Optional[bool] = True) Union[ray.serve.handle.RayServeHandle, ray.serve.handle.RayServeSyncHandle][source]

Get a ServeHandle to this deployment to invoke it from Python.

Parameters

sync – If true, then Serve will return a ServeHandle that works everywhere. Otherwise, Serve will return an asyncio-optimized ServeHandle that’s only usable in an asyncio loop.

Returns

ServeHandle

Warning

DEPRECATED: This API is deprecated and may be removed in a future Ray release. See https://docs.ray.io/en/latest/serve/index.html for more information.

options(func_or_class: Optional[Callable] = None, name: Union[ray.serve._private.utils.DEFAULT, str] = DEFAULT.VALUE, version: Union[ray.serve._private.utils.DEFAULT, str] = DEFAULT.VALUE, num_replicas: Optional[Union[ray.serve._private.utils.DEFAULT, int]] = DEFAULT.VALUE, init_args: Union[ray.serve._private.utils.DEFAULT, Tuple[Any]] = DEFAULT.VALUE, init_kwargs: Union[ray.serve._private.utils.DEFAULT, Dict[Any, Any]] = DEFAULT.VALUE, route_prefix: Optional[Union[ray.serve._private.utils.DEFAULT, str]] = DEFAULT.VALUE, ray_actor_options: Optional[Union[ray.serve._private.utils.DEFAULT, Dict]] = DEFAULT.VALUE, user_config: Optional[Union[ray.serve._private.utils.DEFAULT, Any]] = DEFAULT.VALUE, max_concurrent_queries: Union[ray.serve._private.utils.DEFAULT, int] = DEFAULT.VALUE, autoscaling_config: Optional[Union[ray.serve._private.utils.DEFAULT, Dict, ray.serve.config.AutoscalingConfig]] = DEFAULT.VALUE, graceful_shutdown_wait_loop_s: Union[ray.serve._private.utils.DEFAULT, float] = DEFAULT.VALUE, graceful_shutdown_timeout_s: Union[ray.serve._private.utils.DEFAULT, float] = DEFAULT.VALUE, health_check_period_s: Union[ray.serve._private.utils.DEFAULT, float] = DEFAULT.VALUE, health_check_timeout_s: Union[ray.serve._private.utils.DEFAULT, float] = DEFAULT.VALUE, is_driver_deployment: bool = DEFAULT.VALUE, _internal: bool = False) ray.serve.deployment.Deployment[source]

Return a copy of this deployment with updated options.

Only those options passed in will be updated; all others will remain unchanged from the existing deployment.

Parameters
  • Refer to the @serve.deployment decorator docstring for all non-private arguments.

  • _internal – If True, this function won’t log deprecation warnings and won’t update this deployment’s config’s user_configured_option_names. It should only be True when used internally by Serve. It should be False when called by users.

PublicAPI: This API is stable across Ray releases.

ServeHandle API

class ray.serve.handle.RayServeHandle(controller_handle: ray.actor.ActorHandle, deployment_name: str, handle_options: Optional[ray.serve.handle.HandleOptions] = None, *, _router: Optional[ray.serve._private.router.Router] = None, _internal_pickled_http_request: bool = False)[source]

A handle to a service deployment.

Invoking this deployment with .remote is equivalent to pinging an HTTP deployment.

Example

>>> import ray
>>> serve_client = ... 
>>> handle = serve_client.get_handle("my_deployment") 
>>> handle 
RayServeSyncHandle(deployment_name="my_deployment")
>>> my_request_content = ... 
>>> handle.remote(my_request_content) 
ObjectRef(...)
>>> ray.get(handle.remote(...)) 
# result
>>> let_it_crash_request = ... 
>>> ray.get(handle.remote(let_it_crash_request)) 
# raises RayTaskError Exception
>>> async_handle = serve_client.get_handle( 
...     "my_deployment", sync=False)
>>> async_handle  
RayServeHandle(deployment="my_deployment")
>>> await async_handle.remote(my_request_content) 
ObjectRef(...)
>>> ray.get(await async_handle.remote(...)) 
# result
>>> ray.get( 
...     await async_handle.remote(let_it_crash_request)
... )
# raises RayTaskError Exception

PublicAPI (beta): This API is in beta and may change before becoming stable.

options(*, method_name: Union[str, ray.serve._private.utils.DEFAULT] = DEFAULT.VALUE)[source]

Set options for this handle.

Parameters

method_name – The method to invoke.

remote(*args, **kwargs)[source]

Issue an asynchronous request to the deployment.

Returns a Ray ObjectRef whose results can be waited for or retrieved using ray.wait or ray.get (or await object_ref), respectively.

Returns

ray.ObjectRef

Parameters
  • request_data (dict, Any) – If it’s a dictionary, the data will be available in request.json() or request.form(). Otherwise, it will be available in request.body().

  • **kwargs – All keyword arguments will be available in request.query_params.

Batching Requests

ray.serve.batch(func: ray.serve.batching.F) ray.serve.batching.G[source]
ray.serve.batch(max_batch_size: Optional[int] = 10, batch_wait_timeout_s: Optional[float] = 0.0) Callable[[ray.serve.batching.F], ray.serve.batching.G]

Converts a function to asynchronously handle batches.

The function can be a standalone function or a class method. In both cases, the function must be async def and take a list of objects as its sole argument and return a list of the same length as a result.

Each caller passes a single object when invoking the function. These objects are batched, and the underlying function is executed asynchronously once a batch of max_batch_size has accumulated or batch_wait_timeout_s has elapsed, whichever occurs first.

Example:

>>> from ray import serve
>>> @serve.batch(max_batch_size=50, batch_wait_timeout_s=0.5)
... async def handle_batch(batch: List[str]):
...     return [s.lower() for s in batch]

>>> async def handle_single(s: str):
...     # Returns s.lower().
...     return await handle_batch(s)
Parameters
  • max_batch_size – the maximum batch size that will be executed in one call to the underlying function.

  • batch_wait_timeout_s – the maximum duration to wait for max_batch_size elements before running the underlying function.
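The trigger rule (flush once max_batch_size items have accumulated or batch_wait_timeout_s has elapsed, whichever comes first) can be simulated without Serve; batch_collector below is a minimal hypothetical sketch, not Serve's implementation:

```python
import asyncio

async def batch_collector(queue, max_batch_size=3, batch_wait_timeout_s=0.05):
    """Collect one batch: flush when full or when the wait timeout elapses."""
    batch = [await queue.get()]  # block until the first item arrives
    loop = asyncio.get_running_loop()
    deadline = loop.time() + batch_wait_timeout_s
    while len(batch) < max_batch_size:
        remaining = deadline - loop.time()
        if remaining <= 0:
            break  # timeout elapsed: flush a partial batch
        try:
            batch.append(await asyncio.wait_for(queue.get(), timeout=remaining))
        except asyncio.TimeoutError:
            break
    return batch

async def demo():
    q = asyncio.Queue()
    for item in ["a", "b", "c", "d"]:
        q.put_nowait(item)
    return await batch_collector(q)

print(asyncio.run(demo()))  # a full batch flushes immediately: ['a', 'b', 'c']
```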

PublicAPI (beta): This API is in beta and may change before becoming stable.

Serve CLI and REST API

Check out the CLI and REST API documentation for running, debugging, inspecting, and deploying Serve applications in production.

Deployment Graph APIs

ray.serve.api.build(target: Union[ray.dag.class_node.ClassNode, ray.dag.function_node.FunctionNode]) ray.serve.application.Application[source]

Builds a Serve application into a static application.

Takes in a ClassNode or FunctionNode and converts it to a Serve application consisting of one or more deployments. This is intended to be used for production scenarios and deployed via the Serve REST API or CLI, so there are some restrictions placed on the deployments:

  1. All of the deployments must be importable. That is, they cannot be defined in __main__ or defined inline. The deployments will be imported in production using the same import path that was used here.

  2. All arguments bound to the deployment must be JSON-serializable.

The returned Application object can be exported to a dictionary or YAML config.

Parameters

target (Union[ClassNode, FunctionNode]) – A ClassNode or FunctionNode that acts as the top level node of the DAG.

Returns

The static built Serve application

PublicAPI (alpha): This API is in alpha and may change before becoming stable.
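The exported config has roughly this shape (a sketch: the field names follow the Serve REST API schema, the values and import path are hypothetical, and the export methods are assumed per the description above):

```python
# Illustrative shape of an exported application config; an actual config
# would be produced by exporting the built Application to a dict or YAML.
app_config = {
    "import_path": "my_module.app",  # deployments must be importable
    "runtime_env": {},
    "deployments": [
        {"name": "MyDeployment", "num_replicas": 2, "route_prefix": "/"},
    ],
}
```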