Ray Serve API
Python API#
Writing Applications#
serve.Deployment | Class (or function) decorated with the @serve.deployment decorator.
serve.Application | One or more deployments bound with arguments that can be deployed together.
Deployment Decorators#
serve.deployment | Decorator that converts a Python class to a Deployment.
serve.ingress | Wrap a deployment class with a FastAPI application for HTTP request parsing.
serve.batch | Converts a function to asynchronously handle batches.
serve.multiplexed | Wrap a callable or method used to load multiplexed models in a replica.
Deployment Handles#
Note
Ray 2.7 introduces a new DeploymentHandle API that will replace the existing RayServeHandle and RayServeSyncHandle APIs.
Existing code will continue to work, but you are encouraged to opt in to the new API to avoid breakages in the future.
To opt in to the new API, either use handle.options(use_new_handle_api=True) on each handle or set it globally via an environment variable: export RAY_SERVE_ENABLE_NEW_HANDLE_API=1.
serve.handle.DeploymentHandle | A handle used to make requests to a deployment at runtime.
serve.handle.DeploymentResponse | A future-like object wrapping the result of a unary deployment handle call.
serve.handle.DeploymentResponseGenerator | A future-like object wrapping the result of a streaming deployment handle call.
serve.handle.RayServeHandle | A handle used to make requests from one deployment to another.
serve.handle.RayServeSyncHandle | A handle used to make requests to the ingress deployment of an application.
Running Applications#
serve.start | Start Serve on the cluster.
serve.run | Run an application and return a handle to its ingress deployment.
serve.delete | Delete an application by its name.
serve.status | Get the status of Serve on the cluster.
serve.shutdown | Completely shut down Serve on the cluster.
Configurations#
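serve.config.ProxyLocation | Config for where to run proxies to receive ingress traffic to the cluster.
serve.config.gRPCOptions | gRPC options for the proxies.
serve.config.HTTPOptions | HTTP options for the proxies.
serve.config.AutoscalingConfig | PublicAPI: This API is stable across Ray releases.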
Advanced APIs#
serve.get_replica_context | Returns the deployment and replica tag from within a replica at runtime.
serve.context.ReplicaContext | Stores runtime context info for replicas.
serve.get_multiplexed_model_id | Get the multiplexed model ID for the current request.
serve.get_app_handle | Get a handle to the application's ingress deployment by name.
serve.get_deployment_handle | Get a handle to a deployment by name.
Command Line Interface (CLI)#
serve#
CLI for managing Serve applications on a Ray cluster.
serve [OPTIONS] COMMAND [ARGS]...
build#
Imports the applications at IMPORT_PATHS and generates a structured, multi-application config for them. If the flag --single-app is set, accepts one application and generates a single-application config. The config output by this command can be used by serve deploy or the REST API.
serve build [OPTIONS] IMPORT_PATHS...
Options
- -d, --app-dir <app_dir>#
Local directory to look for the IMPORT_PATH (will be inserted into PYTHONPATH). Defaults to '.', meaning that an object in ./main.py can be imported as 'main.object'. Not relevant if you're importing from an installed module.
- -k, --kubernetes_format#
Print a single-application Serve config in Kubernetes format. Must be used with the flag --single-app.
- -o, --output-path <output_path>#
Local path where the output config will be written in YAML format. If not provided, the config will be printed to STDOUT.
- --multi-app#
Generate a multi-application config from multiple targets.
- --single-app#
Generate a single-application config from one target.
- --grpc-servicer-functions <grpc_servicer_functions>#
Servicer function for adding the method handler to the gRPC server. Defaults to an empty list and no gRPC server is started.
Arguments
- IMPORT_PATHS#
Required argument(s)
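The rule that -k/--kubernetes_format only works together with --single-app is easy to miss when scripting builds. The sketch below assumes you drive the CLI from Python; `build_serve_command` is a hypothetical helper, not part of Ray, and it only assembles the argument list using the flags documented above (pass the result to subprocess.run to execute it).

```python
from typing import List, Optional, Sequence


def build_serve_command(
    import_paths: Sequence[str],
    output_path: Optional[str] = None,
    single_app: bool = False,
    kubernetes_format: bool = False,
    app_dir: Optional[str] = None,
) -> List[str]:
    """Assemble argv for `serve build` (hypothetical helper)."""
    if not import_paths:
        raise ValueError("serve build needs at least one import path")
    if kubernetes_format and not single_app:
        # -k/--kubernetes_format must be combined with --single-app.
        raise ValueError("--kubernetes_format requires --single-app")
    if single_app and len(import_paths) != 1:
        # --single-app accepts exactly one application.
        raise ValueError("--single-app accepts exactly one import path")

    cmd = ["serve", "build"]
    if app_dir is not None:
        cmd += ["--app-dir", app_dir]
    if single_app:
        cmd.append("--single-app")
    if kubernetes_format:
        cmd.append("--kubernetes_format")
    if output_path is not None:
        cmd += ["--output-path", output_path]
    cmd += list(import_paths)
    return cmd


print(build_serve_command(["main:app"], output_path="config.yaml"))
```

Validating flag combinations before spawning the process gives a clearer error than a failed CLI invocation.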
config#
Gets the current config(s) of Serve application(s) on the cluster.
serve config [OPTIONS]
Options
- -a, --address <address>#
Address to use to query the Ray dashboard agent (defaults to http://localhost:52365). Can also be specified using the RAY_AGENT_ADDRESS environment variable.
- -n, --name <name>#
Name of an application. Only applies to multi-application mode. If set, this will only fetch the config for the specified application.
deploy#
Deploys Serve application(s) from a YAML config file. Both ServeApplicationSchema configs, which deploy a single application, and ServeDeploySchema configs, which deploy multiple applications, are supported.
This call is async; a successful response only indicates that the request was sent to the Ray cluster successfully. It does not mean the deployments have been deployed or updated.
Existing deployments with no code changes will not be redeployed.
After deploying, use serve config to fetch the current config(s) and serve status to check the status of the application(s) and deployments.
serve deploy [OPTIONS] CONFIG_FILE_NAME
Options
- -a, --address <address>#
Address to use to query the Ray dashboard agent (defaults to http://localhost:52365). Can also be specified using the RAY_AGENT_ADDRESS environment variable.
Arguments
- CONFIG_FILE_NAME#
Required argument
run#
Runs an application from the specified import path (e.g., my_script:app) or application(s) from a YAML config.
If passing an import path, it must point to a Serve Application or a function that returns one. If a function is used, arguments can be passed to it in 'key=val' format after the import path, for example:
serve run my_script:app model_path='/path/to/model.pkl' num_replicas=5
If passing a YAML config, existing applications with no code changes will not be updated.
By default, this command blocks and streams logs to the console. Pressing Ctrl-C shuts down Serve on the cluster.
serve run [OPTIONS] CONFIG_OR_IMPORT_PATH [ARGUMENTS]...
Options
- --runtime-env <runtime_env>#
Path to a local YAML file containing a runtime_env definition. This will be passed to ray.init() as the default for deployments.
- --runtime-env-json <runtime_env_json>#
JSON-serialized runtime_env dictionary. This will be passed to ray.init() as the default for deployments.
- --working-dir <working_dir>#
Directory containing files that your application(s) will run in. Can be a local directory or a remote URI to a .zip file (S3, GS, HTTP). This overrides the working_dir in --runtime-env if both are specified. This will be passed to ray.init() as the default for deployments.
- -d, --app-dir <app_dir>#
Local directory to look for the IMPORT_PATH (will be inserted into PYTHONPATH). Defaults to '.', meaning that an object in ./main.py can be imported as 'main.object'. Not relevant if you're importing from an installed module.
- -a, --address <address>#
Address to use for ray.init(). Can also be specified using the RAY_ADDRESS environment variable.
- -h, --host <host>#
Host for the HTTP server to listen on. Defaults to 127.0.0.1.
- -p, --port <port>#
Port for HTTP proxies to listen on. Defaults to 8000.
- --blocking, --non-blocking#
Whether this command blocks. If blocking, it loops and logs status until interrupted with Ctrl-C, then cleans up the app.
- --gradio#
Whether to enable Gradio visualization of the deployment graph. The visualization can only be used with deployment graphs that have DAGDriver as the ingress deployment.
- -r, --reload#
Listens for changes to files in the working directory (--working-dir or the working_dir in --runtime-env) and automatically redeploys the application. This blocks until interrupted with Ctrl-C, then cleans up the app.
Arguments
- CONFIG_OR_IMPORT_PATH#
Required argument
- ARGUMENTS#
Optional argument(s)
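The 'key=val' arguments that follow the import path map naturally onto a dictionary. The sketch below shows one way to split them and how an application-building function could consume them. `parse_app_args` and `app_builder` are hypothetical names, `app_builder` returns a plain dict standing in for a Serve Application, and the assumption that values arrive as strings (so the builder casts num_replicas itself) is ours, not something stated on this page.

```python
from typing import Dict, Iterable


def parse_app_args(raw_args: Iterable[str]) -> Dict[str, str]:
    """Split 'key=val' tokens into a dict (hypothetical helper)."""
    parsed: Dict[str, str] = {}
    for token in raw_args:
        key, sep, value = token.partition("=")
        if not sep or not key:
            raise ValueError(f"expected 'key=val', got {token!r}")
        # Split only on the first '=' so values may themselves contain '='.
        parsed[key] = value
    return parsed


def app_builder(args: Dict[str, str]) -> dict:
    """Hypothetical builder: casts string arguments to the types it needs."""
    return {
        "model_path": args["model_path"],
        "num_replicas": int(args.get("num_replicas", "1")),
    }


# Mirrors: serve run my_script:app model_path='/path/to/model.pkl' num_replicas=5
# (the shell strips the quotes before the arguments reach the CLI).
args = parse_app_args(["model_path=/path/to/model.pkl", "num_replicas=5"])
print(app_builder(args))
```

Keeping parsing separate from casting means the builder, which knows what each argument means, owns type conversion and validation.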
shutdown#
Shuts down Serve on the cluster, deleting all applications.
serve shutdown [OPTIONS]
Options
- -a, --address <address>#
Address to use to query the Ray dashboard agent (defaults to http://localhost:52365). Can also be specified using the RAY_AGENT_ADDRESS environment variable.
- -y, --yes#
Bypass confirmation prompt.
start#
Start Serve on the Ray cluster.
serve start [OPTIONS]
Options
- -a, --address <address>#
Address to use for ray.init(). Can also be specified using the RAY_ADDRESS environment variable.
- --http-host <http_host>#
Host for HTTP proxies to listen on. Defaults to 127.0.0.1.
- --http-port <http_port>#
Port for HTTP proxies to listen on. Defaults to 8000.
- --http-location <http_location>#
DEPRECATED: Use --proxy-location instead.
Options: DeploymentMode.NoServer | DeploymentMode.HeadOnly | DeploymentMode.EveryNode | DeploymentMode.FixedNumber
- --proxy-location <proxy_location>#
Location of the proxies. Defaults to EveryNode.
Options: ProxyLocation.Disabled | ProxyLocation.HeadOnly | ProxyLocation.EveryNode
- --grpc-port <grpc_port>#
Port for gRPC proxies to listen on. Defaults to 9000.
- --grpc-servicer-functions <grpc_servicer_functions>#
Servicer function for adding the method handler to the gRPC server. Defaults to an empty list and no gRPC server is started.
status#
Prints status information about all applications on the cluster.
An application may be:
NOT_STARTED: the application does not exist.
DEPLOYING: the deployments in the application are still deploying and haven’t reached the target number of replicas.
RUNNING: all deployments are healthy.
DEPLOY_FAILED: the application failed to deploy or reach a running state.
DELETING: the application is being deleted, and the deployments in the application are being torn down.
The deployments within each application may be:
HEALTHY: all replicas are acting normally and passing their health checks.
UNHEALTHY: at least one replica is not acting normally and may not be passing its health check.
UPDATING: the deployment is updating.
serve status [OPTIONS]
Options
- -a, --address <address>#
Address to use to query the Ray dashboard agent (defaults to http://localhost:52365). Can also be specified using the RAY_AGENT_ADDRESS environment variable.
- -n, --name <name>#
Name of an application. If set, this will display only the status of the specified application.
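Because serve deploy returns before the deployment finishes, scripts typically poll until the application leaves DEPLOYING. A minimal sketch, assuming you supply a `fetch_status` callable (hypothetical) that returns the application status string, for example by running serve status or querying the REST API; `wait_for_app` is likewise a hypothetical helper, and the status names come from the list above.

```python
import time
from typing import Callable

# Statuses after which polling can stop (taken from the list above).
SETTLED = {"RUNNING", "DEPLOY_FAILED"}


def wait_for_app(
    fetch_status: Callable[[], str],
    timeout_s: float = 60.0,
    poll_interval_s: float = 1.0,
) -> str:
    """Poll fetch_status() until the app is RUNNING or DEPLOY_FAILED."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = fetch_status()
        if status in SETTLED:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"application still {status} after {timeout_s}s")
        time.sleep(poll_interval_s)


# Example with a canned sequence of statuses instead of a real cluster.
statuses = iter(["NOT_STARTED", "DEPLOYING", "DEPLOYING", "RUNNING"])
print(wait_for_app(lambda: next(statuses), poll_interval_s=0.0))
```

Injecting the fetch function keeps the polling logic independent of whether the status comes from the CLI or the REST API, and makes it testable without a cluster.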
Serve REST API#
V1 REST API (Single-application)#
PUT "/api/serve/deployments/"
#
Declaratively deploys the Serve application. Starts Serve on the Ray cluster if it’s not already running. See single-app config schema for the request’s JSON schema.
Example Request:
PUT /api/serve/deployments/ HTTP/1.1
Host: localhost:52365
Accept: application/json
Content-Type: application/json
{
"import_path": "text_ml:app",
"runtime_env": {
"working_dir": "https://github.com/ray-project/serve_config_examples/archive/HEAD.zip"
},
"deployments": [
{"name": "Translator", "user_config": {"language": "french"}},
{"name": "Summarizer"},
]
}
Example Response:
HTTP/1.1 200 OK
Content-Type: application/json
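The same request can be sent from Python with only the standard library. A minimal sketch, assuming the dashboard agent listens on the default address http://localhost:52365; `make_put_request` is a hypothetical helper, and the request is only built here. Passing it to urllib.request.urlopen against a live cluster sends it.

```python
import json
import urllib.request

AGENT_ADDRESS = "http://localhost:52365"  # default dashboard agent address

config = {
    "import_path": "text_ml:app",
    "runtime_env": {
        "working_dir": "https://github.com/ray-project/serve_config_examples/archive/HEAD.zip"
    },
    "deployments": [
        {"name": "Translator", "user_config": {"language": "french"}},
        {"name": "Summarizer"},
    ],
}


def make_put_request(path: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON PUT request to the Serve REST API."""
    return urllib.request.Request(
        AGENT_ADDRESS + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="PUT",
    )


req = make_put_request("/api/serve/deployments/", config)
# To deploy: urllib.request.urlopen(req); a 200 status means the request was accepted.
```

As with serve deploy, a 200 response only means the request was received; check the status endpoint to confirm the deployment finished.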
GET "/api/serve/deployments/"
#
Gets the config for the application currently deployed on the Ray cluster. This config represents the current goal state for the Serve application. See single-app config schema for the response’s JSON schema.
Example Request:
GET /api/serve/deployments/ HTTP/1.1
Host: localhost:52365
Accept: application/json
Example Response:
HTTP/1.1 200 OK
Content-Type: application/json
{
"import_path": "text_ml:app",
"runtime_env": {
"working_dir": "https://github.com/ray-project/serve_config_examples/archive/HEAD.zip"
},
"deployments": [
{"name": "Translator", "user_config": {"language": "french"}},
{"name": "Summarizer"},
]
}
GET "/api/serve/deployments/status"
#
Gets the Serve application’s current status, including all the deployment statuses. See status schema for the response’s JSON schema.
Example Request:
GET /api/serve/deployments/status HTTP/1.1
Host: localhost:52365
Accept: application/json
Example Response:
HTTP/1.1 200 OK
Content-Type: application/json
{
"name": "default",
"app_status": {
"status": "RUNNING",
"message": "",
"deployment_timestamp": 1694043082.0397763
},
"deployment_statuses": [
{
"name": "Translator",
"status": "HEALTHY",
"message": ""
},
{
"name": "Summarizer",
"status": "HEALTHY",
"message": ""
}
]
}
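A status response like the one above can be checked programmatically. A minimal sketch, assuming the response body has already been decoded into a dict (for example with json.load on the HTTP response); `unhealthy_deployments` and `app_is_ready` are hypothetical helpers.

```python
from typing import List


def unhealthy_deployments(status: dict) -> List[str]:
    """Return names of deployments whose status is not HEALTHY."""
    return [
        d["name"]
        for d in status.get("deployment_statuses", [])
        if d.get("status") != "HEALTHY"
    ]


def app_is_ready(status: dict) -> bool:
    """True when the app is RUNNING and every deployment is HEALTHY."""
    return status["app_status"]["status"] == "RUNNING" and not unhealthy_deployments(status)


# The example response above, as a Python dict.
example = {
    "name": "default",
    "app_status": {"status": "RUNNING", "message": "", "deployment_timestamp": 1694043082.0397763},
    "deployment_statuses": [
        {"name": "Translator", "status": "HEALTHY", "message": ""},
        {"name": "Summarizer", "status": "HEALTHY", "message": ""},
    ],
}
print(app_is_ready(example))
```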
DELETE "/api/serve/deployments/"
#
Shuts down Serve and the Serve application running on the Ray cluster. Has no effect if Serve is not running on the Ray cluster.
Example Request:
DELETE /api/serve/deployments/ HTTP/1.1
Host: localhost:52365
Accept: application/json
Example Response:
HTTP/1.1 200 OK
Content-Type: application/json
V2 REST API (Multi-application)#
PUT "/api/serve/applications/"
#
Declaratively deploys a list of Serve applications. If Serve is already running on the Ray cluster, removes all applications not listed in the new config. If Serve is not running on the Ray cluster, starts Serve. See multi-app config schema for the request’s JSON schema.
Example Request:
PUT /api/serve/applications/ HTTP/1.1
Host: localhost:52365
Accept: application/json
Content-Type: application/json
{
"applications": [
{
"name": "text_app",
"route_prefix": "/",
"import_path": "text_ml:app",
"runtime_env": {
"working_dir": "https://github.com/ray-project/serve_config_examples/archive/HEAD.zip"
},
"deployments": [
{"name": "Translator", "user_config": {"language": "french"}},
{"name": "Summarizer"},
]
}
]
}
Example Response:
HTTP/1.1 200 OK
Content-Type: application/json
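Because this PUT replaces the whole set of applications, any running application missing from the new config is removed. A minimal pre-flight check, assuming you already have the names of the currently deployed applications (for example the keys of "applications" in the GET response) and that every entry in the new config carries a "name", as in the example above; `apps_to_be_removed` is a hypothetical helper.

```python
from typing import Iterable, Set


def apps_to_be_removed(current_app_names: Iterable[str], new_config: dict) -> Set[str]:
    """Names of running apps that a PUT with new_config would delete."""
    new_names = {app["name"] for app in new_config.get("applications", [])}
    return set(current_app_names) - new_names


new_config = {
    "applications": [
        {"name": "text_app", "route_prefix": "/", "import_path": "text_ml:app"},
    ]
}
print(apps_to_be_removed(["text_app", "app1"], new_config))
```

Running a check like this before the PUT makes accidental deletions visible instead of silent.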
GET "/api/serve/applications/"
#
Gets cluster-level info and comprehensive details on all Serve applications deployed on the Ray cluster. See metadata schema for the response’s JSON schema.
GET /api/serve/applications/ HTTP/1.1
Host: http://localhost:52365/
Accept: application/json
Example Response (abridged JSON):
HTTP/1.1 200 OK
Content-Type: application/json
{
"controller_info": {
"node_id": "cef533a072b0f03bf92a6b98cb4eb9153b7b7c7b7f15954feb2f38ec",
"node_ip": "10.0.29.214",
"actor_id": "1d214b7bdf07446ea0ed9d7001000000",
"actor_name": "SERVE_CONTROLLER_ACTOR",
"worker_id": "adf416ae436a806ca302d4712e0df163245aba7ab835b0e0f4d85819",
"log_file_path": "/serve/controller_29778.log"
},
"proxy_location": "EveryNode",
"http_options": {
"host": "0.0.0.0",
"port": 8000,
"root_path": "",
"request_timeout_s": null,
"keep_alive_timeout_s": 5
},
"grpc_options": {
"port": 9000,
"grpc_servicer_functions": []
},
"proxies": {
"cef533a072b0f03bf92a6b98cb4eb9153b7b7c7b7f15954feb2f38ec": {
"node_id": "cef533a072b0f03bf92a6b98cb4eb9153b7b7c7b7f15954feb2f38ec",
"node_ip": "10.0.29.214",
"actor_id": "b7a16b8342e1ced620ae638901000000",
"actor_name": "SERVE_CONTROLLER_ACTOR:SERVE_PROXY_ACTOR-cef533a072b0f03bf92a6b98cb4eb9153b7b7c7b7f15954feb2f38ec",
"worker_id": "206b7fe05b65fac7fdceec3c9af1da5bee82b0e1dbb97f8bf732d530",
"log_file_path": "/serve/http_proxy_10.0.29.214.log",
"status": "HEALTHY"
}
},
"deploy_mode": "MULTI_APP",
"applications": {
"app1": {
"name": "app1",
"route_prefix": "/",
"docs_path": null,
"status": "RUNNING",
"message": "",
"last_deployed_time_s": 1694042836.1912267,
"deployed_app_config": {
"name": "app1",
"route_prefix": "/",
"import_path": "src.text-test:app",
"deployments": [
{
"name": "Translator",
"num_replicas": 1,
"user_config": {
"language": "german"
}
}
]
},
"deployments": {
"Translator": {
"name": "Translator",
"status": "HEALTHY",
"message": "",
"deployment_config": {
"name": "Translator",
"num_replicas": 1,
"max_concurrent_queries": 100,
"user_config": {
"language": "german"
},
"graceful_shutdown_wait_loop_s": 2.0,
"graceful_shutdown_timeout_s": 20.0,
"health_check_period_s": 10.0,
"health_check_timeout_s": 30.0,
"ray_actor_options": {
"runtime_env": {
"env_vars": {}
},
"num_cpus": 1.0
},
"is_driver_deployment": false
},
"replicas": [
{
"node_id": "cef533a072b0f03bf92a6b98cb4eb9153b7b7c7b7f15954feb2f38ec",
"node_ip": "10.0.29.214",
"actor_id": "4bb8479ad0c9e9087fee651901000000",
"actor_name": "SERVE_REPLICA::app1#Translator#oMhRlb",
"worker_id": "1624afa1822b62108ead72443ce72ef3c0f280f3075b89dd5c5d5e5f",
"log_file_path": "/serve/deployment_Translator_app1#Translator#oMhRlb.log",
"replica_id": "app1#Translator#oMhRlb",
"state": "RUNNING",
"pid": 29892,
"start_time_s": 1694042840.577496
}
]
},
"Summarizer": {
"name": "Summarizer",
"status": "HEALTHY",
"message": "",
"deployment_config": {
"name": "Summarizer",
"num_replicas": 1,
"max_concurrent_queries": 100,
"user_config": null,
"graceful_shutdown_wait_loop_s": 2.0,
"graceful_shutdown_timeout_s": 20.0,
"health_check_period_s": 10.0,
"health_check_timeout_s": 30.0,
"ray_actor_options": {
"runtime_env": {},
"num_cpus": 1.0
},
"is_driver_deployment": false
},
"replicas": [
{
"node_id": "cef533a072b0f03bf92a6b98cb4eb9153b7b7c7b7f15954feb2f38ec",
"node_ip": "10.0.29.214",
"actor_id": "7118ae807cffc1c99ad5ad2701000000",
"actor_name": "SERVE_REPLICA::app1#Summarizer#cwiPXg",
"worker_id": "12de2ac83c18ce4a61a443a1f3308294caf5a586f9aa320b29deed92",
"log_file_path": "/serve/deployment_Summarizer_app1#Summarizer#cwiPXg.log",
"replica_id": "app1#Summarizer#cwiPXg",
"state": "RUNNING",
"pid": 29893,
"start_time_s": 1694042840.5789504
}
]
}
}
}
}
}
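The nested response is easiest to consume by walking applications, then deployments, then replicas. A minimal sketch, assuming the body has been decoded into a dict shaped like the abridged example above; `replica_summary` is a hypothetical helper, and the sample dict keeps only the fields it reads.

```python
from typing import Dict, List, Tuple


def replica_summary(details: dict) -> Dict[Tuple[str, str], List[str]]:
    """Map (app name, deployment name) to the states of its replicas."""
    summary: Dict[Tuple[str, str], List[str]] = {}
    for app_name, app in details.get("applications", {}).items():
        for dep_name, dep in app.get("deployments", {}).items():
            summary[(app_name, dep_name)] = [r["state"] for r in dep.get("replicas", [])]
    return summary


# Trimmed-down version of the response above, keeping only the fields used.
details = {
    "applications": {
        "app1": {
            "deployments": {
                "Translator": {"replicas": [{"replica_id": "app1#Translator#oMhRlb", "state": "RUNNING"}]},
                "Summarizer": {"replicas": [{"replica_id": "app1#Summarizer#cwiPXg", "state": "RUNNING"}]},
            }
        }
    }
}
for (app_name, dep_name), states in replica_summary(details).items():
    print(f"{app_name}/{dep_name}: {len(states)} replica(s) {states}")
```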
DELETE "/api/serve/applications/"
#
Shuts down Serve and all applications running on the Ray cluster. Has no effect if Serve is not running on the Ray cluster.
Example Request:
DELETE /api/serve/applications/ HTTP/1.1
Host: localhost:52365
Accept: application/json
Example Response:
HTTP/1.1 200 OK
Content-Type: application/json
Config Schemas#
schema.ServeDeploySchema | Multi-application config for deploying a list of Serve applications to the Ray cluster.
schema.gRPCOptionsSchema | Options to start the gRPC Proxy with.
schema.HTTPOptionsSchema | Options to start the HTTP Proxy with.
schema.ServeApplicationSchema | Describes one Serve application, and currently can also be used as a standalone config to deploy a single application to a Ray cluster.
schema.DeploymentSchema | Specifies options for one deployment within a Serve application.
schema.RayActorOptionsSchema | Options with which to start a replica actor.
Response Schemas#
V1 REST API#
schema.ServeStatusSchema | Describes the status of an application and all its deployments.
V2 REST API#
schema.ServeInstanceDetails | Serve metadata with system-level info and details on all applications deployed to the Ray cluster.
schema.ApplicationDetails | Detailed info about a Serve application.
schema.DeploymentDetails | Detailed info about a deployment within a Serve application.
schema.ReplicaDetails | Detailed info about a single deployment replica.
Metrics API#
metrics.Counter | A serve cumulative metric that is monotonically increasing.
metrics.Histogram | Tracks the size and number of events in buckets.
metrics.Gauge | Gauges keep the last recorded value and drop everything before.
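The three metric types differ in how they aggregate recorded values. The sketch below is a conceptual, standard-library model of those semantics, not Ray's implementation or API; the class names are hypothetical and the histogram boundaries are arbitrary illustrative values.

```python
import bisect
from typing import List, Optional


class CounterModel:
    """Cumulative and monotonically increasing."""

    def __init__(self) -> None:
        self.value = 0.0

    def inc(self, amount: float = 1.0) -> None:
        if amount < 0:
            raise ValueError("counters only increase")
        self.value += amount


class GaugeModel:
    """Keeps only the last recorded value."""

    def __init__(self) -> None:
        self.value: Optional[float] = None

    def set(self, value: float) -> None:
        self.value = value  # earlier values are discarded


class HistogramModel:
    """Counts observations per bucket; boundaries are inclusive upper bounds."""

    def __init__(self, boundaries: List[float]) -> None:
        self.boundaries = sorted(boundaries)
        # One extra bucket for values above the last boundary.
        self.counts = [0] * (len(self.boundaries) + 1)

    def observe(self, value: float) -> None:
        # bisect_left puts a value equal to a boundary into that boundary's bucket.
        self.counts[bisect.bisect_left(self.boundaries, value)] += 1


h = HistogramModel([10.0, 100.0])
for latency_ms in (5.0, 10.0, 50.0, 500.0):
    h.observe(latency_ms)
print(h.counts)
```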