Serve REST API¶

REST API¶

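GET "/api/serve/deployments/"¶

Gets the latest config that Serve has received. This config represents the current goal state for the Serve application. See the config schema for the response’s JSON schema.

Example Request:

GET /api/serve/deployments/ HTTP/1.1
Host: localhost:52365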
Accept: application/json

Example Response:

HTTP/1.1 200 OK
Content-Type: application/json

{
    "import_path": "fruit.deployment_graph",
    "runtime_env": {
        "working_dir": "https://github.com/ray-project/serve_config_examples/archive/HEAD.zip"
    },
    "deployments": [
        {"name": "MangoStand", "user_config": {"price": 1}},
        {"name": "OrangeStand", "user_config": {"price": 2}},
        {"name": "PearStand", "user_config": {"price": 3}}
    ]
}
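The request above can be issued from Python with the standard library alone. The following is a minimal sketch rather than part of Ray: `fetch_serve_config` and its `opener` parameter are hypothetical names, and the base URL assumes the address shown in the example request.

```python
import json
import urllib.request

# Address taken from the example request above; adjust for your cluster.
BASE_URL = "http://localhost:52365"


def fetch_serve_config(base_url=BASE_URL, opener=urllib.request.urlopen):
    """Return the current Serve config as a dict (hypothetical helper)."""
    request = urllib.request.Request(
        base_url + "/api/serve/deployments/",
        headers={"Accept": "application/json"},
        method="GET",
    )
    # `opener` is injectable so the function can be exercised without a cluster.
    with opener(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_serve_config()` with no arguments sends the request to the default address; passing a different `opener` lets you substitute a stub in tests.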

PUT "/api/serve/deployments/"¶

Declaratively deploys the Serve application. Starts Serve on the Ray cluster if it’s not already running. See the config schema for the request’s JSON schema.

Example Request:

PUT /api/serve/deployments/ HTTP/1.1
Host: localhost:52365
Content-Type: application/json
Accept: application/json

{
    "import_path": "fruit.deployment_graph",
    "runtime_env": {
        "working_dir": "https://github.com/ray-project/serve_config_examples/archive/HEAD.zip"
    },
    "deployments": [
        {"name": "MangoStand", "user_config": {"price": 1}},
        {"name": "OrangeStand", "user_config": {"price": 2}},
        {"name": "PearStand", "user_config": {"price": 3}}
    ]
}

Example Response:

HTTP/1.1 200 OK
Content-Type: application/json
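As an illustration, the PUT request can be assembled in Python as follows. This is a sketch under stated assumptions: `build_deploy_request` is a hypothetical helper, the base URL is the one from the example, and the request is only built, not sent.

```python
import json
import urllib.request


def build_deploy_request(config, base_url="http://localhost:52365"):
    """Build (but do not send) the PUT request that deploys `config`."""
    body = json.dumps(config).encode("utf-8")
    return urllib.request.Request(
        base_url + "/api/serve/deployments/",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json",
        },
        method="PUT",
    )


# The same config as in the example request above.
config = {
    "import_path": "fruit.deployment_graph",
    "runtime_env": {
        "working_dir": "https://github.com/ray-project/serve_config_examples/archive/HEAD.zip"
    },
    "deployments": [
        {"name": "MangoStand", "user_config": {"price": 1}},
        {"name": "OrangeStand", "user_config": {"price": 2}},
        {"name": "PearStand", "user_config": {"price": 3}},
    ],
}
request = build_deploy_request(config)
```

To submit the request against a running cluster, pass it to `urllib.request.urlopen(request)`.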

GET "/api/serve/deployments/status"¶

Gets the Serve application’s current status, including all the deployment statuses. Starts a Serve application on the Ray cluster if it’s not already running. See the status schema for the response’s JSON schema.

Example Request:

GET /api/serve/deployments/status HTTP/1.1
Host: localhost:52365
Accept: application/json

Example Response:

HTTP/1.1 200 OK
Content-Type: application/json

{
    "app_status": {
        "status": "RUNNING",
        "message": "",
        "deployment_timestamp": 1855994527.146304
    },
    "deployment_statuses": [
        {
            "name": "MangoStand",
            "status": "HEALTHY",
            "message": ""
        },
        {
            "name": "OrangeStand",
            "status": "HEALTHY",
            "message": ""
        },
        {
            "name": "PearStand",
            "status": "HEALTHY",
            "message": ""
        },
        {
            "name": "FruitMarket",
            "status": "HEALTHY",
            "message": ""
        },
        {
            "name": "DAGDriver",
            "status": "HEALTHY",
            "message": ""
        }
    ]
}
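A client will typically reduce this response to a single readiness decision. The sketch below shows one way to do that, assuming "ready" means the application is RUNNING and every deployment is HEALTHY. `summarize_status` is a hypothetical helper, and the message on the unhealthy deployment is invented for the example.

```python
def summarize_status(status):
    """Return (is_ready, problems) for a parsed status response."""
    problems = []
    app_status = status["app_status"]
    if app_status["status"] != "RUNNING":
        problems.append(
            f"application is {app_status['status']}: {app_status.get('message', '')}"
        )
    # deployment_statuses defaults to an empty list in the status schema.
    for deployment in status.get("deployment_statuses", []):
        if deployment["status"] != "HEALTHY":
            problems.append(
                f"deployment {deployment['name']} is {deployment['status']}: "
                f"{deployment.get('message', '')}"
            )
    return (not problems, problems)


# Illustrative input: the app is running but one deployment is unhealthy.
sample = {
    "app_status": {"status": "RUNNING", "message": ""},
    "deployment_statuses": [
        {"name": "MangoStand", "status": "HEALTHY", "message": ""},
        {"name": "PearStand", "status": "UNHEALTHY", "message": "health check failed"},
    ],
}
ready, problems = summarize_status(sample)
```

Returning the list of problems alongside the boolean keeps the deployment messages available for logging.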

DELETE "/api/serve/deployments/"¶

Shuts down the Serve application running on the Ray cluster. Has no effect if Serve is not running on the Ray cluster.

Example Request:

DELETE /api/serve/deployments/ HTTP/1.1
Host: localhost:52365
Accept: application/json

Example Response:

HTTP/1.1 200 OK
Content-Type: application/json

Config Schema¶

pydantic model ray.serve.schema.ServeApplicationSchema¶

PublicAPI (beta): This API is in beta and may change before becoming stable.

JSON schema:
{
   "title": "ServeApplicationSchema",
   "description": "PublicAPI (beta): This API is in beta and may change before becoming stable.",
   "type": "object",
   "properties": {
      "import_path": {
         "title": "Import Path",
         "description": "An import path to a bound deployment node. Should be of the form \"module.submodule_1...submodule_n.dag_node\". This is equivalent to \"from module.submodule_1...submodule_n import dag_node\". Only works with Python applications. This field is REQUIRED when deploying Serve config to a Ray cluster.",
         "type": "string"
      },
      "runtime_env": {
         "title": "Runtime Env",
         "description": "The runtime_env that the deployment graph will be run in. Per-deployment runtime_envs will inherit from this. working_dir and py_modules may contain only remote URIs.",
         "default": {},
         "type": "object"
      },
      "deployments": {
         "title": "Deployments",
         "description": "Deployment options that override options specified in the code.",
         "default": [],
         "type": "array",
         "items": {
            "$ref": "#/definitions/DeploymentSchema"
         }
      }
   },
   "additionalProperties": false,
   "definitions": {
      "DEFAULT": {
         "title": "DEFAULT",
         "description": "An enumeration.",
         "enum": [
            1
         ]
      },
      "RayActorOptionsSchema": {
         "title": "RayActorOptionsSchema",
         "description": "PublicAPI (beta): This API is in beta and may change before becoming stable.",
         "type": "object",
         "properties": {
            "runtime_env": {
               "title": "Runtime Env",
               "description": "This deployment's runtime_env. working_dir and py_modules may contain only remote URIs.",
               "default": {},
               "type": "object"
            },
            "num_cpus": {
               "title": "Num Cpus",
               "description": "The number of CPUs required by the deployment's application per replica. This is the same as a ray actor's num_cpus. Uses a default if null.",
               "minimum": 0,
               "type": "number"
            },
            "num_gpus": {
               "title": "Num Gpus",
               "description": "The number of GPUs required by the deployment's application per replica. This is the same as a ray actor's num_gpus. Uses a default if null.",
               "minimum": 0,
               "type": "number"
            },
            "memory": {
               "title": "Memory",
               "description": "Restrict the heap memory usage of each replica. Uses a default if null.",
               "minimum": 0,
               "type": "number"
            },
            "object_store_memory": {
               "title": "Object Store Memory",
               "description": "Restrict the object store memory used per replica when creating objects. Uses a default if null.",
               "minimum": 0,
               "type": "number"
            },
            "resources": {
               "title": "Resources",
               "description": "The custom resources required by each replica.",
               "default": {},
               "type": "object"
            },
            "accelerator_type": {
               "title": "Accelerator Type",
               "description": "Forces replicas to run on nodes with the specified accelerator type.",
               "type": "string"
            }
         },
         "additionalProperties": false
      },
      "DeploymentSchema": {
         "title": "DeploymentSchema",
         "description": "PublicAPI (beta): This API is in beta and may change before becoming stable.",
         "type": "object",
         "properties": {
            "name": {
               "title": "Name",
               "description": "Globally-unique name identifying this deployment.",
               "type": "string"
            },
            "num_replicas": {
               "title": "Num Replicas",
               "description": "The number of processes that handle requests to this deployment. Uses a default if null.",
               "exclusiveMinimum": 0,
               "type": "integer"
            },
            "route_prefix": {
               "title": "Route Prefix",
               "description": "Requests to paths under this HTTP path prefix will be routed to this deployment. When null, no HTTP endpoint will be created. When omitted, defaults to the deployment's name. Routing is done based on longest-prefix match, so if you have deployment A with a prefix of \"/a\" and deployment B with a prefix of \"/a/b\", requests to \"/a\", \"/a/\", and \"/a/c\" go to A and requests to \"/a/b\", \"/a/b/\", and \"/a/b/c\" go to B. Routes must not end with a \"/\" unless they're the root (just \"/\"), which acts as a catch-all.",
               "default": 1,
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "$ref": "#/definitions/DEFAULT"
                  }
               ]
            },
            "max_concurrent_queries": {
               "title": "Max Concurrent Queries",
               "description": "The max number of pending queries in a single replica. Uses a default if null.",
               "exclusiveMinimum": 0,
               "type": "integer"
            },
            "user_config": {
               "title": "User Config",
               "description": "Config to pass into this deployment's reconfigure method. This can be updated dynamically without restarting replicas",
               "type": "object"
            },
            "autoscaling_config": {
               "title": "Autoscaling Config",
               "description": "Config specifying autoscaling parameters for the deployment's number of replicas. If null, the deployment won't autoscale its number of replicas; the number of replicas will be fixed at num_replicas.",
               "type": "object"
            },
            "graceful_shutdown_wait_loop_s": {
               "title": "Graceful Shutdown Wait Loop S",
               "description": "Duration that deployment replicas will wait until there is no more work to be done before shutting down. Uses a default if null.",
               "minimum": 0,
               "type": "number"
            },
            "graceful_shutdown_timeout_s": {
               "title": "Graceful Shutdown Timeout S",
               "description": "Serve controller waits for this duration before forcefully killing the replica for shutdown. Uses a default if null.",
               "minimum": 0,
               "type": "number"
            },
            "health_check_period_s": {
               "title": "Health Check Period S",
               "description": "Frequency at which the controller will health check replicas. Uses a default if null.",
               "exclusiveMinimum": 0,
               "type": "number"
            },
            "health_check_timeout_s": {
               "title": "Health Check Timeout S",
               "description": "Timeout that the controller will wait for a response from the replica's health check before marking it unhealthy. Uses a default if null.",
               "exclusiveMinimum": 0,
               "type": "number"
            },
            "ray_actor_options": {
               "title": "Ray Actor Options",
               "description": "Options set for each replica actor.",
               "allOf": [
                  {
                     "$ref": "#/definitions/RayActorOptionsSchema"
                  }
               ]
            }
         },
         "required": [
            "name"
         ],
         "additionalProperties": false
      }
   }
}
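The `route_prefix` description in the schema above specifies longest-prefix matching. The sketch below illustrates that rule. It is not Serve's router: `match_route` is a hypothetical function, and it assumes that a prefix only matches on a path-segment boundary (so "/a" does not match "/ab"), which the description implies but does not state.

```python
def match_route(path, prefixes):
    """Return the longest prefix in `prefixes` that matches `path`, or None."""
    best = None
    for prefix in prefixes:
        if prefix == "/":
            # The root prefix acts as a catch-all.
            matches = True
        else:
            # Assumption: match the prefix itself or anything below it.
            matches = path == prefix or path.startswith(prefix + "/")
        if matches and (best is None or len(prefix) > len(best)):
            best = prefix
    return best


# Deployment A is served under "/a" and deployment B under "/a/b".
routes = ["/a", "/a/b"]
targets = {path: match_route(path, routes) for path in ["/a", "/a/c", "/a/b", "/a/b/c"]}
```

With these routes, "/a" and "/a/c" resolve to "/a", while "/a/b" and "/a/b/c" resolve to "/a/b", matching the behavior the description gives for deployments A and B.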

Fields
field deployments: List[ray.serve.schema.DeploymentSchema] = []¶

Deployment options that override options specified in the code.

field import_path: str = None¶

An import path to a bound deployment node. Should be of the form “module.submodule_1…submodule_n.dag_node”. This is equivalent to “from module.submodule_1…submodule_n import dag_node”. Only works with Python applications. This field is REQUIRED when deploying Serve config to a Ray cluster.

Validated by: import_path_format_valid

field runtime_env: dict = {}¶

The runtime_env that the deployment graph will be run in. Per-deployment runtime_envs will inherit from this. working_dir and py_modules may contain only remote URIs.

Validated by: runtime_env_contains_remote_uris

static get_empty_schema_dict() → Dict¶

Returns an empty app schema dictionary.

Schema can be used as a representation of an empty Serve config.

validator import_path_format_valid  »  import_path¶
validator runtime_env_contains_remote_uris  »  runtime_env¶
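
Before sending a config to the cluster, it can help to catch obvious mistakes locally. The sketch below applies a few constraints read directly from the JSON schema above (no unknown top-level keys, `name` required per deployment, `num_replicas` strictly positive). It is not a substitute for Ray's pydantic validators: `check_config` is a hypothetical helper that covers only these three rules.

```python
# Top-level properties allowed by ServeApplicationSchema
# ("additionalProperties": false).
ALLOWED_TOP_LEVEL_KEYS = {"import_path", "runtime_env", "deployments"}


def check_config(config):
    """Return a list of problems found in a Serve config dict (empty if none)."""
    problems = []
    unknown = set(config) - ALLOWED_TOP_LEVEL_KEYS
    if unknown:
        problems.append(f"unknown top-level keys: {sorted(unknown)}")
    for index, deployment in enumerate(config.get("deployments", [])):
        # "name" is the only required deployment property.
        if "name" not in deployment:
            problems.append(f"deployments[{index}] is missing the required 'name'")
        replicas = deployment.get("num_replicas")
        # The schema declares an integer with "exclusiveMinimum": 0.
        if replicas is not None and (
            isinstance(replicas, bool) or not isinstance(replicas, int) or replicas <= 0
        ):
            problems.append(f"deployments[{index}].num_replicas must be an integer > 0")
    return problems


valid = {
    "import_path": "fruit.deployment_graph",
    "deployments": [{"name": "MangoStand", "user_config": {"price": 1}}],
}
invalid = {"import_paths": "fruit.deployment_graph", "deployments": [{"num_replicas": 0}]}
```

Here `check_config(valid)` returns an empty list, while `check_config(invalid)` reports the misspelled key, the missing name, and the non-positive replica count.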

Status Schema¶

pydantic model ray.serve.schema.ServeStatusSchema¶

PublicAPI (beta): This API is in beta and may change before becoming stable.

JSON schema:
{
   "title": "ServeStatusSchema",
   "description": "PublicAPI (beta): This API is in beta and may change before becoming stable.",
   "type": "object",
   "properties": {
      "app_status": {
         "title": "App Status",
         "description": "Describes whether the Serve application is DEPLOYING, has DEPLOY_FAILED, or is RUNNING. Includes a timestamp of when the application was deployed.",
         "allOf": [
            {
               "$ref": "#/definitions/ApplicationStatusInfo"
            }
         ]
      },
      "deployment_statuses": {
         "title": "Deployment Statuses",
         "description": "List of statuses for all the deployments running in this Serve application. Each status contains the deployment name, the deployment's status, and a message providing extra context on the status.",
         "default": [],
         "type": "array",
         "items": {
            "$ref": "#/definitions/DeploymentStatusInfo"
         }
      }
   },
   "required": [
      "app_status"
   ],
   "additionalProperties": false,
   "definitions": {
      "ApplicationStatus": {
         "title": "ApplicationStatus",
         "description": "An enumeration.",
         "enum": [
            "NOT_STARTED",
            "DEPLOYING",
            "RUNNING",
            "DEPLOY_FAILED"
         ],
         "type": "string"
      },
      "ApplicationStatusInfo": {
         "title": "ApplicationStatusInfo",
         "type": "object",
         "properties": {
            "status": {
               "$ref": "#/definitions/ApplicationStatus"
            },
            "message": {
               "title": "Message",
               "default": "",
               "type": "string"
            },
            "deployment_timestamp": {
               "title": "Deployment Timestamp",
               "default": 0,
               "type": "number"
            }
         },
         "required": [
            "status"
         ]
      },
      "DeploymentStatus": {
         "title": "DeploymentStatus",
         "description": "An enumeration.",
         "enum": [
            "UPDATING",
            "HEALTHY",
            "UNHEALTHY"
         ],
         "type": "string"
      },
      "DeploymentStatusInfo": {
         "title": "DeploymentStatusInfo",
         "type": "object",
         "properties": {
            "name": {
               "title": "Name",
               "type": "string"
            },
            "status": {
               "$ref": "#/definitions/DeploymentStatus"
            },
            "message": {
               "title": "Message",
               "default": "",
               "type": "string"
            }
         },
         "required": [
            "name",
            "status"
         ]
      }
   }
}

Fields
field app_status: ray.serve._private.common.ApplicationStatusInfo [Required]¶

Describes whether the Serve application is DEPLOYING, has DEPLOY_FAILED, or is RUNNING. Includes a timestamp of when the application was deployed.

field deployment_statuses: List[ray.serve._private.common.DeploymentStatusInfo] = []¶

List of statuses for all the deployments running in this Serve application. Each status contains the deployment name, the deployment’s status, and a message providing extra context on the status.

static get_empty_schema_dict() → Dict¶

Returns an empty status schema dictionary.

Schema represents Serve status for a Ray cluster where Serve hasn’t started yet.