Package Reference

Basic APIs

ray.serve.init(name=None, http_host='127.0.0.1', http_port=8000, metric_exporter=<class 'ray.serve.metric.exporter.InMemoryExporter'>)

Initialize or connect to a serve cluster.

If the Serve cluster is already initialized, this function simply returns.

If ray.init has not been called in this process, it will be called with no arguments. To pass keyword arguments to ray.init, call it separately before calling serve.init.

Parameters
  • name (str) – A unique name for this serve instance. This allows multiple serve instances to run on the same ray cluster. Must be specified in all subsequent serve.init() calls.

  • http_host (str) – Host for the HTTP server. Defaults to “127.0.0.1”.

  • http_port (int) – Port for the HTTP server. Defaults to 8000.

  • metric_exporter (ExporterInterface) – The class that aggregates metrics from all RayServe actors and optionally exports them to external services. RayServe has two built-in options: InMemoryExporter and PrometheusExporter.

ray.serve.shutdown()

Completely shut down the connected Serve instance.

Shuts down all processes and deletes all state associated with the Serve instance that this process is currently connected to (via serve.init).

ray.serve.create_backend(backend_tag, func_or_class, *actor_init_args, ray_actor_options=None, config=None)

Create a backend with the provided tag.

The backend will serve requests with func_or_class.

Parameters
  • backend_tag (str) – a unique tag assigned to identify this backend.

  • func_or_class (callable, class) – a function or a class implementing __call__.

  • actor_init_args (optional) – the arguments to pass to the class initialization method.

  • ray_actor_options (optional) – options to be passed into the @ray.remote decorator for the backend actor.

  • config (optional) –

    configuration options for this backend. Supported options:

    • "num_replicas": number of worker processes to start up that will handle requests to this backend.

    • "max_batch_size": the maximum number of requests that will be processed in one batch by this backend.

    • "batch_wait_timeout": time in seconds that backend replicas will wait for a full batch of requests before processing a partial batch.

    • "max_concurrent_queries": the maximum number of queries that will be sent to a replica of this backend without receiving a response.
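
How "max_batch_size" and "batch_wait_timeout" interact can be pictured with a small stand-alone model. This is only a sketch and not Ray Serve's implementation: collect_batch and the queue.Queue of pending requests are hypothetical names introduced for illustration.

```python
import queue
import time


def collect_batch(pending, max_batch_size, batch_wait_timeout):
    """Collect up to max_batch_size items, waiting at most batch_wait_timeout seconds.

    Illustrative model only: `pending` is a plain queue.Queue standing in for
    the requests waiting on one replica.
    """
    batch = []
    deadline = time.monotonic() + batch_wait_timeout
    while len(batch) < max_batch_size:
        remaining = deadline - time.monotonic()
        try:
            if remaining > 0:
                # Wait for the next request, but never past the deadline.
                batch.append(pending.get(timeout=remaining))
            else:
                # Deadline passed: take only what is already queued.
                batch.append(pending.get_nowait())
        except queue.Empty:
            break  # Nothing more arrived in time: process a partial batch.
    return batch


pending = queue.Queue()
for request_id in range(5):
    pending.put(request_id)

# Five requests are queued, so the first batch is full and the second is partial.
full_batch = collect_batch(pending, max_batch_size=4, batch_wait_timeout=0.05)
partial_batch = collect_batch(pending, max_batch_size=4, batch_wait_timeout=0.05)
```

In this model a larger timeout trades latency for fuller batches, which is the tuning decision the two options expose.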

ray.serve.create_endpoint(endpoint_name, *, backend=None, route=None, methods=['GET'])

Create a service endpoint, optionally reachable over HTTP at route.

Parameters
  • endpoint_name (str) – A name to associate with the endpoint.

  • backend (str, required) – The backend that will serve requests to this endpoint. To change this or split traffic among backends, use serve.set_traffic.

  • route (str, optional) – A string beginning with “/”. The HTTP server uses this string to match the request path.

  • methods (List[str], optional) – The HTTP methods that are valid for this endpoint.
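
The basic APIs are typically used together: initialize Serve, register a backend, then expose it through an endpoint. The sketch below follows the signatures listed above. The Greeter class, the tags, and the route are hypothetical, and the shape of the argument passed to __call__ is an assumption. The serve module is passed in as a parameter so the snippet can be read and exercised without a running cluster.

```python
class Greeter:
    """A backend class: constructed with actor_init_args, then called per request."""

    def __init__(self, greeting):
        # Receives the positional actor_init_args given to create_backend.
        self.greeting = greeting

    def __call__(self, request):
        # Assumption: the replica invokes __call__ with one request object.
        return f"{self.greeting}, world!"


def deploy(serve):
    """Register one backend and one endpoint; `serve` is the ray.serve module."""
    serve.init(http_host="127.0.0.1", http_port=8000)
    serve.create_backend(
        "greeter:v1", Greeter, "Hello", config={"num_replicas": 2}
    )
    serve.create_endpoint(
        "greeter", backend="greeter:v1", route="/greet", methods=["GET"]
    )


# With the Ray version documented here installed, one would run:
#     from ray import serve
#     deploy(serve)
```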

APIs for Managing Endpoints

ray.serve.list_endpoints()

Returns a dictionary of all registered endpoints.

The dictionary keys are endpoint names and values are dictionaries of the form: {"methods": List[str], "traffic": Dict[str, float]}.
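
As an illustration of working with this shape, the sketch below finds the endpoints whose traffic policy mentions a given backend, which can be a useful check before deleting that backend. The helper endpoints_using and the sample data are hypothetical; the dictionary only mimics the documented form and is not real output.

```python
def endpoints_using(backend_tag, endpoints):
    """Return the names of endpoints whose traffic policy mentions backend_tag."""
    return sorted(
        name
        for name, info in endpoints.items()
        if backend_tag in info["traffic"]
    )


# Hypothetical data in the documented shape, standing in for serve.list_endpoints().
endpoints = {
    "greeter": {"methods": ["GET"], "traffic": {"greeter:v1": 1.0}},
    "scorer": {
        "methods": ["GET", "POST"],
        "traffic": {"model:v1": 0.5, "model:v2": 0.5},
    },
}

in_use_by = endpoints_using("model:v2", endpoints)
```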

ray.serve.delete_endpoint(endpoint)

Delete the given endpoint.

Does not delete any associated backends.

ray.serve.set_traffic(endpoint_name, traffic_policy_dictionary)

Associate a service endpoint with a traffic policy.

Example:

>>> serve.set_traffic("service-name", {
...     "backend:v1": 0.5,
...     "backend:v2": 0.5
... })

Parameters
  • endpoint_name (str) – A registered service endpoint.

  • traffic_policy_dictionary (dict) – a dictionary mapping backend names to their traffic weights. The weights must sum to 1.
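
The sum-to-1 requirement and the effect of the weights can be sketched in plain Python. This is a model for intuition, not Ray Serve's router: validate_traffic and pick_backend are hypothetical helpers, and rejecting negative weights is an added assumption.

```python
import math
import random


def validate_traffic(traffic_policy):
    """Raise ValueError unless the weights are non-negative and sum to 1."""
    if not traffic_policy:
        raise ValueError("traffic policy must name at least one backend")
    if any(weight < 0 for weight in traffic_policy.values()):
        raise ValueError("weights must be non-negative")  # assumption
    total = sum(traffic_policy.values())
    if not math.isclose(total, 1.0, abs_tol=1e-9):
        raise ValueError(f"weights sum to {total}, expected 1")


def pick_backend(traffic_policy, rng):
    """Choose one backend at random, in proportion to its weight."""
    backends = list(traffic_policy)
    weights = list(traffic_policy.values())
    return rng.choices(backends, weights=weights, k=1)[0]


policy = {"backend:v1": 0.5, "backend:v2": 0.5}
validate_traffic(policy)

rng = random.Random(0)  # Seeded so repeated runs route identically.
counts = {tag: 0 for tag in policy}
for _ in range(1000):
    counts[pick_backend(policy, rng)] += 1
```

Over many requests, each backend's share of counts approaches its weight, which is what makes an even split such as the one above suitable for comparing two versions.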

ray.serve.shadow_traffic(endpoint_name, backend_tag, proportion)

Shadow traffic from an endpoint to a backend.

The specified proportion of requests will be duplicated and sent to the backend. Responses of the duplicated traffic will be ignored. The backend must not already be in use.

To stop shadowing traffic to a backend, call shadow_traffic with proportion equal to 0.

Parameters
  • endpoint_name (str) – A registered service endpoint.

  • backend_tag (str) – A registered backend.

  • proportion (float) – The proportion of traffic from 0 to 1.
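
The shadowing behavior described above can be modeled in a few lines. This sketch is not Ray Serve's implementation: route_with_shadow, primary, and shadow are hypothetical stand-ins that show a duplicated request whose response is dropped.

```python
import random


def route_with_shadow(request, primary, shadow, proportion, rng):
    """Serve `request` from `primary`; sometimes also send a copy to `shadow`."""
    if not 0 <= proportion <= 1:
        raise ValueError("proportion must be between 0 and 1")
    if rng.random() < proportion:
        shadow(request)  # Duplicate request; its response is discarded.
    return primary(request)


shadow_seen = []


def primary(request):
    return f"primary handled {request}"


def shadow(request):
    shadow_seen.append(request)
    return "ignored"


rng = random.Random(0)  # Seeded so repeated runs behave identically.
responses = [
    route_with_shadow(i, primary, shadow, 0.25, rng) for i in range(1000)
]
```

Every caller still receives the primary backend's response. With proportion equal to 0 the shadow backend is never called in this model, matching the way shadowing is switched off.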

APIs for Managing Backends

ray.serve.list_backends()

Returns a dictionary of all registered backends.

The dictionary maps backend tags to backend configs.

ray.serve.delete_backend(backend_tag)

Delete the given backend.

The backend must not currently be used by any endpoints.

ray.serve.get_backend_config(backend_tag)

Get the backend configuration for a backend tag.

Parameters
  • backend_tag (str) – A registered backend.

ray.serve.update_backend_config(backend_tag, config_options)

Update a backend configuration for a backend tag.

Keys not specified in the passed dictionary will be left unchanged.

Parameters
  • backend_tag (str) – A registered backend.

  • config_options (dict) –

    Backend config options to update. Supported options:

    • "num_replicas": number of worker processes to start up that will handle requests to this backend.

    • "max_batch_size": the maximum number of requests that will be processed in one batch by this backend.

    • "batch_wait_timeout": time in seconds that backend replicas will wait for a full batch of requests before processing a partial batch.

    • "max_concurrent_queries": the maximum number of queries that will be sent to a replica of this backend without receiving a response.
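
The partial-update rule (unspecified keys stay unchanged) amounts to a dictionary merge. The sketch below models it in plain Python. apply_config_update is a hypothetical helper, the config values are made up, and rejecting unknown keys is an assumption rather than documented behavior.

```python
SUPPORTED_OPTIONS = {
    "num_replicas",
    "max_batch_size",
    "batch_wait_timeout",
    "max_concurrent_queries",
}


def apply_config_update(current_config, config_options):
    """Return a new config in which only the keys in config_options change."""
    unknown = set(config_options) - SUPPORTED_OPTIONS
    if unknown:
        # Assumption: unsupported keys are treated as an error.
        raise ValueError(f"unsupported config options: {sorted(unknown)}")
    return {**current_config, **config_options}


# Hypothetical starting config, standing in for serve.get_backend_config(tag).
current = {
    "num_replicas": 1,
    "max_batch_size": 4,
    "batch_wait_timeout": 0.1,
    "max_concurrent_queries": 8,
}

# Scale out without touching the batching options.
updated = apply_config_update(current, {"num_replicas": 3})
```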

Advanced APIs

serve.get_handle enables calling endpoints from Python.

ray.serve.get_handle(endpoint_name, relative_slo_ms=None, absolute_slo_ms=None, missing_ok=False)

Retrieve a RayServeHandle for a service endpoint so that it can be invoked from Python.

Parameters
  • endpoint_name (str) – A registered service endpoint.

  • relative_slo_ms (float) – Specify relative deadline in milliseconds for queries fired using this handle. (Default: None)

  • absolute_slo_ms (float) – Specify absolute deadline in milliseconds for queries fired using this handle. (Default: None)

  • missing_ok (bool) – If true, skip the check that the endpoint exists. This can be useful when the endpoint has not yet been registered.

Returns

RayServeHandle

class ray.serve.handle.RayServeHandle(router_handle, endpoint_name, relative_slo_ms=None, absolute_slo_ms=None, method_name=None, shard_key=None)

A handle to a service endpoint.

Invoking this endpoint with .remote is equivalent to pinging an HTTP endpoint.

Example

>>> handle = serve.get_handle("my_endpoint")
>>> handle
RayServeHandle(
     Endpoint="my_endpoint",
     URL="...",
     Traffic=...
)
>>> handle.remote(my_request_content)
ObjectID(...)
>>> ray.get(handle.remote(...))
# result
>>> ray.get(handle.remote(let_it_crash_request))
# raises RayTaskError Exception

serve.stat queries Ray Serve’s built-in metric monitor.

serve.accept_batch marks a backend as accepting a list of inputs instead of a single input.
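
A batch-aware backend body might look like the sketch below. The function double_each is hypothetical. Two points are assumptions, not documented above: that the list elements are plain numbers, and that one output is returned per input in the same order.

```python
def double_each(batch):
    """Batch-aware backend body: takes a list of inputs, returns one output each."""
    # Assumption: each element is already a number; real requests would need parsing.
    return [2 * value for value in batch]


# With Ray Serve, the function would additionally be marked as batch-accepting,
# for example (assumed usage): double_each = serve.accept_batch(double_each)
results = double_each([1, 2, 3])
```

Processing the whole list in one call is what lets a backend benefit from the "max_batch_size" option, for example by running a single vectorized model invocation.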