Handle Dependencies#

Add a runtime environment#

The import path (e.g., text_ml:app) must be importable by Serve at runtime. When running locally, this path might be in your current working directory. However, when running on a cluster you also need to make sure the path is importable. Build the code into the cluster’s container image (see Cluster Configuration for more details) or use a runtime_env with a remote URI that hosts the code in remote storage.

For an example, see the Text ML Models application on GitHub. You can use this config file to deploy the text summarization and translation application to your own Ray cluster even if you don’t have the code locally:

import_path: text_ml:app

runtime_env:
    working_dir: "https://github.com/ray-project/serve_config_examples/archive/HEAD.zip"
    pip:
      - torch
      - transformers

Note

You can also package a deployment graph into a standalone Python package that you can import using a PYTHONPATH to provide location independence on your local machine. However, the best practice is to use a runtime_env, to ensure consistency across all machines in your cluster.

Dependencies per deployment#

Ray Serve also supports serving deployments with different (and possibly conflicting) Python dependencies. For example, you can simultaneously serve one deployment that uses legacy Tensorflow 1 and another that uses Tensorflow 2.

This is supported on Mac OS and Linux using Ray’s Runtime environments feature. As with all other Ray actor options, pass the runtime environment in via ray_actor_options in your deployment. Be sure to first run pip install "ray[default]" to ensure the Runtime Environments feature is installed.

Example:

import requests
from starlette.requests import Request

from ray import serve
from ray.serve.handle import DeploymentHandle


@serve.deployment
class Ingress:
    def __init__(
        self, ver_25_handle: DeploymentHandle, ver_26_handle: DeploymentHandle
    ):
        self.ver_25_handle = ver_25_handle
        self.ver_26_handle = ver_26_handle

    async def __call__(self, request: Request):
        if request.query_params["version"] == "25":
            return await self.ver_25_handle.remote()
        else:
            return await self.ver_26_handle.remote()


@serve.deployment
def requests_version():
    return requests.__version__


ver_25 = requests_version.options(
    name="25",
    ray_actor_options={"runtime_env": {"pip": ["requests==2.25.1"]}},
).bind()
ver_26 = requests_version.options(
    name="26",
    ray_actor_options={"runtime_env": {"pip": ["requests==2.26.0"]}},
).bind()

app = Ingress.bind(ver_25, ver_26)
serve.run(app)

assert requests.get("http://127.0.0.1:8000/?version=25").text == "2.25.1"
assert requests.get("http://127.0.0.1:8000/?version=26").text == "2.26.0"

Tip

Avoid dynamically installing packages that install from source: these can be slow and use up all resources while installing, leading to problems with the Ray cluster. Consider precompiling such packages in a private repository or Docker image.

The dependencies required in the deployment may be different than the dependencies installed in the driver program (the one running Serve API calls). In this case, you should use a delayed import within the class to avoid importing unavailable packages in the driver. This applies even when not using runtime environments.

Example:

from ray import serve


@serve.deployment
class MyDeployment:
    def __call__(self, model_path):
        from my_module import my_model

        self.model = my_model.load(model_path)