Serving ML Models (Tensorflow, PyTorch, Scikit-Learn, others)

In this guide, we will show you how to train models from various machine learning frameworks and deploy them to Ray Serve.

Please see the Key Concepts guide for more general information about Ray Serve.

Let’s train and deploy a simple Tensorflow neural net. In particular, we will show:

  • How to train a Tensorflow model and load the model from your file system in your Ray Serve deployment.

  • How to parse the JSON request and make a prediction.

Ray Serve is framework-agnostic – you can use any version of Tensorflow. However, for this tutorial, we will use Tensorflow 2 and Keras. We will also need requests to send HTTP requests to your model deployment. If you haven’t already, please install Tensorflow 2 and requests by running:

$ pip install "tensorflow>=2.0" requests

Open a new Python file called tutorial_tensorflow.py. First, let’s import Ray Serve and some other helpers.

from ray import serve

import os
import tempfile
import numpy as np
from starlette.requests import Request
from typing import Dict

import tensorflow as tf

Next, let’s train a simple MNIST model using Keras.

TRAINED_MODEL_PATH = os.path.join(tempfile.gettempdir(), "mnist_model.h5")


def train_and_save_model():
    # Load mnist dataset
    mnist = tf.keras.datasets.mnist
    (x_train, y_train), (x_test, y_test) = mnist.load_data()
    x_train, x_test = x_train / 255.0, x_test / 255.0

    # Train a simple neural net model
    model = tf.keras.models.Sequential(
        [
            tf.keras.layers.Flatten(input_shape=(28, 28)),
            tf.keras.layers.Dense(128, activation="relu"),
            tf.keras.layers.Dropout(0.2),
            tf.keras.layers.Dense(10),
        ]
    )
    loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
    model.compile(optimizer="adam", loss=loss_fn, metrics=["accuracy"])
    model.fit(x_train, y_train, epochs=1)

    model.evaluate(x_test, y_test, verbose=2)
    model.summary()

    # Save the model in h5 format in local file system
    model.save(TRAINED_MODEL_PATH)


if not os.path.exists(TRAINED_MODEL_PATH):
    train_and_save_model()

Next, we define a class TFMnistModel that will accept HTTP requests and run the MNIST model that we trained. It is decorated with @serve.deployment to make it a deployment object, so it can be deployed onto Ray Serve. Note that the Serve deployment is exposed over an HTTP route, and by default the __call__ method is invoked when a request is sent to your deployment over HTTP.

@serve.deployment
class TFMnistModel:
    def __init__(self, model_path: str):
        import tensorflow as tf

        self.model_path = model_path
        self.model = tf.keras.models.load_model(model_path)

    async def __call__(self, starlette_request: Request) -> Dict:
        # Step 1: transform HTTP request -> tensorflow input
        # Here we define the request schema to be a json array.
        input_array = np.array((await starlette_request.json())["array"])
        reshaped_array = input_array.reshape((1, 28, 28))

        # Step 2: tensorflow input -> tensorflow output
        prediction = self.model(reshaped_array)

        # Step 3: tensorflow output -> web output
        return {"prediction": prediction.numpy().tolist(), "file": self.model_path}

Note

When TFMnistModel is deployed and instantiated, it loads the Tensorflow model from your file system so that it is ready to run inference and serve requests.

Now that we’ve defined our Serve deployment, let’s prepare it so that it can be deployed.

mnist_model = TFMnistModel.bind(TRAINED_MODEL_PATH)

Note

TFMnistModel.bind(TRAINED_MODEL_PATH) binds the argument TRAINED_MODEL_PATH to our deployment and returns a DeploymentNode object (wrapping a TFMnistModel deployment object) that can then be used to connect with other DeploymentNodes to form a more complex deployment graph.
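For example, the node returned by bind can itself be passed into another deployment's bind to compose a graph. The two deployments below are purely hypothetical and only illustrate the pattern; this minimal sketch assumes a recent Ray Serve version in which a handle call can be awaited directly for its result, and it reuses the Request and Dict imports from above.

@serve.deployment
class Doubler:
    def double(self, x: float) -> float:
        return x * 2


@serve.deployment
class Ingress:
    def __init__(self, doubler):
        # `doubler` is a handle to the Doubler deployment bound below.
        self.doubler = doubler

    async def __call__(self, request: Request) -> Dict:
        x = (await request.json())["x"]
        # Call the downstream deployment through its handle.
        doubled = await self.doubler.double.remote(x)
        return {"result": doubled}


# Compose the two deployments into a single application.
graph = Ingress.bind(Doubler.bind())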

Finally, we can deploy our model to Ray Serve through the terminal.

$ serve run tutorial_tensorflow:mnist_model

Note

If you see the following error:

TypeError: Descriptors cannot not be created directly.
    If this call came from a _pb2.py file, your generated code is out of date and must be regenerated with protoc >= 3.19.0.
    If you cannot immediately regenerate your protos, some other possible workarounds are:
     1. Downgrade the protobuf package to 3.20.x or lower.
     2. Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python (but this will use pure-Python parsing and will be much slower).

You can downgrade the protobuf package to 3.20.x or lower in your Docker image, or tell Ray to do it at runtime by specifying a runtime environment:

Open a new YAML file called tf_env.yaml for the runtime environment:

pip:
 - protobuf==3.20.3

Then, run the following command to deploy the model with the runtime environment.

$ serve run --runtime-env tf_env.yaml tutorial_tensorflow:mnist_model

Let’s query it! While Serve is running, open a separate terminal window, and run the following in an interactive Python shell or a separate Python script:

import requests
import numpy as np

resp = requests.get(
    "http://localhost:8000/", json={"array": np.random.randn(28 * 28).tolist()}
)
print(resp.json())

You should get an output like the following (the exact prediction may vary):

{
 "prediction": [[-1.504277229309082, ..., -6.793371200561523]],
 "file": "/tmp/mnist_model.h5"
}
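The prediction field contains the raw logits from the model's final Dense(10) layer (the loss was configured with from_logits=True). To recover the predicted digit on the client side, take the argmax of the returned row. A small sketch, reusing the resp object from the query above:

import numpy as np

# The response holds one row of 10 logits, one per digit class.
logits = np.array(resp.json()["prediction"][0])
predicted_digit = int(np.argmax(logits))
print("Predicted digit:", predicted_digit)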

Let’s load and deploy a PyTorch ResNet model. In particular, we will show:

  • How to load the model from PyTorch’s pre-trained model zoo.

  • How to parse the JSON request, transform the payload and make a prediction.

This tutorial requires PyTorch and Torchvision. Ray Serve is framework-agnostic and works with any version of PyTorch. We will also need requests to send HTTP requests to your model deployment. If you haven’t already, please install them by running:

$ pip install torch torchvision requests

Open a new Python file called tutorial_pytorch.py. First, let’s import Ray Serve and some other helpers.

from ray import serve

from io import BytesIO
from PIL import Image
from starlette.requests import Request
from typing import Dict

import torch
from torchvision import transforms
from torchvision.models import resnet18

We define a class ImageModel that parses the input data, transforms the images, and runs the ResNet18 model loaded from torchvision. It is decorated with @serve.deployment to make it a deployment object so it can be deployed onto Ray Serve. Note that the Serve deployment is exposed over an HTTP route, and by default the __call__ method is invoked when a request is sent to your deployment over HTTP.

@serve.deployment
class ImageModel:
    def __init__(self):
        self.model = resnet18(pretrained=True).eval()
        self.preprocessor = transforms.Compose(
            [
                transforms.Resize(224),
                transforms.CenterCrop(224),
                transforms.ToTensor(),
                transforms.Lambda(lambda t: t[:3, ...]),  # remove alpha channel
                transforms.Normalize(
                    mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]
                ),
            ]
        )

    async def __call__(self, starlette_request: Request) -> Dict:
        image_payload_bytes = await starlette_request.body()
        pil_image = Image.open(BytesIO(image_payload_bytes))
        print("[1/3] Parsed image data: {}".format(pil_image))

        pil_images = [pil_image]  # Our current batch size is one
        input_tensor = torch.cat(
            [self.preprocessor(i).unsqueeze(0) for i in pil_images]
        )
        print("[2/3] Images transformed, tensor shape {}".format(input_tensor.shape))

        with torch.no_grad():
            output_tensor = self.model(input_tensor)
        print("[3/3] Inference done!")
        return {"class_index": int(torch.argmax(output_tensor[0]))}

Note

When ImageModel is deployed and instantiated, it loads the resnet18 model from torchvision so that it is ready to run inference and serve requests.

Now that we’ve defined our Serve deployment, let’s prepare it so that it can be deployed.

image_model = ImageModel.bind()

Note

ImageModel.bind() returns a DeploymentNode object (wrapping an ImageModel deployment object) that can then be used to connect with other DeploymentNodes to form a more complex deployment graph.

Finally, we can deploy our model to Ray Serve through the terminal.

$ serve run tutorial_pytorch:image_model
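If you prefer to deploy from Python instead of the CLI, you can pass the bound deployment to serve.run. This is a minimal sketch, equivalent to the serve run command above; it assumes you run it from a script or session that can import tutorial_pytorch, and the process must stay alive while you query the deployment.

from ray import serve

# Import the bound deployment defined in tutorial_pytorch.py.
from tutorial_pytorch import image_model

# Deploy the application; equivalent to `serve run tutorial_pytorch:image_model`.
serve.run(image_model)

# Keep the process alive so the deployment keeps serving requests.
input("Serving... press Enter to shut down.")
serve.shutdown()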

Let’s query it! While Serve is running, open a separate terminal window, and run the following in an interactive Python shell or a separate Python script:

import requests

ray_logo_bytes = requests.get(
    "https://raw.githubusercontent.com/ray-project/"
    "ray/master/doc/source/images/ray_header_logo.png"
).content

resp = requests.post("http://localhost:8000/", data=ray_logo_bytes)
print(resp.json())

You should get an output like the following (the exact number may vary):

{'class_index': 919}
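The class_index is an ImageNet class ID, since the torchvision resnet18 weights are trained on the 1000 ImageNet classes. If you want a human-readable label, you can map the index through an ImageNet label file. The sketch below assumes the label list published in the pytorch/hub repository; that URL and its one-label-per-line layout are assumptions, not part of this tutorial.

import requests

# Assumed location of the 1000 ImageNet class names, one per line.
labels_url = (
    "https://raw.githubusercontent.com/pytorch/hub/master/imagenet_classes.txt"
)
labels = requests.get(labels_url).text.splitlines()

class_index = resp.json()["class_index"]
print("Predicted label:", labels[class_index])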

Let’s train and deploy a simple Scikit-Learn classifier. In particular, we will show:

  • How to load the Scikit-Learn model from the file system in your Ray Serve deployment.

  • How to parse the JSON request and make a prediction.

Ray Serve is framework-agnostic. You can use any version of sklearn. We will also need requests to send HTTP requests to your model deployment. If you haven’t already, please install scikit-learn and requests by running:

$ pip install scikit-learn requests

Open a new Python file called tutorial_sklearn.py. Let’s import Ray Serve and some other helpers.

from ray import serve

import pickle
import json
import numpy as np
import os
import tempfile
from starlette.requests import Request
from typing import Dict

from sklearn.datasets import load_iris
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import mean_squared_error

Train a Classifier

We will train a classifier with the iris dataset.

First, let’s instantiate a GradientBoostingClassifier loaded from Scikit-Learn.

model = GradientBoostingClassifier()

Next, load the iris dataset and split the data into training and validation sets.

iris_dataset = load_iris()
data, target, target_names = (
    iris_dataset["data"],
    iris_dataset["target"],
    iris_dataset["target_names"],
)

# Shuffle the data and target with the same permutation so features
# stay aligned with their labels.
shuffled_indices = np.random.permutation(len(data))
data, target = data[shuffled_indices], target[shuffled_indices]
train_x, train_y = data[:100], target[:100]
val_x, val_y = data[100:], target[100:]

We then train the model and save it to a file.

model.fit(train_x, train_y)
print("MSE:", mean_squared_error(model.predict(val_x), val_y))

# Save the model and label to file
MODEL_PATH = os.path.join(
    tempfile.gettempdir(), "iris_model_gradient_boosting_classifier.pkl"
)
LABEL_PATH = os.path.join(tempfile.gettempdir(), "iris_labels.json")

with open(MODEL_PATH, "wb") as f:
    pickle.dump(model, f)
with open(LABEL_PATH, "w") as f:
    json.dump(target_names.tolist(), f)

Deploy with Ray Serve

Finally, we are ready to deploy the classifier using Ray Serve!

We define a class BoostingModel that runs inference on the GradientBoostingClassifier model we trained and returns the resulting label. It is decorated with @serve.deployment to make it a deployment object so it can be deployed onto Ray Serve. Note that the Serve deployment is exposed over an HTTP route, and by default the __call__ method is invoked when a request is sent to your deployment over HTTP.

@serve.deployment
class BoostingModel:
    def __init__(self, model_path: str, label_path: str):
        with open(model_path, "rb") as f:
            self.model = pickle.load(f)
        with open(label_path) as f:
            self.label_list = json.load(f)

    async def __call__(self, starlette_request: Request) -> Dict:
        payload = await starlette_request.json()
        print("Worker: received starlette request with data", payload)

        input_vector = [
            payload["sepal length"],
            payload["sepal width"],
            payload["petal length"],
            payload["petal width"],
        ]
        prediction = self.model.predict([input_vector])[0]
        human_name = self.label_list[prediction]
        return {"result": human_name}

Note

When BoostingModel is deployed and instantiated, it loads the classifier model we trained from your file system so that it is ready to run inference and serve requests.

Now that we’ve defined our Serve deployment, let’s prepare it so that it can be deployed.

boosting_model = BoostingModel.bind(MODEL_PATH, LABEL_PATH)

Note

BoostingModel.bind(MODEL_PATH, LABEL_PATH) binds the arguments MODEL_PATH and LABEL_PATH to our deployment and returns a DeploymentNode object (wrapping a BoostingModel deployment object) that can then be used to connect with other DeploymentNodes to form a more complex deployment graph.

Finally, we can deploy our model to Ray Serve through the terminal.

$ serve run tutorial_sklearn:boosting_model

Let’s query it! While Serve is running, open a separate terminal window, and run the following in an interactive Python shell or a separate Python script:

import requests

sample_request_input = {
    "sepal length": 1.2,
    "sepal width": 1.0,
    "petal length": 1.1,
    "petal width": 0.9,
}
response = requests.get("http://localhost:8000/", json=sample_request_input)
print(response.text)

You should get an output like the following (the exact prediction may vary):

{"result": "versicolor"}