Serve ML Models (TensorFlow, PyTorch, Scikit-Learn, others)
This guide shows how to train models from various machine learning frameworks and deploy them to Ray Serve.
See Key Concepts for more general information about Ray Serve.
This example trains and deploys a simple TensorFlow neural net. In particular, it shows:
How to train a TensorFlow model and load the model from your file system in your Ray Serve deployment.
How to parse the JSON request and make a prediction.
Ray Serve is framework-agnostic, so you can use any version of TensorFlow. This tutorial uses TensorFlow 2 and Keras. You also need requests to send HTTP requests to your model deployment. If you haven't already, install TensorFlow 2 and requests by running:
$ pip install "tensorflow>=2.0" requests
Open a new Python file called tutorial_tensorflow.py. First, import Ray Serve and some other helpers.
from ray import serve
import os
import tempfile
import numpy as np
from starlette.requests import Request
from typing import Dict
import tensorflow as tf
Next, train a simple MNIST model using Keras.
TRAINED_MODEL_PATH = os.path.join(tempfile.gettempdir(), "mnist_model.h5")


def train_and_save_model():
    # Load mnist dataset
    mnist = tf.keras.datasets.mnist
    (x_train, y_train), (x_test, y_test) = mnist.load_data()
    x_train, x_test = x_train / 255.0, x_test / 255.0

    # Train a simple neural net model
    model = tf.keras.models.Sequential(
        [
            tf.keras.layers.Flatten(input_shape=(28, 28)),
            tf.keras.layers.Dense(128, activation="relu"),
            tf.keras.layers.Dropout(0.2),
            tf.keras.layers.Dense(10),
        ]
    )
    loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
    model.compile(optimizer="adam", loss=loss_fn, metrics=["accuracy"])
    model.fit(x_train, y_train, epochs=1)

    model.evaluate(x_test, y_test, verbose=2)
    model.summary()

    # Save the model in h5 format in the local file system
    model.save(TRAINED_MODEL_PATH)


if not os.path.exists(TRAINED_MODEL_PATH):
    train_and_save_model()
Next, define a TFMnistModel class that accepts HTTP requests and runs the MNIST model that you trained. The @serve.deployment decorator makes it a deployment object that you can deploy onto Ray Serve. Note that Ray Serve exposes the deployment over an HTTP route. By default, when the deployment receives a request over HTTP, Ray Serve invokes the __call__ method.
@serve.deployment
class TFMnistModel:
    def __init__(self, model_path: str):
        import tensorflow as tf

        self.model_path = model_path
        self.model = tf.keras.models.load_model(model_path)

    async def __call__(self, starlette_request: Request) -> Dict:
        # Step 1: transform HTTP request -> tensorflow input
        # Here we define the request schema to be a json array.
        input_array = np.array((await starlette_request.json())["array"])
        reshaped_array = input_array.reshape((1, 28, 28))

        # Step 2: tensorflow input -> tensorflow output
        prediction = self.model(reshaped_array)

        # Step 3: tensorflow output -> web output
        return {
            "prediction": prediction.numpy().tolist(),
            "file": self.model_path,
        }
Note
When you deploy and instantiate the TFMnistModel class, Ray Serve loads the TensorFlow model from your file system so that it's ready to run inference and serve requests.
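Because every replica runs its own copy of the model, you can scale serving horizontally with the num_replicas option of @serve.deployment. A minimal sketch (the replica count of 2 is an arbitrary choice for illustration):

@serve.deployment(num_replicas=2)  # each replica loads and serves its own model copy
class TFMnistModel:
    ...  # same implementation as above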
Now that you’ve defined the Serve deployment, prepare it so that you can deploy it.
mnist_model = TFMnistModel.bind(TRAINED_MODEL_PATH)
Note
TFMnistModel.bind(TRAINED_MODEL_PATH) binds the argument TRAINED_MODEL_PATH to the deployment and returns a DeploymentNode object, a wrapper around the TFMnistModel deployment, that you can connect with other DeploymentNodes to form a more complex deployment graph.
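For example, you can pass one bound deployment into another deployment's constructor; at runtime it arrives as a handle that you can call remotely. A rough sketch with hypothetical names (Doubler and Entrypoint are illustrative, not part of this tutorial):

@serve.deployment
class Doubler:
    def double(self, value: float) -> float:
        return value * 2


@serve.deployment
class Entrypoint:
    def __init__(self, doubler):
        self.doubler = doubler  # handle to the bound Doubler deployment

    async def __call__(self, request: Request) -> Dict:
        value = (await request.json())["value"]
        # Call the downstream deployment through its handle.
        return {"doubled": await self.doubler.double.remote(value)}


app = Entrypoint.bind(Doubler.bind())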
Finally, deploy the model to Ray Serve through the terminal.
$ serve run tutorial_tensorflow:mnist_model
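If you prefer to stay in Python, serve.run offers a programmatic alternative to the serve run CLI; a minimal sketch, placed at the bottom of tutorial_tensorflow.py:

# Sketch: deploy from Python instead of the CLI. Keep the process alive
# afterwards (for example, with a sleep loop) so the local Ray instance
# continues serving.
serve.run(mnist_model)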
Next, query the model. While Serve is running, open a separate terminal window, and run the following in an interactive Python shell or a separate Python script:
import requests
import numpy as np

resp = requests.get(
    "http://localhost:8000/", json={"array": np.random.randn(28 * 28).tolist()}
)
print(resp.json())
You should get an output like the following, although the exact prediction may vary:
{
    "prediction": [[-1.504277229309082, ..., -6.793371200561523]],
    "file": "/tmp/mnist_model.h5"
}
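Because the final Dense(10) layer was trained with from_logits=True and has no softmax, the prediction field contains raw logits. To recover the predicted digit from the response, you could run:

import numpy as np

# The largest logit corresponds to the predicted class.
logits = np.array(resp.json()["prediction"])[0]
print("Predicted digit:", int(np.argmax(logits)))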
This example loads and deploys a PyTorch ResNet model. In particular, it shows:
How to load the model from PyTorch’s pre-trained Model Zoo.
How to parse the request payload, transform the image, and make a prediction.
This tutorial requires PyTorch and Torchvision. Ray Serve is framework-agnostic and works with any version of PyTorch. You also need requests to send HTTP requests to your model deployment. If you haven't already, install them by running:
$ pip install torch torchvision requests
Open a new Python file called tutorial_pytorch.py. First, import Ray Serve and some other helpers.
from ray import serve
from io import BytesIO
from PIL import Image
from starlette.requests import Request
from typing import Dict
import torch
from torchvision import transforms
from torchvision.models import resnet18
Define a class ImageModel that parses the input data, transforms the image, and runs the ResNet18 model loaded from torchvision. The @serve.deployment decorator makes it a deployment object that you can deploy onto Ray Serve. Note that Ray Serve exposes the deployment over an HTTP route. By default, when the deployment receives a request over HTTP, Ray Serve invokes the __call__ method.
@serve.deployment
class ImageModel:
    def __init__(self):
        self.model = resnet18(pretrained=True).eval()
        self.preprocessor = transforms.Compose(
            [
                transforms.Resize(224),
                transforms.CenterCrop(224),
                transforms.ToTensor(),
                transforms.Lambda(lambda t: t[:3, ...]),  # remove alpha channel
                transforms.Normalize(
                    mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]
                ),
            ]
        )

    async def __call__(self, starlette_request: Request) -> Dict:
        image_payload_bytes = await starlette_request.body()
        pil_image = Image.open(BytesIO(image_payload_bytes))
        print("[1/3] Parsed image data: {}".format(pil_image))

        pil_images = [pil_image]  # Our current batch size is one
        input_tensor = torch.cat(
            [self.preprocessor(i).unsqueeze(0) for i in pil_images]
        )
        print("[2/3] Images transformed, tensor shape {}".format(input_tensor.shape))

        with torch.no_grad():
            output_tensor = self.model(input_tensor)
        print("[3/3] Inference done!")
        return {"class_index": int(torch.argmax(output_tensor[0]))}
Note
When you deploy and instantiate an ImageModel class, Ray Serve loads the ResNet18 model from torchvision so that it's ready to run inference and serve requests.
Now that you’ve defined the Serve deployment, prepare it so that you can deploy it.
image_model = ImageModel.bind()
Note
ImageModel.bind() returns a DeploymentNode object, a wrapper around the ImageModel deployment, that you can connect with other DeploymentNodes to form a more complex deployment graph.
Finally, deploy the model to Ray Serve through the terminal.
$ serve run tutorial_pytorch:image_model
Next, query the model. While Serve is running, open a separate terminal window, and run the following in an interactive Python shell or a separate Python script:
import requests

ray_logo_bytes = requests.get(
    "https://raw.githubusercontent.com/ray-project/"
    "ray/master/doc/source/images/ray_header_logo.png"
).content

resp = requests.post("http://localhost:8000/", data=ray_logo_bytes)
print(resp.json())
You should get an output like the following, although the exact number may vary:
{'class_index': 919}
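The class_index value is an ImageNet class ID. One way to map it to a human-readable name, assuming a torchvision version (0.13 or later) whose weight enums carry label metadata:

from torchvision.models import ResNet18_Weights

# Assumes torchvision >= 0.13, where weight enums expose label metadata.
categories = ResNet18_Weights.IMAGENET1K_V1.meta["categories"]
print(categories[resp.json()["class_index"]])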
This example trains and deploys a simple scikit-learn classifier. In particular, it shows:
How to load the scikit-learn model from the file system in your Ray Serve deployment.
How to parse the JSON request and make a prediction.
Ray Serve is framework-agnostic, so you can use any version of scikit-learn. You also need requests to send HTTP requests to your model deployment. If you haven't already, install scikit-learn and requests by running:
$ pip install scikit-learn requests
Open a new Python file called tutorial_sklearn.py. Import Ray Serve and some other helpers.
from ray import serve
import pickle
import json
import numpy as np
import os
import tempfile
from starlette.requests import Request
from typing import Dict
from sklearn.datasets import load_iris
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import mean_squared_error
Train a Classifier
Next, train a classifier with the Iris dataset.
First, instantiate a GradientBoostingClassifier from scikit-learn.
model = GradientBoostingClassifier()
Next, load the Iris dataset and split the data into training and validation sets.
iris_dataset = load_iris()
data, target, target_names = (
    iris_dataset["data"],
    iris_dataset["target"],
    iris_dataset["target_names"],
)

# Shuffle the features and labels together so that each row keeps its
# matching target. (Shuffling the two arrays independently would break
# the correspondence between them.)
shuffled_indices = np.random.permutation(len(data))
data, target = data[shuffled_indices], target[shuffled_indices]

train_x, train_y = data[:100], target[:100]
val_x, val_y = data[100:], target[100:]
Then, train the model and save it to a file.
model.fit(train_x, train_y)
print("MSE:", mean_squared_error(model.predict(val_x), val_y))

# Save the model and labels to files
MODEL_PATH = os.path.join(
    tempfile.gettempdir(), "iris_model_gradient_boosting_classifier.pkl"
)
LABEL_PATH = os.path.join(tempfile.gettempdir(), "iris_labels.json")

with open(MODEL_PATH, "wb") as f:
    pickle.dump(model, f)
with open(LABEL_PATH, "w") as f:
    json.dump(target_names.tolist(), f)
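Since GradientBoostingClassifier is a classifier, accuracy may be a more natural validation metric than MSE. A one-line check using scikit-learn's built-in scorer:

# score() returns the mean accuracy on the given validation data.
print("Accuracy:", model.score(val_x, val_y))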
Deploy with Ray Serve
Finally, you’re ready to deploy the classifier using Ray Serve.
Define a BoostingModel class that runs inference on the GradientBoostingClassifier model you trained and returns the resulting label. It's decorated with @serve.deployment to make it a deployment object so you can deploy it onto Ray Serve. Note that Ray Serve exposes the deployment over an HTTP route. By default, when the deployment receives a request over HTTP, Ray Serve invokes the __call__ method.
@serve.deployment
class BoostingModel:
    def __init__(self, model_path: str, label_path: str):
        with open(model_path, "rb") as f:
            self.model = pickle.load(f)
        with open(label_path) as f:
            self.label_list = json.load(f)

    async def __call__(self, starlette_request: Request) -> Dict:
        payload = await starlette_request.json()
        print("Worker: received starlette request with data", payload)

        input_vector = [
            payload["sepal length"],
            payload["sepal width"],
            payload["petal length"],
            payload["petal width"],
        ]
        prediction = self.model.predict([input_vector])[0]
        human_name = self.label_list[prediction]
        return {"result": human_name}
Note
When you deploy and instantiate a BoostingModel class, Ray Serve loads the classifier you trained from the file system so that it's ready to run inference and serve requests.
After you’ve defined the Serve deployment, prepare it so that you can deploy it.
boosting_model = BoostingModel.bind(MODEL_PATH, LABEL_PATH)
Note
BoostingModel.bind(MODEL_PATH, LABEL_PATH) binds the arguments MODEL_PATH and LABEL_PATH to the deployment and returns a DeploymentNode object, a wrapper around the BoostingModel deployment, that you can connect with other DeploymentNodes to form a more complex deployment graph.
Finally, deploy the model to Ray Serve through the terminal.
$ serve run tutorial_sklearn:boosting_model
Next, query the model. While Serve is running, open a separate terminal window, and run the following in an interactive Python shell or a separate Python script:
import requests

sample_request_input = {
    "sepal length": 1.2,
    "sepal width": 1.0,
    "petal length": 1.1,
    "petal width": 0.9,
}
response = requests.get("http://localhost:8000/", json=sample_request_input)
print(response.text)
You should get an output like the following, although the exact prediction may vary:
{"result": "versicolor"}