Deployment of a TensorFlow model to Kubernetes
Let’s imagine that you’ve just finished training your new TensorFlow model and want to start using it in your application(s). One obvious way to do so is to simply import it in the source code of every application that uses it. However, it is often more versatile to keep the model in one place as a standalone service and have applications exchange data with it through API calls. This article goes through the steps of building such a system and deploying the result to Kubernetes.
This guide is based on the official TensorFlow Serving documentation.
The TensorFlow Model
First, we need a model to work with. For this purpose, here is a snippet taken from the TensorFlow getting started guide that builds and trains a simple Keras model:
import tensorflow as tf

# Data loading and preprocessing (MNIST, as in the getting started guide)
mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

# Building the model
model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10)
])

# Compiling the model
model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])

# Training the model
model.fit(x_train, y_train, epochs=5)
The trained model can then be used within the code to make predictions. However, this model would be more useful if it could also be used by other applications, potentially deployed on different computers. To do so, the model can be exposed to HTTP requests through a REST API. Although this could be done by using Flask to create an API endpoint for our model, TensorFlow Serving offers a similar solution without requiring a complicated setup for developers.
TensorFlow Serving consists of a Docker container inside which the desired model is copied. The container contains the necessary logic to expose the model to HTTP requests.
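For comparison, here is a minimal sketch of what such a hand-rolled Flask endpoint might look like. The route name and port are purely illustrative, and it assumes the model has already been exported to ./mymodel/1/ as described below; it is not part of the setup used in the rest of this article:

# Hypothetical Flask alternative, shown only for comparison with TensorFlow Serving
import numpy as np
import tensorflow as tf
from flask import Flask, request, jsonify

app = Flask(__name__)
model = tf.keras.models.load_model('./mymodel/1/')  # illustrative path, see export step below

@app.route('/predict', methods=['POST'])
def predict():
    # Expect a JSON body of the form {"instances": [[...], ...]}
    instances = np.array(request.get_json()['instances'])
    return jsonify({'predictions': model.predict(instances).tolist()})

if __name__ == '__main__':
    app.run(port=5000)

With TensorFlow Serving, none of this code has to be written or maintained.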
To use TensorFlow Serving, one must first export the model using the save function of Keras:
MODEL_NAME = 'mymodel'
MODEL_VERSION = 1
model.save('./{}/{}/'.format(MODEL_NAME, MODEL_VERSION))
Note how the model name and version are defined.
The model now exists in the folder ./mymodel/1/.
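Before moving on, you can optionally check that the export worked by loading the SavedModel back; a quick sanity check, not a required step:

import tensorflow as tf

# Reload the exported SavedModel and print its architecture
loaded = tf.keras.models.load_model('./mymodel/1/')
loaded.summary()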
Preparing a TensorFlow Serving container
Now, a plain TensorFlow Serving image (one without any model inside) can be pulled from Docker Hub and run locally:
docker run -d --name serving_base tensorflow/serving
With the container running, one can now copy the exported model into it using:
docker cp ./mymodel serving_base:/models/mymodel
The container now contains our model and can be saved as a new image. This can be done by using the docker commit command:
docker commit --change "ENV MODEL_NAME mymodel" serving_base my-registry/mymodel-serving
Note here that my-registry is the URL of the Docker registry to push the image to.
Once done, we can get rid of the original TensorFlow Serving container:
docker kill serving_base
docker rm serving_base
Here, it might be a good idea to check that the container is actually working, so let's run it:
docker run -d -p 8501:8501 my-registry/mymodel-serving
Note that 8501 is the port TensorFlow Serving uses for its REST API.
A GET request to http://localhost:8501/v1/models/mymodel should return the following JSON:
{
  "model_version_status": [
    {
      "version": "1",
      "state": "AVAILABLE",
      "status": {
        "error_code": "OK",
        "error_message": ""
      }
    }
  ]
}
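The same check can be done from Python with the requests library (a small sketch; the model name and port match the setup above):

import requests

# Query the model status endpoint of the locally running container
r = requests.get('http://localhost:8501/v1/models/mymodel')
print(r.json())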
If everything is successful so far, the container can be pushed to the registry, which will make it available for Kubernetes to deploy:
docker push my-registry/mymodel-serving
Deploying the container to Kubernetes
Now that the container has been pushed to a registry, it can be deployed to our Kubernetes cluster. This is achieved by creating two resources in the cluster: a deployment and a service. The deployment is basically the application itself, while the service allows users to reach the deployment from outside the cluster. Here, we will use a NodePort service so that our TensorFlow Serving container can be accessed from outside the cluster through a dedicated port. We will choose 30111 for this purpose.
Creating those resources is done simply by applying the content of a YAML manifest file with the kubectl command. In our case, here is the content of our kubernetes_manifest.yml file:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mymodel-serving
spec:
  replicas: 1
  selector:
    matchLabels:
      app: mymodel-serving
  template:
    metadata:
      labels:
        app: mymodel-serving
    spec:
      containers:
      - name: mymodel-serving
        image: my-registry/mymodel-serving
        ports:
        - containerPort: 8501
---
apiVersion: v1
kind: Service
metadata:
  name: mymodel-serving
spec:
  ports:
  - port: 8501
    nodePort: 30111
  selector:
    app: mymodel-serving
  type: NodePort
The resources can then be created by executing:
kubectl apply -f kubernetes_manifest.yml
The container should now be deployed in the Kubernetes cluster.
Using the container
The AI model deployed in Kubernetes can now be used for predictions. To do so, one must send a POST request to the prediction API of the TensorFlow Serving container, with a body consisting of a JSON object containing the input data. The model will then reply with its prediction, also in JSON format. Here is an example of how this can be implemented in Python, using the requests library:
import cv2, json, requests

# Read the image as grayscale so its shape matches the model's (28, 28) input
image = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
payload = json.dumps({'instances': [image.tolist()]})
r = requests.post("http://<IP of the Cluster>:30111/v1/models/mymodel:predict", data=payload)
Note here that the image data sent to the AI model is embedded in an array. This is because of the way TensorFlow Serving accepts input data. In our case, the input image is 28x28 pixels, so the data sent to the AI model must have a (n, 28, 28) shape, where n is the number of images to evaluate.
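The response can be decoded in a similar way. Here is a minimal sketch, assuming the standard TensorFlow Serving response format with a top-level "predictions" key; since the last layer of our model outputs raw logits for 10 classes, the predicted class is simply the index of the largest value:

import numpy as np

# TensorFlow Serving answers with {"predictions": [[logit_0, ..., logit_9], ...]}
logits = np.array(r.json()['predictions'])
print(np.argmax(logits, axis=1))  # index of the highest logit for each image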