Mastering Model Export, Serving and Deployment

From concept to production success

6 min read · Aug 15, 2024

In the fast-paced world of AI, the journey from developing a ground-breaking model to delivering its transformative power to end-users is a thrilling expedition. Model export, serving and deployment are the crucial stages that determine whether a promising algorithm becomes a practical, game-changing solution. Dive into the intricate dance of turning theoretical brilliance into real-world impact and discover how seamless execution can propel your innovations from the lab to the spotlight.

Table of Contents

Introduction
System resources
Model export and serialization
PyTorch model export and import

Introduction

This section explores a crucial part of the machine learning lifecycle: model serialization, export and deployment. These concepts matter because machine learning models, regardless of their sophistication, yield no value unless they are effectively deployed to make predictions in real-time applications.

System Resources

Setting Up Environment

Install Anaconda on the local machine.
Create a virtual environment.
Install necessary packages in the virtual environment.
Configure and Start Jupyter Notebook.
Connect Google Colab with your local runtime environment.

Installing Anaconda On Local System

Go to the Anaconda download page: https://www.anaconda.com/products/distribution
Download the appropriate version for your computer.
Follow the instructions provided by the installer.
If the installer prompts you to add Anaconda to the system’s PATH variable, please do so. This lets you use Anaconda’s features directly from the command line.
Check that the installation succeeded by typing the following command in the terminal.

conda --version

Creating a Virtual Environment

To create a virtual environment in Anaconda via the terminal, follow these steps.

1. Open the terminal on your local machine.
2. Type the following command and press Enter to create a new virtual environment. In the command below, the virtual environment name is torch_learn and the Python version is 3.11.

conda create --name torch_learn python=3.11

3. Once the environment has been created, activate it by typing the following command.

conda activate torch_learn

4. Install the necessary packages in your environment. The following are the requirements for section 2; install only what each section needs.

pip3 install transformers
pip3 install datasets
pip3 install git+https://github.com/huggingface/diffusers
pip3 install accelerate
pip3 install ftfy
pip3 install tensorboard
pip3 install Jinja2
pip3 install torch
pip3 install torchtext
pip3 install onnx
pip3 install onnxruntime
pip3 install optimum
pip3 install "fastapi[all]"
pip3 install "uvicorn[standard]"
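After installation, a quick check from Python can confirm which packages are visible in the active environment. The sketch below uses only the standard library; the helper name installed_versions and the shortened package list are illustrative assumptions, not part of the original setup.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# A subset of the packages installed above; extend the list as needed.
for name, ver in installed_versions(["torch", "onnx", "onnxruntime", "fastapi"]).items():
    print(f"{name}: {ver or 'NOT INSTALLED'}")
```

Running this inside the activated torch_learn environment should list a version next to every package; a NOT INSTALLED entry usually means the install ran in a different environment.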

Model Export and Serialization

Model export refers to the process of transforming a trained machine learning model into a format that can be used independently of the original training environment. This format could be a simple binary file, a set of weights or a more structured format such as ONNX or TorchScript. Model serialization, on the other hand, is the process of converting the model into a format that can be stored or transmitted over the network and then reconstructed (deserialized) back into the original model structure.

There are various formats for model export and serialization, including ONNX, TorchScript and Pickle. ONNX provides a platform-independent format to represent models, which can be used across various deep learning frameworks such as PyTorch, TensorFlow and MXNet. TorchScript offers a way to serialize PyTorch models by compiling code written in a subset of Python into an intermediate representation that can run without a Python interpreter, and Pickle is the standard Python tool for serialization and deserialization. In this section, we will discuss model export in the PyTorch format and the ONNX format.
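To make the idea of serialization concrete, here is a minimal sketch of a round trip with Python’s pickle module. The weights dictionary is a made-up stand-in for trained parameters and uses plain lists rather than tensors, so the example runs without PyTorch.

```python
import pickle

# A stand-in for trained parameters: plain Python lists instead of tensors.
weights = {"layer1.weight": [[0.1, -0.2], [0.3, 0.4]], "layer1.bias": [0.0, 0.5]}

# Serialize: convert the in-memory object to bytes that can be stored or sent.
payload = pickle.dumps(weights)

# Deserialize: rebuild an equal object from those bytes.
restored = pickle.loads(payload)

print(type(payload).__name__, restored == weights)  # bytes True
```

Because unpickling can execute arbitrary code, pickled files should only be loaded from sources you trust.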

PyTorch Model Export and Import

Saving and loading PyTorch models relies on three core functions. They are as follows.

torch.save

This function saves a serialized object to disk, using Python’s pickle utility for the serialization process. It can handle models, tensors and dictionaries comprising various objects. We can use this function to save the entire module or just the state_dict of the module.

Let us understand more about the state_dict. In PyTorch, a state_dict is essentially a Python dictionary object that maps each layer in the model to its corresponding parameters (tensors). It is worth noting that only layers with learnable parameters (convolutional layers, linear layers, and so on) and registered buffers (such as batchnorm’s running_mean) have entries in the model’s state_dict. Optimizers also have a state_dict, which contains information about the optimizer’s state, as well as the hyperparameters used.
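A short sketch can show which entries actually appear in these dictionaries. The toy network below is a hypothetical example chosen for illustration: its Linear and BatchNorm1d layers hold state, while ReLU holds none.

```python
import torch
import torch.nn as nn

# Hypothetical toy network: Linear and BatchNorm1d hold state, ReLU holds none.
net = nn.Sequential(nn.Linear(4, 2), nn.BatchNorm1d(2), nn.ReLU())

# Keys are "<layer index>.<tensor name>"; buffers such as running_mean appear too.
model_keys = list(net.state_dict().keys())
print(model_keys)

# The optimizer's state_dict holds its internal state and its hyperparameters.
optimizer = torch.optim.SGD(net.parameters(), lr=0.01, momentum=0.9)
optimizer_state = optimizer.state_dict()
print(list(optimizer_state.keys()))
print(optimizer_state["param_groups"][0]["lr"])
```

The ReLU layer (index 2) contributes no keys, while the batchnorm layer contributes both learnable parameters and buffers. The optimizer dictionary has two top-level entries, state and param_groups.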

torch.load

Leveraging pickle’s unpickling abilities, this function deserializes pickled
object files back into memory.
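As a minimal sketch of the save and load round trip, the example below writes to an in-memory buffer rather than a file, which is an assumption made to keep it self-contained. The map_location argument of torch.load controls which device the tensors are placed on.

```python
import io

import torch

# Save a small dictionary of tensors into an in-memory buffer instead of a file.
original = {"weight": torch.arange(6, dtype=torch.float32).reshape(2, 3)}
buffer = io.BytesIO()
torch.save(original, buffer)

# Rewind the buffer, then deserialize; map_location places tensors on the CPU.
buffer.seek(0)
restored = torch.load(buffer, map_location="cpu")

print(torch.equal(restored["weight"], original["weight"]), restored["weight"].device)
```

The same two calls work with a file path in place of the buffer.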

torch.nn.Module.load_state_dict

This function loads a model’s parameter dictionary from a deserialized state_dict. Let us understand this through an example.

First, declare the model. Here, we are declaring a simple CNN model for illustration.

import torch
import torch.nn as nn
import torch.nn.functional as F


class SimpleCNN(nn.Module):
    def __init__(self):
        super(SimpleCNN, self).__init__()
        # Assuming input image channel=3 (RGB), 6 output channels, 5x5 kernel
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        # 5*5 from image dimension
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        # Assuming 10 classes for output
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        # Reshape before passing to fc layer
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x


# Initialize the model
model = SimpleCNN()
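The 16 * 5 * 5 input size of the first linear layer follows from the image dimensions. The arithmetic below is a sketch that assumes 32x32 input images (the article does not state the input size) together with unpadded, stride-1 convolutions and 2x2 pooling, as in the model above. The helper names conv_out and pool_out are illustrative.

```python
def conv_out(size, kernel):
    """Output width of an unpadded, stride-1 convolution."""
    return size - kernel + 1


def pool_out(size, window=2):
    """Output width of a non-overlapping max pool (stride equal to window)."""
    return size // window


size = 32                            # assumed input height and width
size = pool_out(conv_out(size, 5))   # conv1 + pool: 32 -> 28 -> 14
size = pool_out(conv_out(size, 5))   # conv2 + pool: 14 -> 10 -> 5
flattened = 16 * size * size         # 16 channels come out of conv2
print(size, flattened)               # 5 400
```

With a different input resolution the flattened size changes, and the first linear layer would need to be adjusted to match.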

When saving the state_dict, we use the model.state_dict() method to obtain the model’s learnable parameters and registered buffers. It is key to note that only these tensors are saved in this process, not the model class or its code.

# Save model state_dict
torch.save(model.state_dict(), "simple_cnn_state_dict.pt")

When loading and displaying the state_dict, it is important to recognize that the model object must be declared prior to loading the state_dict, because the file simple_cnn_state_dict.pt does not contain any information linked to the model class. Another important point is that we must call model.eval() before using the model for inference. This sets layers such as dropout and batch normalization to evaluation mode; otherwise you will see inconsistencies in your evaluation.

# Create a new model object
model2 = SimpleCNN()
# Load the state_dict into the model
model2.load_state_dict(torch.load("simple_cnn_state_dict.pt"))
# Switch to evaluation mode before inference
model2.eval()
# Print model's state_dict
print("Model's state_dict:")
for param_tensor in model2.state_dict():
    print(param_tensor, "\t", model2.state_dict()[param_tensor].size())

The output demonstrates that the state_dict is essentially a dictionary
containing learnable parameters. It becomes clear that the state_dict
encompasses the weights and biases for each layer within the neural
network.

Model's state_dict:
conv1.weight torch.Size([6, 3, 5, 5])
conv1.bias torch.Size([6])
conv2.weight torch.Size([16, 6, 5, 5])
conv2.bias torch.Size([16])
fc1.weight torch.Size([120, 400])
fc1.bias torch.Size([120])
fc2.weight torch.Size([84, 120])
fc2.bias torch.Size([84])
fc3.weight torch.Size([10, 84])
fc3.bias torch.Size([10])
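The model.eval() call in the loading code matters most for layers that behave differently during training and inference. As a minimal sketch, the standalone dropout layer below (an illustrative example, not part of SimpleCNN) zeroes elements in training mode and passes the input through unchanged in evaluation mode.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
layer = nn.Dropout(p=0.5)
x = torch.ones(1, 1000)

layer.train()             # training mode: elements are randomly zeroed
train_out = layer(x)

layer.eval()              # evaluation mode: dropout passes input through
eval_out = layer(x)

print(bool((train_out == 0).any()), torch.equal(eval_out, x))
```

SimpleCNN contains no dropout or batch normalization layers, so eval() does not change its outputs. Calling it anyway is a safe habit, because it protects inference code if such layers are added later.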

The key question we must address is: why is it standard practice to save the state_dict rather than the entire model? Several reasons support this approach.

Versatility: The state_dict is a Python dictionary object, so it is easy to manage, interpret and, if required, modify. It gives us the freedom to alter parameter values before loading them into a different model.
Device compatibility: The state_dict can be loaded onto any device, regardless of the device it was saved from, by passing map_location to torch.load. This makes models easier to port and share.
Efficiency in storage: Typically, the state_dict takes up less disk space because it contains only the model weights, not the entire model structure.
Model autonomy: By saving the state_dict, we have the option to load the weights into models that have similar structures but do not necessarily belong to the same class. This can prove advantageous in scenarios involving transfer learning.
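The model autonomy point can be sketched as follows. The two classes are hypothetical: they share a features layer but have different output heads, as is common in transfer learning. The example filters the state_dict before loading, because load_state_dict raises an error on shape mismatches even with strict=False.

```python
import torch
import torch.nn as nn


class TenClassNet(nn.Module):      # hypothetical source model
    def __init__(self):
        super().__init__()
        self.features = nn.Linear(8, 4)
        self.head = nn.Linear(4, 10)


class ThreeClassNet(nn.Module):    # hypothetical target: different class and head
    def __init__(self):
        super().__init__()
        self.features = nn.Linear(8, 4)
        self.head = nn.Linear(4, 3)


source, target = TenClassNet(), ThreeClassNet()

# Keep only the shared layers; the head shapes differ and would fail to load.
shared = {k: v for k, v in source.state_dict().items() if not k.startswith("head.")}

# strict=False tolerates the keys that are missing from the filtered dictionary.
result = target.load_state_dict(shared, strict=False)
print(result.missing_keys)
print(torch.equal(target.features.weight, source.features.weight))
```

The returned object lists the keys that were not loaded (here the two head tensors), which is a convenient check that only the intended layers were left at their fresh initialization.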

Conclusion

In navigating the intricate landscape of model export, serving and deployment, we unlock the true potential of our AI innovations. From understanding the system requirements to mastering the nuances of PyTorch’s serialization functions, each step is pivotal in ensuring that our models transition seamlessly from development to real-world application. By effectively utilizing torch.save and torch.load and accurately managing model state dictionaries with torch.nn.Module.load_state_dict, we ensure not only the preservation of model integrity but also its operational excellence. As we move forward, embracing these practices will empower us to deploy sophisticated models efficiently, bridging the gap between theoretical research and practical, impactful solutions. The future of AI hinges on our ability to make these transitions smooth and effective, heralding a new era where innovation thrives in real-time applications.