# **Saving and loading a PyTorch model**

An important part of working with neural network models is saving a model after training and loading it back later. Consider a scenario where you need to make inferences from an already-trained model: you would load the trained model instead of training it again.

# **Setup: the dataset and model from the previous example**

In [1]:
x = [[1,2],[3,4],[5,6],[7,8]]
y = [[3],[7],[11],[15]]

In [3]:
import torch
import torch.nn as nn
import numpy as np
from torch.utils.data import Dataset, DataLoader
device = 'cuda' if torch.cuda.is_available() else 'cpu'  # fall back to the CPU when no GPU is available

In [4]:
class MyDataset(Dataset):
    def __init__(self, x, y):
        self.x = torch.tensor(x).float().to(device)
        self.y = torch.tensor(y).float().to(device)
    def __getitem__(self, ix):
        return self.x[ix], self.y[ix]
    def __len__(self): 
        return len(self.x)

In [5]:
ds = MyDataset(x, y)
dl = DataLoader(ds, batch_size=2, shuffle=True)

In [6]:
model = nn.Sequential(
    nn.Linear(2, 8),
    nn.ReLU(),
    nn.Linear(8, 1)
).to(device)

In [7]:

!pip install torch_summary
from torchsummary import summary

Collecting torch_summary
  Downloading https://files.pythonhosted.org/packages/ca/db/93d18c84f73b214acfa4d18051d6f4263eee3e044c408928e8abe941a22c/torch_summary-1.4.5-py3-none-any.whl
Installing collected packages: torch-summary
Successfully installed torch-summary-1.4.5


In [9]:
summary(model, torch.zeros(1,2))

Layer (type:depth-idx)                   Output Shape              Param #
├─Linear: 1-1                            [-1, 8]                   24
├─ReLU: 1-2                              [-1, 8]                   --
├─Linear: 1-3                            [-1, 1]                   9
Total params: 33
Trainable params: 33
Non-trainable params: 0
Total mult-adds (M): 0.00
Input size (MB): 0.00
Forward/backward pass size (MB): 0.00
Params size (MB): 0.00
Estimated Total Size (MB): 0.00


Before going through the commands for saving and loading, let's use the preceding example to understand the components that completely define a neural network. We need the following:



*   A unique name (key) for each tensor (parameter)
*   The logic that connects the tensors in the network to one another
*   The values (weight/bias values) of each tensor

The first point is taken care of in the `__init__` method of the model definition, and the second in the `forward` method. By default, the values in a tensor are randomly initialized during `__init__`. What we want instead is to load a specific set of weights (values) that were learned during training and associate each value with its name. This is what you obtain by calling a special method, described in the following sections.
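
The three components can be seen in a small module. This is a minimal sketch, assuming a hypothetical class named `TinyNet` with hypothetical attribute names `hidden` and `output` that mirror the `nn.Sequential` model above: the attribute names assigned in `__init__` become the parameter keys, `forward` holds the connection logic, and the values start out randomly initialized.

```python
import torch
import torch.nn as nn

class TinyNet(nn.Module):  # hypothetical name, for illustration only
    def __init__(self):
        super().__init__()
        # Attribute names become the prefixes of the parameter keys
        self.hidden = nn.Linear(2, 8)
        self.output = nn.Linear(8, 1)

    def forward(self, x):
        # The logic connecting the tensors lives here, not in the saved values
        return self.output(torch.relu(self.hidden(x)))

net = TinyNet()
# Map each parameter name to the shape of its (randomly initialized) tensor
param_shapes = {name: tuple(p.shape) for name, p in net.named_parameters()}
for name, shape in param_shapes.items():
    print(name, shape)
```

Note that only the names and values are stored in the parameters; the `forward` logic has to come from the class definition itself.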

# **state dict**




The `model.state_dict()` command is at the root of understanding how saving and loading PyTorch models works. It returns a dictionary that maps the parameter names (keys) to the corresponding weight and bias values. Here, state refers to the current snapshot of the model, that is, the set of values in each tensor.

In [10]:
model.state_dict()

OrderedDict([('0.weight', tensor([[ 0.2109,  0.2298],
                      [ 0.6608,  0.4783],
                      [-0.3486,  0.5238],
                      [-0.2397,  0.3247],
                      [ 0.6882,  0.1874],
                      [ 0.3560,  0.7061],
                      [-0.1680,  0.0015],
                      [-0.3639, -0.3079]], device='cuda:0')),
             ('0.bias',
              tensor([ 0.4266, -0.3196, -0.2202, -0.2097,  0.2858,  0.6691,  0.1063,  0.5520],
                     device='cuda:0')),
             ('2.weight',
              tensor([[ 0.0601,  0.2538, -0.0863,  0.0489, -0.0053, -0.3316,  0.1598, -0.0293]],
                     device='cuda:0')),
             ('2.bias', tensor([0.1951], device='cuda:0'))])
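
Printing the full dictionary gets unwieldy for larger models. As a rough sketch, the keys and tensor shapes can be listed instead; the variable name `model_sketch` is an assumption used here so that the example stands on its own, with the same architecture as the model above.

```python
import torch.nn as nn

# Same architecture as above, rebuilt so this example is self-contained
model_sketch = nn.Sequential(nn.Linear(2, 8), nn.ReLU(), nn.Linear(8, 1))

state = model_sketch.state_dict()
for key, tensor in state.items():
    # ReLU has no parameters, so index 1 does not appear among the keys
    print(key, tuple(tensor.shape))

# Total number of stored values: 16 + 8 + 8 + 1
n_values = sum(t.numel() for t in state.values())
print(n_values)
```

The total should agree with the 33 parameters reported by `summary` earlier.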

# **Saving**

In [11]:
save_path = 'mymodel.pth'
torch.save(model.state_dict(), save_path)
!du -hsc {save_path} # size of the model on disk

4.0K	mymodel.pth
4.0K	total


# **Loading**

In [12]:
# Load the saved state_dict into the model, mapping the tensors to the current device:
load_path = 'mymodel.pth'
model.load_state_dict(torch.load(load_path, map_location=device))

<All keys matched successfully>
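
In practice, the weights are usually loaded into a freshly constructed model rather than the one that was just saved. The following is a minimal sketch of that round trip, assuming a hypothetical helper `build_model` and a temporary file path, and running on the CPU so that it works without a GPU. The untrained `trained` model here only stands in for a model that has actually been trained.

```python
import os
import tempfile
import torch
import torch.nn as nn

def build_model():  # hypothetical helper that recreates the architecture
    return nn.Sequential(nn.Linear(2, 8), nn.ReLU(), nn.Linear(8, 1))

trained = build_model()  # stand-in for a model that has been trained
path = os.path.join(tempfile.mkdtemp(), 'mymodel_sketch.pth')
torch.save(trained.state_dict(), path)

# A new model starts with different random weights...
restored = build_model()
# ...until the saved values are loaded; map_location picks the target device
restored.load_state_dict(torch.load(path, map_location='cpu'))
restored.eval()  # switch to inference mode

val = torch.tensor([[8., 9.], [10., 11.], [1.5, 2.5]])
with torch.no_grad():  # no gradients are needed for inference
    same = torch.equal(trained(val), restored(val))
print(same)
```

Because the state dict stores only names and values, the architecture has to be rebuilt with the same layer names before `load_state_dict` can match the keys.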

# **Predictions**

In [13]:

val = [[8,9],[10,11],[1.5,2.5]]
val = torch.tensor(val).float()

In [14]:
model(val.to(device))

tensor([[-0.6223],
        [-0.7268],
        [-0.2826]], device='cuda:0', grad_fn=<AddmmBackward>)

In [15]:
val.sum(-1)  # row sums over the last dimension; the training targets follow y = x1 + x2

tensor([17., 21.,  4.])

In [18]:
val.sum(+1)  # same result: for a 2-D tensor, dimension 1 is the last dimension

tensor([17., 21.,  4.])