<a href="https://colab.research.google.com/github/tsakailab/alpp/blob/main/colab/alpp_saving_and_loading_in_PyTorch.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# [Saving And Loading Models - PyTorch Beginner 17](https://www.python-engineer.com/courses/pytorchbeginner/17-saving-and-loading/)

- `torch.save` can save a model, tensor, or dictionary.
- `torch.load` loads the saved model, tensor, or dictionary.
- `model.load_state_dict` restores parameters that were previously obtained from `model.state_dict()`.
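As a minimal sketch of the first two points, the round trip below saves and reloads a plain tensor and a dictionary. The names `tensor`, `bundle`, and the file names are hypothetical; a temporary directory is used so nothing is left behind.

```python
import os
import tempfile

import torch

# Hypothetical example objects; any tensor or dict of tensors is handled the same way.
tensor = torch.arange(6, dtype=torch.float32).reshape(2, 3)
bundle = {"weights": torch.ones(3), "step": 5}

with tempfile.TemporaryDirectory() as tmp_dir:
    tensor_path = os.path.join(tmp_dir, "tensor.pth")
    bundle_path = os.path.join(tmp_dir, "bundle.pth")

    # torch.save serializes the object to disk.
    torch.save(tensor, tensor_path)
    torch.save(bundle, bundle_path)

    # torch.load reads it back as the same kind of object.
    loaded_tensor = torch.load(tensor_path)
    loaded_bundle = torch.load(bundle_path)

print(torch.equal(tensor, loaded_tensor))
print(loaded_bundle["step"])
```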

## Assume we have defined a class `Model` and created its instance `model`.

In [1]:
import torch
import torch.nn as nn

class Model(nn.Module):
    def __init__(self, n_input_features):
        super(Model, self).__init__()
        self.linear = nn.Linear(n_input_features, 1)

    def forward(self, x):
        y_pred = torch.sigmoid(self.linear(x))
        return y_pred

model = Model(n_input_features=6)

# train your model...


## Method 1: save and load entire model

- `torch.save(model, FILE)`
- `loaded_model = torch.load(FILE)`


In [2]:
for param in model.parameters():
    print(param)

FILE = "model.pth"
torch.save(model, FILE)

!ls -l

Parameter containing:
tensor([[-0.3397,  0.2688, -0.0415,  0.0330,  0.3733, -0.3722]],
       requires_grad=True)
Parameter containing:
tensor([-0.3537], requires_grad=True)
total 8
-rw-r--r-- 1 root root 1823 Jul 18 15:47 model.pth
drwxr-xr-x 1 root root 4096 Jul 14 13:31 sample_data


In [3]:
FILE = "model.pth"
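# weights_only=False unpickles the whole model object (PyTorch 2.6+ defaults to True); use it only for files you trust
loaded_model = torch.load(FILE, weights_only=False)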
loaded_model.eval()

for param in loaded_model.parameters():
    print(param)

Parameter containing:
tensor([[-0.3397,  0.2688, -0.0415,  0.0330,  0.3733, -0.3722]],
       requires_grad=True)
Parameter containing:
tensor([-0.3537], requires_grad=True)


Remember that you must call `model.eval()` to set dropout and batch normalization layers to evaluation mode before running inference. Failing to do so yields inconsistent inference results. To resume training, call `model.train()` to put these layers back in training mode.
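The effect of the two modes can be seen on a single dropout layer. This is only an illustrative sketch: the `Model` above has no dropout, so the layer `dropout` and the input `x` here are hypothetical.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical layer, used only to make the mode switch visible.
dropout = nn.Dropout(p=0.5)
x = torch.ones(1, 8)

dropout.train()
train_out = dropout(x)    # random: each element is either 0 or scaled to 1 / (1 - p) = 2

dropout.eval()
eval_out_1 = dropout(x)   # identity in evaluation mode
eval_out_2 = dropout(x)   # same result on every call

print(train_out)
print(eval_out_1)
```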

## Method 2: save and load only state dict

`model.state_dict()` returns an ordered dictionary that maps each parameter (and buffer) name to its tensor.
- `torch.save(model.state_dict(), FILE)`
- `loaded_model.load_state_dict(torch.load(FILE))`
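Because the state dict holds only names and tensors, the receiving model must have a matching architecture. The sketch below lists the entries and shows that loading into a differently sized model raises an error. The variable `wrong_model` is hypothetical, and `Model` is repeated so that the block runs on its own.

```python
import torch
import torch.nn as nn


class Model(nn.Module):
    def __init__(self, n_input_features):
        super().__init__()
        self.linear = nn.Linear(n_input_features, 1)

    def forward(self, x):
        return torch.sigmoid(self.linear(x))


model = Model(n_input_features=6)
state = model.state_dict()

# Each entry maps a parameter name to its tensor.
for name, tensor in state.items():
    print(name, tuple(tensor.shape))

# Hypothetical mismatch: a model with 4 input features cannot take 6-feature weights.
wrong_model = Model(n_input_features=4)
try:
    wrong_model.load_state_dict(state)
    mismatch_detected = False
except RuntimeError:
    mismatch_detected = True

print(mismatch_detected)
```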

In [4]:
print(model.state_dict())

FILE = "model_dict.pth"
torch.save(model.state_dict(), FILE)

!ls -l

OrderedDict([('linear.weight', tensor([[-0.3397,  0.2688, -0.0415,  0.0330,  0.3733, -0.3722]])), ('linear.bias', tensor([-0.3537]))])
total 12
-rw-r--r-- 1 root root 1139 Jul 18 15:47 model_dict.pth
-rw-r--r-- 1 root root 1823 Jul 18 15:47 model.pth
drwxr-xr-x 1 root root 4096 Jul 14 13:31 sample_data


In [5]:
FILE = "model_dict.pth"

# create a model with the same architecture before loading the weights
loaded_model = Model(n_input_features=6)

loaded_model.load_state_dict(torch.load(FILE))
loaded_model.eval()

print(loaded_model.state_dict())

OrderedDict([('linear.weight', tensor([[-0.3397,  0.2688, -0.0415,  0.0330,  0.3733, -0.3722]])), ('linear.bias', tensor([-0.3537]))])


## Save and load an optimizer

An optimizer also has a `state_dict()`, so it can be saved and loaded in the same way as a model.

In [6]:
learning_rate = 0.01
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)
print(optimizer.state_dict())

{'state': {}, 'param_groups': [{'lr': 0.01, 'momentum': 0, 'dampening': 0, 'weight_decay': 0, 'nesterov': False, 'maximize': False, 'foreach': None, 'differentiable': False, 'params': [0, 1]}]}
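The `'state'` entry above is empty because plain SGD keeps no per-parameter buffers. As a sketch under the assumption that momentum is enabled (not the setting used in this notebook), the example below shows that the buffers and hyperparameters both survive a save and load. The names `layer` and `restored` are hypothetical.

```python
import os
import tempfile

import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical layer and optimizer; momentum=0.9 is an assumption for illustration.
layer = nn.Linear(6, 1)
optimizer = torch.optim.SGD(layer.parameters(), lr=0.01, momentum=0.9)

# One update step so that the optimizer creates its momentum buffers.
loss = layer(torch.randn(4, 6)).sum()
loss.backward()
optimizer.step()

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "optimizer.pth")
    torch.save(optimizer.state_dict(), path)

    # The hyperparameters given here are overwritten by the loaded state.
    restored = torch.optim.SGD(layer.parameters(), lr=0.1, momentum=0.0)
    restored.load_state_dict(torch.load(path))

restored_state = restored.state_dict()
print(restored_state["param_groups"][0]["lr"])
print(list(restored_state["state"][0].keys()))
```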


### Save and load a checkpoint as a dictionary of model and optimizer states during training

In [7]:
checkpoint = {
    "epoch": 90,
    "model_state": model.state_dict(),
    "optim_state": optimizer.state_dict(),
}

FILE = "checkpoint.pth"
torch.save(checkpoint, FILE)

!ls -l

total 16
-rw-r--r-- 1 root root 1395 Jul 18 15:47 checkpoint.pth
-rw-r--r-- 1 root root 1139 Jul 18 15:47 model_dict.pth
-rw-r--r-- 1 root root 1823 Jul 18 15:47 model.pth
drwxr-xr-x 1 root root 4096 Jul 14 13:31 sample_data


In [8]:
FILE = "checkpoint.pth"

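# create a model and optimizer with the same architecture before loading the states
loaded_model = Model(n_input_features=6)
# the optimizer must track the parameters of loaded_model; lr is overwritten by the loaded state
loaded_optimizer = torch.optim.SGD(loaded_model.parameters(), lr=0)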

loaded_checkpoint = torch.load(FILE)
epoch = loaded_checkpoint['epoch']
loaded_model.load_state_dict(loaded_checkpoint['model_state'])
loaded_optimizer.load_state_dict(loaded_checkpoint['optim_state'])

print(loaded_optimizer.state_dict())

{'state': {}, 'param_groups': [{'lr': 0.01, 'momentum': 0, 'dampening': 0, 'weight_decay': 0, 'nesterov': False, 'maximize': False, 'foreach': None, 'differentiable': False, 'params': [0, 1]}]}
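A checkpoint is mainly useful for resuming an interrupted run. The sketch below is one possible way to do that, not a prescribed recipe: the helper `train`, the toy data `x` and `y`, the momentum setting, and the convention that `"epoch"` stores the next epoch to run are all assumptions made for illustration.

```python
import os
import tempfile

import torch
import torch.nn as nn


class Model(nn.Module):
    def __init__(self, n_input_features):
        super().__init__()
        self.linear = nn.Linear(n_input_features, 1)

    def forward(self, x):
        return torch.sigmoid(self.linear(x))


def train(model, optimizer, x, y, start_epoch, num_epochs):
    """Run epochs start_epoch..num_epochs-1 and return the next epoch index."""
    criterion = nn.BCELoss()
    model.train()
    for _ in range(start_epoch, num_epochs):
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()
    return num_epochs


torch.manual_seed(0)
x = torch.randn(8, 6)                      # toy inputs (assumption)
y = torch.randint(0, 2, (8, 1)).float()    # toy binary labels (assumption)

# First run: train for 3 epochs, then write a checkpoint.
model = Model(n_input_features=6)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
next_epoch = train(model, optimizer, x, y, start_epoch=0, num_epochs=3)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "checkpoint.pth")
    checkpoint = {
        "epoch": next_epoch,
        "model_state": model.state_dict(),
        "optim_state": optimizer.state_dict(),
    }
    torch.save(checkpoint, path)
    loaded_checkpoint = torch.load(path)

# Second run: rebuild the objects, restore their states, and continue.
resumed_model = Model(n_input_features=6)
resumed_optimizer = torch.optim.SGD(resumed_model.parameters(), lr=0.01, momentum=0.9)
resumed_model.load_state_dict(loaded_checkpoint["model_state"])
resumed_optimizer.load_state_dict(loaded_checkpoint["optim_state"])
start_epoch = loaded_checkpoint["epoch"]

# Right after loading, the resumed model holds exactly the saved weights.
weights_match = all(
    torch.equal(a, b)
    for a, b in zip(model.state_dict().values(), resumed_model.state_dict().values())
)

final_epoch = train(resumed_model, resumed_optimizer, x, y, start_epoch, num_epochs=5)
print(start_epoch, final_epoch, weights_match)
```

Restoring the optimizer state matters most for optimizers that keep per-parameter buffers (momentum here), since a fresh optimizer would restart those buffers from zero.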


## CPU/GPU mapping

- Use the `map_location` argument of `torch.load()` to choose the device onto which the saved tensors are loaded.

In [9]:
# 1) Save on GPU, Load on CPU

model = Model(n_input_features=6)

device = torch.device("cuda")
model.to(device)
print(model.state_dict())   # see device='cuda:0'

FILE = "model_state_in_GPU.pth"
torch.save(model.state_dict(), FILE)

!ls -l

device = torch.device('cpu')
loaded_model = Model(n_input_features=6)
loaded_model.load_state_dict(torch.load(FILE, map_location=device))
loaded_model.eval()

print(loaded_model.state_dict())   # device='cuda:0' no longer appears

OrderedDict([('linear.weight', tensor([[-0.0780,  0.0518,  0.1266,  0.2063,  0.0951,  0.0988]],
       device='cuda:0')), ('linear.bias', tensor([0.0322], device='cuda:0'))])
total 20
-rw-r--r-- 1 root root 1395 Jul 18 15:47 checkpoint.pth
-rw-r--r-- 1 root root 1139 Jul 18 15:47 model_dict.pth
-rw-r--r-- 1 root root 1823 Jul 18 15:47 model.pth
-rw-r--r-- 1 root root 1171 Jul 18 15:47 model_state_in_GPU.pth
drwxr-xr-x 1 root root 4096 Jul 14 13:31 sample_data
OrderedDict([('linear.weight', tensor([[-0.0780,  0.0518,  0.1266,  0.2063,  0.0951,  0.0988]])), ('linear.bias', tensor([0.0322]))])


In [10]:
# 2) Save on GPU, Load on GPU

model = Model(n_input_features=6)

device = torch.device("cuda")
model.to(device)
print(model.state_dict())   # see device='cuda:0'

FILE = "model_state_in_GPU.pth"
torch.save(model.state_dict(), FILE)

!ls -l

loaded_model = Model(n_input_features=6)
loaded_model.load_state_dict(torch.load(FILE))  # No map_location required
loaded_model.to(device)
print(loaded_model.state_dict())    # see device='cuda:0' again

# Note: Be sure to use the .to(torch.device('cuda')) function
# on all model inputs, too!

OrderedDict([('linear.weight', tensor([[ 0.2909,  0.0101, -0.3597, -0.0673, -0.2807, -0.2409]],
       device='cuda:0')), ('linear.bias', tensor([-0.4055], device='cuda:0'))])
total 20
-rw-r--r-- 1 root root 1395 Jul 18 15:47 checkpoint.pth
-rw-r--r-- 1 root root 1139 Jul 18 15:47 model_dict.pth
-rw-r--r-- 1 root root 1823 Jul 18 15:47 model.pth
-rw-r--r-- 1 root root 1171 Jul 18 15:47 model_state_in_GPU.pth
drwxr-xr-x 1 root root 4096 Jul 14 13:31 sample_data
OrderedDict([('linear.weight', tensor([[ 0.2909,  0.0101, -0.3597, -0.0673, -0.2807, -0.2409]],
       device='cuda:0')), ('linear.bias', tensor([-0.4055], device='cuda:0'))])


In [11]:
# 3) Save on CPU, Load on GPU

model = Model(n_input_features=6)
print(model.state_dict())   # no device='cuda:0' (tensors are on the CPU)

FILE = "model_state_in_CPU.pth"
torch.save(model.state_dict(), FILE)

!ls -l

loaded_model = Model(n_input_features=6)
loaded_model.load_state_dict(torch.load(FILE, map_location="cuda:0"))  # Choose whatever GPU device number you want
device = torch.device("cuda")
loaded_model.to(device)     # be sure to call this to convert the model's parameter tensors to CUDA tensors
print(loaded_model.state_dict())    # see device='cuda:0'

OrderedDict([('linear.weight', tensor([[ 0.3628, -0.0435,  0.3482,  0.0617, -0.3224,  0.0701]])), ('linear.bias', tensor([-0.3448]))])
total 24
-rw-r--r-- 1 root root 1395 Jul 18 15:47 checkpoint.pth
-rw-r--r-- 1 root root 1139 Jul 18 15:47 model_dict.pth
-rw-r--r-- 1 root root 1823 Jul 18 15:47 model.pth
-rw-r--r-- 1 root root 1171 Jul 18 15:47 model_state_in_CPU.pth
-rw-r--r-- 1 root root 1171 Jul 18 15:47 model_state_in_GPU.pth
drwxr-xr-x 1 root root 4096 Jul 14 13:31 sample_data
OrderedDict([('linear.weight', tensor([[ 0.3628, -0.0435,  0.3482,  0.0617, -0.3224,  0.0701]],
       device='cuda:0')), ('linear.bias', tensor([-0.3448], device='cuda:0'))])
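The three cases above can be folded into one device-agnostic pattern. This is a sketch that assumes you want the GPU when one is available and the CPU otherwise; the file name is hypothetical, and `Model` is repeated so that the block runs on its own.

```python
import os
import tempfile

import torch
import torch.nn as nn


class Model(nn.Module):
    def __init__(self, n_input_features):
        super().__init__()
        self.linear = nn.Linear(n_input_features, 1)

    def forward(self, x):
        return torch.sigmoid(self.linear(x))


# Pick the GPU when available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = Model(n_input_features=6).to(device)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "model_state.pth")
    torch.save(model.state_dict(), path)

    # map_location remaps the saved tensors to the chosen device,
    # regardless of the device they were saved from.
    state = torch.load(path, map_location=device)

loaded_model = Model(n_input_features=6)
loaded_model.load_state_dict(state)
loaded_model.to(device)
loaded_model.eval()

# Inputs must live on the same device as the model.
x = torch.randn(2, 6).to(device)
with torch.no_grad():
    y = loaded_model(x)

print(y.shape, y.device)
```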


That's all!