# Save & Load Models

Based on **Patrick Loeber**'s video: https://www.youtube.com/watch?v=c36lUUr864M&t=15434s

There are only three functions we need to remember:
+ torch.save(arg, PATH)
+ torch.load(PATH)
+ model.load_state_dict(arg)

**torch.save** accepts tensors, models, or any dictionary as its first argument. It uses Python's pickle module to serialize the object, so the saved file is binary and not human readable.
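
As a minimal sketch of that flexibility, the snippet below saves a single tensor and a plain dictionary and loads both back. The temporary directory and the file names `tensor.pth` and `info.pth` are assumptions made for this demo only.

```python
import os
import tempfile

import torch

# Hypothetical file names inside a temporary directory (demo only).
tmp_dir = tempfile.mkdtemp()
tensor_path = os.path.join(tmp_dir, "tensor.pth")
dict_path = os.path.join(tmp_dir, "info.pth")

# Save a single tensor.
x = torch.arange(6, dtype=torch.float32).reshape(2, 3)
torch.save(x, tensor_path)

# Save a plain dictionary that mixes a Python int and a tensor.
info = {"epoch": 3, "weights": torch.ones(2)}
torch.save(info, dict_path)

# Load both back; we get ordinary Python / PyTorch objects again.
x_loaded = torch.load(tensor_path)
info_loaded = torch.load(dict_path)

print(torch.equal(x, x_loaded))
print(info_loaded["epoch"], info_loaded["weights"])
```

The same pattern is what makes it possible to save a state dict or a whole checkpoint dictionary later on.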

For saving our model we have two options.

In [None]:
import torch
import torch.nn as nn

#### COMPLETE MODEL ####
torch.save(model, PATH)

# the model class must be defined (or importable) wherever we load;
# on PyTorch 2.6+ this call also needs weights_only=False
model = torch.load(PATH)
model.eval()

### Lazy method

Here we simply call **torch.save** on the model object and give it a path or file name. To load the model later, we assign **model = torch.load(PATH)** and then switch the model to evaluation mode with **model.eval()**.

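The disadvantage of this approach is that the serialized data is bound to the specific classes and the exact directory structure used when the model was saved. Pickle stores only a reference to the model class, not its code, so loading can break if the class is renamed or moved.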

### Practice

In [1]:
import torch
import torch.nn as nn

class Model(nn.Module):
    def __init__(self, n_input_features):
        super(Model, self).__init__()
        self.linear = nn.Linear(n_input_features, 1)
        
    def forward(self, x):
        y_pred = torch.sigmoid(self.linear(x))
        return y_pred
    
model = Model(n_input_features=6)
# train our model...

# lazy method

FILE = "model.pth" # it is common to use .pth -> PyTorch
torch.save(model, FILE)

In [2]:
model = torch.load(FILE)
model.eval()

for param in model.parameters():
    print(param)

Parameter containing:
tensor([[-0.4053, -0.0697,  0.2480, -0.0789,  0.2890, -0.2310]],
       requires_grad=True)
Parameter containing:
tensor([-0.3592], requires_grad=True)


### Recommended way of saving our model

If we just want to save our trained model and use it later for inference, it is enough to save only the parameters. Since **torch.save** can save any dictionary, we pass it **model.state_dict()**, which holds the parameters, followed by the **PATH**.

To load the model again, we first create the model object and then call **model.load_state_dict()** with **torch.load(PATH)** inside. We have to be careful here: **load_state_dict()** does not take a path, it takes the already loaded dictionary. Then we again set the model to evaluation mode. This is the preferred way.

In [None]:
#### STATE DICT ####
torch.save(model.state_dict(), PATH)

# the model must be created again with the same parameters
model = Model(*args, **kwargs)
model.load_state_dict(torch.load(PATH))
model.eval()

### Practice

In [8]:
import torch
import torch.nn as nn

class Model(nn.Module):
    def __init__(self, n_input_features):
        super(Model, self).__init__()
        self.linear = nn.Linear(n_input_features, 1)
        
    def forward(self, x):
        y_pred = torch.sigmoid(self.linear(x))
        return y_pred
    
model = Model(n_input_features=6)
# train our model...

for param in model.parameters():
    print(param)
# preferred method

FILE = "model.pth" # it is common to use .pth -> PyTorch
torch.save(model.state_dict(), FILE)

Parameter containing:
tensor([[ 0.0180, -0.1813, -0.0905,  0.2800, -0.0344,  0.0454]],
       requires_grad=True)
Parameter containing:
tensor([-0.1517], requires_grad=True)


In [9]:
loaded_model = Model(n_input_features=6)
loaded_model.load_state_dict(torch.load(FILE))
loaded_model.eval()

for param in loaded_model.parameters():
    print(param)

Parameter containing:
tensor([[ 0.0180, -0.1813, -0.0905,  0.2800, -0.0344,  0.0454]],
       requires_grad=True)
Parameter containing:
tensor([-0.1517], requires_grad=True)
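
Instead of comparing the printed values by eye, we can check the round trip programmatically. The sketch below reuses the same `Model` class, prints the names and shapes stored in the state dict, and compares the saved and loaded tensors with `torch.equal`. The temporary file path is an assumption for this demo.

```python
import os
import tempfile

import torch
import torch.nn as nn


class Model(nn.Module):
    def __init__(self, n_input_features):
        super().__init__()
        self.linear = nn.Linear(n_input_features, 1)

    def forward(self, x):
        return torch.sigmoid(self.linear(x))


model = Model(n_input_features=6)

# The state dict maps parameter names to tensors.
for name, tensor in model.state_dict().items():
    print(name, tuple(tensor.shape))

# Hypothetical temporary path (demo only).
path = os.path.join(tempfile.mkdtemp(), "model.pth")
torch.save(model.state_dict(), path)

loaded_model = Model(n_input_features=6)
loaded_model.load_state_dict(torch.load(path))
loaded_model.eval()

# Compare every saved tensor with its loaded counterpart.
same = all(
    torch.equal(p, q)
    for p, q in zip(model.state_dict().values(),
                    loaded_model.state_dict().values())
)
print(same)
```

For this model the state dict has two entries, `linear.weight` and `linear.bias`, and the final check should print `True`.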


## Saving a whole checkpoint during training

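Let's say we want to stop at some point during training and save a checkpoint so that we can resume later. Besides the model parameters, the checkpoint then also needs the current epoch and the optimizer state.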

In [11]:
import torch
import torch.nn as nn

class Model(nn.Module):
    def __init__(self, n_input_features):
        super(Model, self).__init__()
        self.linear = nn.Linear(n_input_features, 1)
        
    def forward(self, x):
        y_pred = torch.sigmoid(self.linear(x))
        return y_pred
    
model = Model(n_input_features=6)
# train our model...

learning_rate = 0.01
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)
print(optimizer.state_dict())

checkpoint = {
    "epoch": 90,
    "model_state": model.state_dict(),
    "optim_state": optimizer.state_dict()
}

torch.save(checkpoint, "checkpoint.pth")

{'state': {}, 'param_groups': [{'lr': 0.01, 'momentum': 0, 'dampening': 0, 'weight_decay': 0, 'nesterov': False, 'maximize': False, 'foreach': None, 'differentiable': False, 'params': [0, 1]}]}


In [12]:
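loaded_checkpoint = torch.load("checkpoint.pth")
epoch = loaded_checkpoint["epoch"]

model = Model(n_input_features=6)
# lr=0 is only a placeholder; loading the optimizer state restores the saved lr
optimizer = torch.optim.SGD(model.parameters(), lr=0)

# read from the loaded checkpoint, not from the in-memory dictionary
model.load_state_dict(loaded_checkpoint["model_state"])
optimizer.load_state_dict(loaded_checkpoint["optim_state"])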

print(optimizer.state_dict())

{'state': {}, 'param_groups': [{'lr': 0.01, 'momentum': 0, 'dampening': 0, 'weight_decay': 0, 'nesterov': False, 'maximize': False, 'foreach': None, 'differentiable': False, 'params': [0, 1]}]}
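
To show how such a checkpoint could be used to resume training, here is a minimal sketch. The synthetic data, the helper `run_epochs`, the `momentum=0.9` setting, and the temporary file path are all assumptions for illustration; the checkpoint keys are the ones used above.

```python
import os
import tempfile

import torch
import torch.nn as nn


class Model(nn.Module):
    def __init__(self, n_input_features):
        super().__init__()
        self.linear = nn.Linear(n_input_features, 1)

    def forward(self, x):
        return torch.sigmoid(self.linear(x))


# Synthetic binary classification data (assumption, demo only).
torch.manual_seed(0)
X = torch.randn(20, 6)
y = (X.sum(dim=1, keepdim=True) > 0).float()
criterion = nn.BCELoss()


def run_epochs(model, optimizer, start_epoch, end_epoch, path):
    """Hypothetical helper: train and save a checkpoint after every epoch."""
    for epoch in range(start_epoch, end_epoch):
        optimizer.zero_grad()
        loss = criterion(model(X), y)
        loss.backward()
        optimizer.step()
        torch.save({
            "epoch": epoch + 1,  # number of completed epochs
            "model_state": model.state_dict(),
            "optim_state": optimizer.state_dict(),
        }, path)


path = os.path.join(tempfile.mkdtemp(), "checkpoint.pth")

# First run: train for 3 epochs, then "stop".
model = Model(n_input_features=6)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
run_epochs(model, optimizer, start_epoch=0, end_epoch=3, path=path)

# Later: rebuild model and optimizer, restore their state, and continue.
loaded_checkpoint = torch.load(path)
resumed_model = Model(n_input_features=6)
resumed_optimizer = torch.optim.SGD(resumed_model.parameters(), lr=0.01, momentum=0.9)
resumed_model.load_state_dict(loaded_checkpoint["model_state"])
resumed_optimizer.load_state_dict(loaded_checkpoint["optim_state"])
resumed_model.train()  # back to training mode before continuing

start_epoch = loaded_checkpoint["epoch"]
run_epochs(resumed_model, resumed_optimizer, start_epoch, end_epoch=5, path=path)

final_epoch = torch.load(path)["epoch"]
print(start_epoch, final_epoch)
```

Momentum is used here only because it gives the optimizer some internal state worth restoring; with plain SGD the optimizer state dict mostly carries the hyperparameters.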


## Using GPU during training

### Save on GPU, Load on CPU

If we save our model on the GPU and later want to load it on the CPU, we have to do it this way. Let's say that somewhere during training we set up our cuda device, send our model to that device, and save it using the state dict. To load it on the CPU, we define the cpu device, create our model again, and call **model.load_state_dict()** with **torch.load(PATH, map_location=device)** inside. We have to specify the map location; here we give it the cpu device.

In [None]:
import torch
import torch.nn as nn

# Save on GPU, Load on CPU
device = torch.device("cuda")
model.to(device)
torch.save(model.state_dict(), PATH)

device = torch.device("cpu")
model = Model(*args, **kwargs)
model.load_state_dict(torch.load(PATH, map_location=device))
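
The following sketch makes the same idea runnable on any machine. It picks the cuda device only if one is available (an assumption added so the example also works without a GPU), saves the state dict, and then loads it onto the CPU with `map_location`. The temporary file path is hypothetical.

```python
import os
import tempfile

import torch
import torch.nn as nn


class Model(nn.Module):
    def __init__(self, n_input_features):
        super().__init__()
        self.linear = nn.Linear(n_input_features, 1)

    def forward(self, x):
        return torch.sigmoid(self.linear(x))


# Use the GPU for saving if there is one, otherwise fall back to the CPU.
save_device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = Model(n_input_features=6).to(save_device)

path = os.path.join(tempfile.mkdtemp(), "model.pth")  # hypothetical path
torch.save(model.state_dict(), path)

# Load on the CPU: map_location remaps the saved tensors to the cpu device.
cpu_device = torch.device("cpu")
cpu_model = Model(n_input_features=6)
cpu_model.load_state_dict(torch.load(path, map_location=cpu_device))
cpu_model.eval()

print(next(cpu_model.parameters()).device)
```

Whatever device the model was saved from, the loaded parameters end up on `cpu`.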

### Save on GPU, Load on GPU

If we want to both save and load on the GPU, we send our model to the cuda device and save its state dict. To load, we create our model, call **load_state_dict()** with **torch.load(PATH)** inside, and then send the model to the cuda device.

In [None]:
import torch
import torch.nn as nn

# Save on GPU, Load on GPU
device = torch.device("cuda")
model.to(device)
torch.save(model.state_dict(), PATH)

model = Model(*args, **kwargs)
model.load_state_dict(torch.load(PATH))
model.to(device)

### Save on CPU, Load on GPU

Let's say we saved our model on the CPU but later want to load it on the GPU. We first specify the cuda device, then create our model and call **model.load_state_dict()** with **torch.load(PATH, map_location="cuda:0")** inside. As map location we specify cuda: followed by the GPU device number we want. After that we also have to call **model.to(device)**. Finally, every input tensor we feed to the model, such as the training samples, must be sent to the same device.

In [None]:
# Save on CPU, Load on GPU
torch.save(model.state_dict(), PATH)

device = torch.device("cuda")
model = Model(*args, **kwargs)
model.load_state_dict(torch.load(PATH, map_location="cuda:0")) # Choose whatever GPU device number you want
model.to(device)
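
As a small illustration of the last point, the sketch below loads a CPU-saved state dict onto the target device and moves the inputs there before the forward pass. The fallback to the CPU when no GPU is present, the random input batch, and the temporary file path are assumptions for this demo.

```python
import os
import tempfile

import torch
import torch.nn as nn


class Model(nn.Module):
    def __init__(self, n_input_features):
        super().__init__()
        self.linear = nn.Linear(n_input_features, 1)

    def forward(self, x):
        return torch.sigmoid(self.linear(x))


# Save on the CPU.
path = os.path.join(tempfile.mkdtemp(), "model.pth")  # hypothetical path
torch.save(Model(n_input_features=6).state_dict(), path)

# Target device: first GPU if available, otherwise the CPU (demo fallback).
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

model = Model(n_input_features=6)
model.load_state_dict(torch.load(path, map_location=device))
model.to(device)
model.eval()

# Inputs are created on the CPU and must be moved to the model's device.
x = torch.randn(4, 6)
with torch.no_grad():
    y_pred = model(x.to(device))

print(y_pred.shape, y_pred.device)
```

Calling the model with a tensor that lives on a different device than the parameters raises a runtime error, which is why the `x.to(device)` step matters.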