# Q4 Shoulders of Giants (15 points)
As we have already seen, deep networks can sometimes be hard to optimize, and they often overfit heavily on small training sets. Many approaches have been proposed to counter this, e.g., [Krahenbuhl et al. (ICLR’16)](http://arxiv.org/pdf/1511.06856.pdf), self-supervised learning, etc. However, the most effective approach remains pre-training the network on large, well-labeled supervised datasets such as ImageNet.

While training on the full ImageNet data is beyond the scope of this assignment, people have already trained many popular/standard models and released them online. In this task, we will initialize a ResNet-18 model with pre-trained ImageNet weights (from `torchvision`) and fine-tune the network for PASCAL classification.

## 4.1 Load Pre-trained Model (7 pts)
Load the pre-trained weights up to the second-to-last layer, and initialize the last layer (the one that outputs the classes) from scratch.

The model loading mechanism is based on names of the weights. It is easy to load pretrained models from `torchvision.models`, even when your model uses different names for weights. Please briefly explain how to load the weights correctly if the names do not match ([hint](https://discuss.pytorch.org/t/loading-weights-from-pretrained-model-with-different-module-names/11841)).

**YOUR ANSWER HERE**

If the weight names do not match, we can load the pre-trained model's state dictionary (`state_dict`) and iterate over its key-value pairs.
For each entry, we rename the key to the name our model uses for the corresponding weight, keeping the tensor itself unchanged. The updated `state_dict` is then loaded back into our model.
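The renaming step can be sketched as follows. This is a minimal illustration, not the notebook's actual loading code: `remap_keys` is a hypothetical helper, and the placeholder strings stand in for the tensors that a real `state_dict` would hold. It assumes the backbone is wrapped under an attribute called `ResNet` (as in the `PretrainedResNet` class), so every torchvision key needs a `ResNet.` prefix, and it drops the `fc.*` entries so that the final layer stays freshly initialized.

```python
from collections import OrderedDict


def remap_keys(pretrained_state, prefix="ResNet.", skip_prefixes=("fc.",)):
    """Return a copy of `pretrained_state` with renamed keys.

    Hypothetical helper: adds `prefix` to every key and drops keys that
    start with any of `skip_prefixes` (here, the ImageNet classifier head).
    """
    remapped = OrderedDict()
    for name, tensor in pretrained_state.items():
        if name.startswith(skip_prefixes):
            continue  # leave the final layer to be initialized from scratch
        remapped[prefix + name] = tensor
    return remapped


# Stand-in for a torchvision ResNet-18 state_dict; real values are tensors.
pretrained_state = OrderedDict([
    ("conv1.weight", "w0"),
    ("layer1.0.conv1.weight", "w1"),
    ("fc.weight", "w_fc"),
    ("fc.bias", "b_fc"),
])

remapped = remap_keys(pretrained_state)
print(list(remapped))
# ['ResNet.conv1.weight', 'ResNet.layer1.0.conv1.weight']

# With PyTorch available, the remapped dict would then be loaded with:
#   model.load_state_dict(remapped, strict=False)
# strict=False is needed because the new head's weights are absent.
```

Passing `strict=False` to `load_state_dict` lets the load proceed even though the new classifier weights are missing from the remapped dictionary; the returned value lists the missing and unexpected keys, which is a quick way to check that only the head was skipped.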

In [1]:
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import models
import matplotlib.pyplot as plt
%matplotlib inline

import trainer
from utils import ARGS
from simple_cnn import SimpleCNN
from voc_dataset import VOCDataset


# Pre-trained weights up to second-to-last layer
# final layers should be initialized from scratch!
class PretrainedResNet(nn.Module):
    def __init__(self):
        super().__init__()
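        # ResNet-18 backbone initialized with ImageNet weights from torchvision
        self.ResNet = models.resnet18(pretrained=True)
        # Replace the 1000-way ImageNet head with a freshly initialized
        # 20-way head (one output per PASCAL VOC class)
        in_features = self.ResNet.fc.in_features
        self.ResNet.fc = nn.Sequential(nn.Linear(in_features, 20))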
    
    def forward(self, x):
        x = self.ResNet(x)
        return x

Train the model with a hyperparameter setup similar to the one used in the scratch case. There is no need to freeze the loaded weights. Show the learning curves (training loss, testing MAP) for 10 epochs. Please evaluate your model to calculate the MAP on the testing dataset every 100 iterations. Also feel free to tune the hyperparameters to improve performance.

**REMEMBER TO SAVE MODEL AT END OF TRAINING**

In [5]:
args = ARGS(epochs=10, lr=0.0001, batch_size=32, test_batch_size=128, gamma=0.75, step_size=5, save_at_end=True, save_freq=-1, use_cuda=True)
model = PretrainedResNet()
print(args)
optimizer = torch.optim.Adam(model.parameters(), lr=args.lr)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=args.step_size, gamma=args.gamma)
test_ap, test_map = trainer.train(args, model, optimizer, scheduler, model_name='pre-res')
print('test map:', test_map)

args.batch_size = 32
args.device = cuda
args.epochs = 10
args.gamma = 0.75
args.inp_size = 224
args.log_every = 100
args.lr = 0.0001
args.save_at_end = True
args.save_freq = -1
args.step_size = 5
args.test_batch_size = 128
args.val_every = 100

test map: 0.7560521199568571
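
The learning-rate schedule implied by the settings above (`lr=0.0001`, `gamma=0.75`, `step_size=5`) can be worked out by hand. The sketch below reproduces the closed-form rule that `StepLR` follows, under the assumption that the scheduler is stepped once per epoch (the stepping code lives in `trainer` and is not shown here); `step_lr` is a hypothetical helper, not part of PyTorch.

```python
def step_lr(base_lr, gamma, step_size, epoch):
    """Closed form of a step decay: multiply by gamma every `step_size` epochs."""
    return base_lr * gamma ** (epoch // step_size)


# Values taken from the ARGS call above; one scheduler step per epoch assumed.
schedule = [step_lr(1e-4, 0.75, 5, epoch) for epoch in range(10)]

for epoch, lr in enumerate(schedule):
    print(f"epoch {epoch}: lr = {lr:.2e}")
```

Under that assumption, the rate stays at 1e-4 for epochs 0 to 4 and drops once to 7.5e-5 for epochs 5 to 9, so a single step should be visible in the learning-rate plot.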


**YOUR TENSORBOARD SCREENSHOTS HERE**

***Loss for training***

<img src="vlr-hw1-images/q4-loss.png"/>


***mAP for testing*** 

<img src="vlr-hw1-images/q4-map.png"/>


***Learning Rate*** 

<img src="vlr-hw1-images/q4-lr.png"/>


***Histogram layer1.1.conv1.weight***

<img src="vlr-hw1-images/q4-hist-conv1.png"/>


***Histogram layer4.0.bn2.bias*** 

<img src="vlr-hw1-images/q4-hist-bias.png"/>