[Reference](https://github.com/bentrevett/pytorch-image-classification/blob/master/4%20-%20VGG.ipynb)

Unlike earlier architectures, VGG uses a very simple, uniform design that gives researchers a general template for building new networks.

Like AlexNet, VGG has two components: `features`, built from convolutional layers, non-linearities, and pooling layers, and `classifier`, built from fully connected layers. VGG is not a single model; it is a family of architectures that differ in the configuration of their convolutional layers.

![VGG Configuration](assets/images/vgg_convnet_architecture.png)

In [1]:
## Load necessary packages
import torch
import torch.nn as nn
import numpy as np
import matplotlib.pyplot as plt
from torchvision import datasets, transforms
import torch.nn.functional as F
from torch.utils.data.sampler import SubsetRandomSampler
import torch.optim as optim

In [33]:
## Base VGG Architecture

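class VGG(nn.Module):
    def __init__(self, features, output_size):
        super().__init__()
        # Convolutional feature extractor built from a configuration list
        self.features = features
        # Pool to a fixed 7x7 map so the classifier input size is always 512 * 7 * 7
        self.avgpool = nn.AdaptiveAvgPool2d(7)

        # Three fully connected layers with dropout between them
        self.classifier = nn.Sequential(
            nn.Linear(512 * 7 * 7, 4096),
            nn.ReLU(inplace=True),
            nn.Dropout(0.5),
            nn.Linear(4096, 4096),
            nn.ReLU(inplace=True),
            nn.Dropout(0.5),
            nn.Linear(4096, output_size),
        )

    def forward(self, x):
        x = self.features(x)
        x = self.avgpool(x)
        # Flatten to (batch_size, 512 * 7 * 7) before the classifier
        x = x.view(x.shape[0], -1)
        x = self.classifier(x)
        return x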
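As a quick check of the class above, the following sketch runs a forward pass with a stand-in feature extractor. `toy_features` is a hypothetical placeholder (a single convolution that outputs the 512 channels the classifier expects), not the real VGG feature stack, and the input sizes are arbitrary choices for illustration.

```python
import torch
import torch.nn as nn


class VGG(nn.Module):
    # Same structure as the class above, repeated so this sketch runs on its own.
    def __init__(self, features, output_size):
        super().__init__()
        self.features = features
        self.avgpool = nn.AdaptiveAvgPool2d(7)
        self.classifier = nn.Sequential(
            nn.Linear(512 * 7 * 7, 4096),
            nn.ReLU(inplace=True),
            nn.Dropout(0.5),
            nn.Linear(4096, 4096),
            nn.ReLU(inplace=True),
            nn.Dropout(0.5),
            nn.Linear(4096, output_size),
        )

    def forward(self, x):
        x = self.features(x)
        x = self.avgpool(x)
        x = x.view(x.shape[0], -1)
        return self.classifier(x)


# Hypothetical stand-in for the real feature extractor: one conv layer that
# produces the 512 channels the classifier expects.
toy_features = nn.Sequential(
    nn.Conv2d(3, 512, kernel_size=3, padding=1),
    nn.ReLU(inplace=True),
)

model = VGG(toy_features, output_size=10)
model.eval()  # disable dropout for a deterministic check

with torch.no_grad():
    small = model(torch.randn(2, 3, 32, 32))
    large = model(torch.randn(2, 3, 64, 64))

print(small.shape, large.shape)
```

Both inputs produce an output of shape `(2, 10)`, because the adaptive average pooling fixes the feature map at 7x7 regardless of the input resolution.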

Next, we need to build the features. Looking at the configurations in the image above, the classifier portion of VGG is always the same: three fully connected layers. The last convolutional layer of the features always has 512 filters, which is why the first linear layer takes 512 * 7 * 7 inputs. The features in VGG consist of 5 blocks.
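To make the size of the classifier concrete, the sketch below counts the parameters of the three fully connected layers defined in the `VGG` class. It assumes `output_size = 10`, and `linear_params` is a helper name introduced here for illustration.

```python
def linear_params(in_features, out_features):
    """Weights plus biases of a fully connected layer."""
    return in_features * out_features + out_features


flattened = 512 * 7 * 7  # channels * height * width after the 7x7 adaptive pool
output_size = 10         # assumed number of classes

layer_sizes = [(flattened, 4096), (4096, 4096), (4096, output_size)]
per_layer = [linear_params(i, o) for i, o in layer_sizes]

print(flattened)       # 25088
print(per_layer)       # [102764544, 16781312, 40970]
print(sum(per_layer))  # 119586826
```

Under these assumptions the first linear layer alone holds roughly 103 million of the classifier's roughly 120 million parameters.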

VGG configurations are typically defined as lists. Each integer in a configuration denotes the number of filters in a convolutional layer, and the string "M" denotes a max-pool layer.
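To illustrate how such a list maps to a network, the sketch below walks through a configuration and tracks the channel count and spatial size after each entry. It assumes 3x3 convolutions with padding 1 (which keep the height and width) and 2x2 max-pooling with stride 2 (which halves them), plus a 224x224 input. `trace_shapes` is a helper name introduced here, and the example list is configuration A (the 11-layer network) from the VGG paper's table.

```python
def trace_shapes(config, size, in_channels=3):
    """Return (channels, height/width) after each entry of a VGG-style config.

    Assumes 3x3 convolutions with stride 1 and padding 1 (spatial size kept)
    and 2x2 max-pooling with stride 2 (spatial size halved, rounded down).
    """
    shapes = []
    channels = in_channels
    for entry in config:
        if entry == "M":
            size = size // 2
        elif isinstance(entry, int):
            channels = entry
        else:
            raise ValueError(f"unexpected config entry: {entry!r}")
        shapes.append((channels, size))
    return shapes


# Configuration A (the 11-layer network) from the VGG paper's table.
vgg11_config = [64, "M", 128, "M", 256, 256, "M", 512, 512, "M", 512, 512, "M"]

shapes = trace_shapes(vgg11_config, size=224)
num_convs = sum(isinstance(entry, int) for entry in vgg11_config)
num_pools = vgg11_config.count("M")

print(num_convs, num_pools)  # 8 5
print(shapes[-1])            # (512, 7)
```

The five pooling layers reduce a 224x224 input to 7x7 with 512 channels, which matches the `512 * 7 * 7` input expected by the classifier.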

In [34]:
vgg16_config = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
                512, 512, 512, "M", 512, 512, 512, "M"]

Now let's define a function that takes a VGG configuration as input and returns an nn.Sequential containing the corresponding layers.

Every max-pool layer uses the same filter size (2x2) and stride (2), so the pooling windows never overlap.

Every convolutional layer uses the same 3x3 filter size with a padding of 1, so a convolution leaves the height and width unchanged.

In [35]:
def get_vgg_layers(config):
    layers = []
    in_channels = 3  # RGB input

    for c in config:
        assert c == "M" or isinstance(c, int)

        if c == "M":
            layers.append(nn.MaxPool2d(kernel_size=2, stride=2))
        else:
            # 3x3 convolution; padding=1 preserves the spatial size
            conv2d = nn.Conv2d(in_channels, c, kernel_size=3, padding=1)
            layers += [conv2d, nn.ReLU(inplace=True)]
            in_channels = c  # the next layer consumes this layer's output channels

    return nn.Sequential(*layers)

In [36]:
vgg_features = get_vgg_layers(vgg16_config)
vgg_features

Sequential(
  (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (1): ReLU(inplace=True)
  (2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (3): ReLU(inplace=True)
  (4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (5): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (6): ReLU(inplace=True)
  (7): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (8): ReLU(inplace=True)
  (9): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (10): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (11): ReLU(inplace=True)
  (12): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (13): ReLU(inplace=True)
  (14): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (15): ReLU(inplace=True)
  (16): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (17): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (18): ReLU(inplace=True)
  (19): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))

In [38]:
output_size = 10  # number of output classes
vgg16 = VGG(features=vgg_features, output_size=output_size)
vgg16

VGG(
  (features): Sequential(
    (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): ReLU(inplace=True)
    (2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (3): ReLU(inplace=True)
    (4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (5): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (6): ReLU(inplace=True)
    (7): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (8): ReLU(inplace=True)
    (9): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (10): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (11): ReLU(inplace=True)
    (12): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (13): ReLU(inplace=True)
    (14): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (15): ReLU(inplace=True)
    (16): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (17): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (18): ReLU(inplace=True)
  