<a href="https://colab.research.google.com/github/vinodnbhat/AIML_CEP_2021/blob/main/AlexNet_Inception_Net_TA_session_Nov04.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
#Import the required libraries
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import torch.utils.data as data
import torch.optim.lr_scheduler as lr_scheduler
import torchvision.transforms as transforms
import torchvision.datasets as datasets
import matplotlib.pyplot as plt
import numpy as np
import random
import time

In [None]:
SEED = 1234
random.seed(SEED)
np.random.seed(SEED)
torch.manual_seed(SEED)
torch.cuda.manual_seed(SEED)

##CIFAR-10 dataset:

The dataset consists of 60,000 32x32 color images in 10 different classes with each class having 6,000 images. There are 50,000 train images and 10,000 test images. The classes are:
* Airplanes
* Cars
* Birds
* Cats
* Deer
* Dogs
* Frogs
* Horses
* Ships
* Trucks



In [None]:
ROOT = '.data'


#downloading cifar10 dataset from torchvision.datasets
train_data = datasets.CIFAR10(root = ROOT, 
                             train = True, 
                             download = True)

# Single scalar mean/std over all pixels and all three channels, rescaled from [0, 255] to [0, 1]
mean = train_data.data.mean() / 255
std = train_data.data.std() / 255

print(f'Calculated mean: {mean}')
print(f'Calculated std: {std}')


Files already downloaded and verified
Calculated mean: 0.4733630004850899
Calculated std: 0.2515689250632208
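
The statistics above are single scalars shared by all three color channels. A common alternative is to compute one mean and one standard deviation per channel. The sketch below shows the reduction on a random stand-in array (`fake_images` is a hypothetical placeholder for `train_data.data`, used only so the snippet runs without downloading CIFAR-10).

```python
import numpy as np

# Stand-in for train_data.data: uint8 images shaped (N, H, W, C).
# Random values are an assumption made so this sketch is self-contained.
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(100, 32, 32, 3), dtype=np.uint8)

# Reduce over the sample, height, and width axes; keep the channel axis.
channel_mean = fake_images.mean(axis=(0, 1, 2)) / 255
channel_std = fake_images.std(axis=(0, 1, 2)) / 255

print(channel_mean.shape, channel_std.shape)
```

With the real dataset, the resulting three-element arrays could be passed to `transforms.Normalize(mean = channel_mean.tolist(), std = channel_std.tolist())` in place of the scalar values.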


In [None]:
print(train_data.data.shape)

(50000, 32, 32, 3)


In [None]:
train_transforms = transforms.Compose([
                            transforms.ToTensor(),
                            transforms.Normalize(mean = [mean], std = [std])
                                      ])

test_transforms = transforms.Compose([
                           transforms.ToTensor(),
                           transforms.Normalize(mean = [mean], std = [std])
                                     ])

In [None]:
train_set = datasets.CIFAR10(root = ROOT, 
                            train = True, 
                            download = True, 
                            transform = train_transforms)

test_set = datasets.CIFAR10(root = ROOT, 
                           train = False, 
                           download = True, 
                           transform = test_transforms)

Files already downloaded and verified
Files already downloaded and verified


In [None]:
print(train_set.data.shape)

(50000, 32, 32, 3)


In [None]:
print(test_set.data.shape)

(10000, 32, 32, 3)


In [None]:
print(f'Number of training examples: {len(train_set)}')
print(f'Number of testing examples: {len(test_set)}')

Number of training examples: 50000
Number of testing examples: 10000


In [None]:
batch_size = 64

#iterators for shuffling and loading data in batches 
train_loader = data.DataLoader(train_set, 
                                 shuffle = True, 
                                 batch_size = batch_size)

test_loader = data.DataLoader(test_set, 
                                batch_size = batch_size)

In [None]:
# Checking the batch dimensions 
for images, labels in train_loader:  
    print('Image batch dimensions:', images.shape)
    print('Image label dimensions:', labels.shape)
    break

Image batch dimensions: torch.Size([64, 3, 32, 32])
Image label dimensions: torch.Size([64])


In [None]:
### Model settings###

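# Hyperparameters
# Note: 0.01 may be too high for Adam on this model; the AlexNet run logged below stays at 10% accuracy
learning_rate = 0.01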

# Architecture
num_classes = 10


#AlexNet

In 2012, AlexNet won the [ImageNet](https://image-net.org/) challenge with a top-5 error of 15.3%, significantly outperforming all prior competitors. The second-place entry, which was not a CNN variation, had a top-5 error of around 26.2%.

It used 11x11, 5x5, and 3x3 convolutions, max pooling, dropout, data augmentation, ReLU activations, and SGD with momentum. A ReLU activation follows every convolutional and fully-connected layer.


<img src="https://miro.medium.com/max/932/1*wzflNwJw9QkjWWvTosXhNw.png" alt="AlexNet Architecture" width="1024" height="515">

Image Credit- [Medium Article](https://medium.com/coinmonks/paper-review-of-alexnet-caffenet-winner-in-ilsvrc-2012-image-classification-b93598314160)

AlexNet Implementation: [https://github.com/pytorch/vision/blob/main/torchvision/models/alexnet.py](https://github.com/pytorch/vision/blob/main/torchvision/models/alexnet.py)


In [None]:
class AlexNet(nn.Module):
    def __init__(self, num_classes= 10):
        super(AlexNet, self).__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=3, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2),
            nn.Conv2d(64, 192, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2),
            nn.Conv2d(192, 384, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(384, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2),
        )
        self.classifier = nn.Sequential(
            nn.Dropout(),
            nn.Linear(256 * 2 * 2, 4096),
            nn.ReLU(inplace=True),
            nn.Dropout(),
            nn.Linear(4096, 4096),
            nn.ReLU(inplace=True),
            nn.Linear(4096, num_classes),
        )

    def forward(self, x):
        x = self.features(x)
        x = x.view(x.size(0), 256 * 2 * 2)
        x = self.classifier(x)
        logits = x
        probas = F.softmax(x, dim=1)  # class probabilities; the argmax is the same as for the logits
        return logits, probas
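
The `256 * 2 * 2` input size of the first linear layer follows from how each layer of `self.features` changes the spatial size of a 32x32 CIFAR-10 image. The sketch below traces that arithmetic with the standard output-size formula for convolution and pooling layers; the helper `conv_out` and the layer names are illustrative and not part of the notebook.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output height/width of a convolution or pooling layer (floor division)."""
    return (size + 2 * padding - kernel) // stride + 1

# (name, kernel, stride, padding) for each spatial layer in self.features.
# nn.MaxPool2d(kernel_size=2) uses stride 2 by default.
layers = [
    ("conv1", 3, 2, 1),
    ("pool1", 2, 2, 0),
    ("conv2", 3, 1, 1),
    ("pool2", 2, 2, 0),
    ("conv3", 3, 1, 1),
    ("conv4", 3, 1, 1),
    ("conv5", 3, 1, 1),
    ("pool3", 2, 2, 0),
]

size = 32  # CIFAR-10 images are 32x32
for name, k, s, p in layers:
    size = conv_out(size, k, s, p)
    print(f"{name}: {size}x{size}")

flattened = 256 * size * size  # 256 channels after the last convolution
print(flattened)  # 1024
```

The spatial size goes 32, 16, 8, 8, 4, 4, 4, 4, 2, so the flattened feature vector has 256 * 2 * 2 = 1024 entries.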

In [None]:
# Initialize the model
model = AlexNet()

In [None]:
def count_parameters(model):
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

print(f'The model has {count_parameters(model):,} trainable parameters')

The model has 23,272,266 trainable parameters
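
The printed count can be checked by hand: a convolution has `c_in * c_out * k * k` weights plus `c_out` biases, and a linear layer has `n_in * n_out` weights plus `n_out` biases. The helper names below are illustrative.

```python
def conv_params(c_in, c_out, k):
    """Weights plus biases of a k x k convolution."""
    return c_in * c_out * k * k + c_out

def linear_params(n_in, n_out):
    """Weights plus biases of a fully-connected layer."""
    return n_in * n_out + n_out

total = (
    conv_params(3, 64, 3)
    + conv_params(64, 192, 3)
    + conv_params(192, 384, 3)
    + conv_params(384, 256, 3)
    + conv_params(256, 256, 3)
    + linear_params(256 * 2 * 2, 4096)
    + linear_params(4096, 4096)
    + linear_params(4096, 10)
)
print(f"{total:,}")  # 23,272,266
```

Roughly 21 million of these parameters sit in the first two fully-connected layers.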


In [None]:
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

In [None]:
lossfn = nn.CrossEntropyLoss()
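
`nn.CrossEntropyLoss` expects raw logits and applies `log_softmax` internally, which is why the training function passes the logits rather than the second output of the model. The sketch below, on made-up tensors, checks that equivalence and also shows the loss of a model that predicts every class equally: ln(10) ≈ 2.303, close to the 2.304 reported in the training log further down.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 10)            # 4 samples, 10 classes, raw scores (made up)
targets = torch.tensor([1, 0, 9, 3])   # integer class labels (made up)

# CrossEntropyLoss on logits equals NLL loss on log-softmax outputs.
ce = nn.CrossEntropyLoss()(logits, targets)
nll = F.nll_loss(F.log_softmax(logits, dim=1), targets)
same = torch.allclose(ce, nll)

# All-zero logits mean a uniform prediction over the 10 classes.
uniform_loss = nn.CrossEntropyLoss()(torch.zeros(4, 10), targets).item()
print(same, uniform_loss, math.log(10))
```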

In [None]:
if torch.cuda.is_available():
  print('cuda available! using cuda..')
else:
  print('cuda not available! using cpu..')    

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

cuda available! using cuda..


In [None]:
model = model.to(device)
lossfn = lossfn.to(device)

In [None]:
def epoch_time(start_time, end_time):
    elapsed_time = end_time - start_time
    elapsed_mins = int(elapsed_time / 60)
    elapsed_secs = int(elapsed_time - (elapsed_mins * 60))
    return elapsed_mins, elapsed_secs

In [None]:
# Compute the accuracy (in %) of the model over a data loader
def compute_accuracy(model, data_loader):
    model.eval()  # disable dropout while evaluating
    correct_pred, num_examples = 0, 0
    with torch.no_grad():  # gradients are not needed for evaluation
        for features, targets in data_loader:
            features = features.to(device)
            targets = targets.to(device)
            logits, probas = model(features)
            _, predicted_labels = torch.max(probas, 1)
            num_examples += targets.size(0)
            correct_pred += (predicted_labels == targets).sum()
    return correct_pred.float()/num_examples * 100
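
Because the AlexNet classifier contains `nn.Dropout`, the measured accuracy depends on whether the model is in training or evaluation mode at the time. The following minimal sketch, on a toy tensor, shows the difference between the two modes for a single dropout layer.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
drop = nn.Dropout(p=0.5)
x = torch.ones(1000)

drop.train()
train_out = drop(x)   # entries are zeroed at random; survivors are scaled by 1/(1-p) = 2

drop.eval()
eval_out = drop(x)    # dropout is the identity in evaluation mode

print(train_out.unique(), torch.equal(eval_out, x))
```

Calling `model.eval()` before measuring accuracy and `model.train()` before the next training epoch switches every dropout layer in the model the same way.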

In [None]:
def train(model, iterator, optimizer, criterion, device):
    
    epoch_loss = 0
    
    model.train()
    
    for (x, y) in iterator:
        
        x = x.to(device)
        y = y.to(device)
        
        optimizer.zero_grad()
                
        y_pred_logits, y_pred_probas = model(x)
        
        loss = criterion(y_pred_logits, y)
        
        
        loss.backward()
        
        optimizer.step()
        
        epoch_loss += loss.item()
    
    return epoch_loss / len(iterator)

In [None]:
save_model = False
patience_early_stopping = 3  #training will stop if test accuracy does not improve for this many consecutive epochs
cnt = 0 #counter for checking patience level
EPOCHS = 100
prev_test_acc = 0 #initializing prev test accuracy for early stopping condition
#learning rate scheduler: multiply the learning rate by 0.2 if test accuracy does not improve for patience+1 consecutive epochs
scheduler = lr_scheduler.ReduceLROnPlateau(optimizer, mode = 'max', factor = 0.2, patience = 1)
for epoch in range(EPOCHS):
    print("current learning rate", optimizer.state_dict()['param_groups'][0]['lr'])
    start_time = time.perf_counter()
    
    train_loss = train(model, train_loader, optimizer, lossfn, device)
    train_acc = compute_accuracy(model, train_loader)

    if save_model:
        torch.save(model.state_dict(), 'alexnet_model.pt')
    
    if epoch%1==0: #for every epoch we shall compute the test accuracy
        test_acc = compute_accuracy(model, test_loader)
        
        if test_acc > prev_test_acc: #check if test accuracy for current epoch has improved compared to previous epoch
          cnt = 0                    #if accuracy improves reset counter to 0

        else:                        #otherwise increment current counter
          cnt += 1

        prev_test_acc = test_acc

    
    scheduler.step(test_acc) #updates learning rate
    
        
    end_time = time.perf_counter()

    epoch_mins, epoch_secs = epoch_time(start_time, end_time)
    
    print(f'Epoch: {epoch+1:2} | Epoch Time: {epoch_mins}m {epoch_secs}s')
    print(f'\tTrain Loss: {train_loss:.3f} | Train Acc: {train_acc:.2f}%')
    if epoch%1==0: #for every epoch we shall print the test accuracy
        print(f'\t test Acc: {test_acc:.2f}% \n')

    if cnt == patience_early_stopping:
      print(f"early stopping as test accuracy did not improve for {patience_early_stopping} consecutive epochs")
      break

current learning rate 0.01
Epoch:  1 | Epoch Time: 0m 56s
	Train Loss: 26.794 | Train Acc: 10.00%
	 test Acc: 10.00% 

current learning rate 0.01
Epoch:  2 | Epoch Time: 0m 56s
	Train Loss: 2.304 | Train Acc: 10.00%
	 test Acc: 10.00% 

current learning rate 0.01
Epoch:  3 | Epoch Time: 0m 56s
	Train Loss: 2.304 | Train Acc: 10.00%
	 test Acc: 10.00% 

current learning rate 0.002


KeyboardInterrupt: ignored
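
The drop from 0.01 to 0.002 before the fourth epoch matches how `ReduceLROnPlateau` counts epochs without improvement. The sketch below replays the three logged test accuracies through a scheduler with the same settings; `param` and `opt` are dummy stand-ins for the real model parameters and optimizer.

```python
import torch
import torch.optim.lr_scheduler as lr_scheduler

param = torch.nn.Parameter(torch.zeros(1))   # dummy parameter, for illustration only
opt = torch.optim.SGD([param], lr=0.01)
sched = lr_scheduler.ReduceLROnPlateau(opt, mode='max', factor=0.2, patience=1)

lrs = []
for acc in [10.0, 10.0, 10.0]:               # test accuracy stuck at 10%, as in the log
    lrs.append(opt.param_groups[0]['lr'])    # rate in effect during this epoch
    sched.step(acc)
lrs.append(opt.param_groups[0]['lr'])        # rate going into epoch 4
print(lrs)
```

The first value of 10.0 counts as an improvement over the initial best, the next two do not, and the second non-improving epoch exceeds `patience = 1`, so the rate is multiplied by 0.2.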

#GoogLeNet (Inception)

GoogLeNet, which is built on the Inception architecture, won the ImageNet Large-Scale Visual Recognition Challenge (ILSVRC) in 2014. The Inception module is shown below.

<center><img src="https://images.deepai.org/django-summernote/2019-06-18/5ebad056-29d3-4f4c-bef1-2f262388afb0.png" alt="Inception Module1" width="600" height="515"></center>

<center><img src="https://images.deepai.org/django-summernote/2019-06-18/2cec735b-2347-4ded-ae2b-e8a8384f7b46.png" alt="Inception Module2" width="600" height="515"><img src="https://miro.medium.com/max/700/0*rbWRzjKvoGt9W3Mf.png" alt="GoogLeNet Architecture" width="1" height="515" ></center>


An Inception module applies four branches in parallel to the same feature map: a 1x1, a 3x3, and a 5x5 convolution, and a max pool operation. The branch outputs are concatenated along the channel dimension. This allows the network to look at the same data with different receptive fields.
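
A minimal PyTorch sketch of such a module is given below, assuming the dimension-reduced variant in which 1x1 convolutions precede the 3x3 and 5x5 convolutions and follow the max pool. The class name `InceptionSketch` and its argument names are hypothetical, and the channel counts in the usage lines are illustrative; this is not the torchvision implementation.

```python
import torch
import torch.nn as nn

class InceptionSketch(nn.Module):
    """Four parallel branches whose outputs are concatenated along the channel axis."""
    def __init__(self, c_in, c1, c3_red, c3, c5_red, c5, c_pool):
        super().__init__()
        # 1x1 convolution branch
        self.branch1 = nn.Sequential(
            nn.Conv2d(c_in, c1, kernel_size=1), nn.ReLU(inplace=True))
        # 1x1 reduction followed by a 3x3 convolution (padding keeps the spatial size)
        self.branch3 = nn.Sequential(
            nn.Conv2d(c_in, c3_red, kernel_size=1), nn.ReLU(inplace=True),
            nn.Conv2d(c3_red, c3, kernel_size=3, padding=1), nn.ReLU(inplace=True))
        # 1x1 reduction followed by a 5x5 convolution
        self.branch5 = nn.Sequential(
            nn.Conv2d(c_in, c5_red, kernel_size=1), nn.ReLU(inplace=True),
            nn.Conv2d(c5_red, c5, kernel_size=5, padding=2), nn.ReLU(inplace=True))
        # 3x3 max pool (stride 1) followed by a 1x1 projection
        self.branch_pool = nn.Sequential(
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
            nn.Conv2d(c_in, c_pool, kernel_size=1), nn.ReLU(inplace=True))

    def forward(self, x):
        branches = [self.branch1(x), self.branch3(x),
                    self.branch5(x), self.branch_pool(x)]
        return torch.cat(branches, dim=1)

block = InceptionSketch(192, 64, 96, 128, 16, 32, 32)
out = block(torch.randn(2, 192, 8, 8))
print(out.shape)  # torch.Size([2, 256, 8, 8])
```

Every branch preserves the height and width, so only the channel count changes: 64 + 128 + 32 + 32 = 256.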

<center><img src="https://miro.medium.com/max/1400/0*q5eMDjUHKqEyo7qY.png" alt="GoogLeNet Architecture" width="1024" height="515" ></center>

(Image taken from [AI in Plain English article](https://ai.plainenglish.io/googlenet-inceptionv1-with-tensorflow-9e7f3a161e87) and [DeepAI](https://deepai.org/machine-learning-glossary-and-terms/inception-module))

GoogLeNet consists of multiple stacked Inception modules, with occasional max pooling to reduce the height and width of the feature maps. The original GoogLeNet was designed for ImageNet-sized images (224x224 pixels) and had almost 7 million parameters.

Auxiliary classifiers are a type of architectural component that seek to improve the convergence of very deep networks. They are classifier heads we attach to layers before the end of the network. The motivation is to push useful gradients to the lower layers to make them immediately useful and improve the convergence during training by combatting the vanishing gradient problem. GoogLeNet uses 2 auxiliary classifiers with discounted weights (losses are weighted by 0.3).
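
The weighting of the auxiliary losses can be sketched as follows. The three logit tensors are random stand-ins for the outputs of the main head and the two auxiliary heads, and all variable names are hypothetical; a real training step would take these from the network's forward pass.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
criterion = nn.CrossEntropyLoss()
targets = torch.tensor([3, 7, 1, 0])   # made-up labels for a batch of 4

# Random stand-ins for the outputs of the main and auxiliary classifier heads.
main_logits = torch.randn(4, 10, requires_grad=True)
aux1_logits = torch.randn(4, 10, requires_grad=True)
aux2_logits = torch.randn(4, 10, requires_grad=True)

aux_weight = 0.3                        # discount applied to each auxiliary loss
loss_main = criterion(main_logits, targets)
loss_aux1 = criterion(aux1_logits, targets)
loss_aux2 = criterion(aux2_logits, targets)

total_loss = loss_main + aux_weight * (loss_aux1 + loss_aux2)
total_loss.backward()                   # gradients reach all three heads
print(total_loss.item())
```

The auxiliary heads contribute only to the training loss; at inference time only the main classifier's output is used.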

GoogLeNet Implementation: [https://github.com/pytorch/vision/blob/main/torchvision/models/googlenet.py](https://github.com/pytorch/vision/blob/main/torchvision/models/googlenet.py)