# Training SimpleNN on CIFAR-10
In this project, you will use the SimpleNN model to perform image classification on CIFAR-10. CIFAR-10 originally contains 60K images from 10 categories. We split it into 45K/5K/10K images to serve as the train/validation/test sets. We release only the ground-truth labels of the training and validation sets to you.

## Step 0: Set up the SimpleNN model
Since you already practiced implementing simple neural networks in Homework 1, we provide the implementation for you.

In [1]:
# import necessary dependencies
import argparse
import os, sys
import time
import datetime
from tqdm.notebook import tqdm  # tqdm_notebook is deprecated in recent tqdm releases

import torch
import torch.nn as nn
import torch.nn.functional as F

In [2]:
class Teacher(nn.Module):
    def __init__(self):
        super(Teacher, self).__init__()
        self.layers=nn.Sequential(nn.Flatten(),
                                    #nn.Dropout(0.2),
                                    nn.Linear(28*28,1200),
                                    #nn.Dropout(0.5),
                                    nn.ReLU(),
                                    nn.Linear(1200,10))
    def forward(self,x):
        return self.layers(x)


### Question (a)
Here is a sanity check to verify the implementation of SimpleNN. 
You need to:
1. Write down your code.
2. **In the PDF report**, briefly describe how the code helps you confirm that SimpleNN is implemented correctly.
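
One common form of such a sanity check is to push a dummy batch through the network and inspect the output shape and parameter count. The sketch below is only an illustration under stated assumptions: `TinyNet` is a hypothetical stand-in that copies the layer layout of the model cell above, and the 1x28x28 input shape is assumed from that cell rather than from the assignment text.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in with the same layer layout as the model cell above.
class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Flatten(),
            nn.Linear(28 * 28, 1200),
            nn.ReLU(),
            nn.Linear(1200, 10),
        )

    def forward(self, x):
        return self.layers(x)

net = TinyNet()
dummy = torch.randn(4, 1, 28, 28)  # assumed input: 4 single-channel 28x28 images

with torch.no_grad():
    out = net(dummy)

out_shape = tuple(out.shape)  # expect one row of 10 logits per image
num_params = sum(p.numel() for p in net.parameters())
print(out_shape, num_params)
```

If the output shape is not `(batch_size, 10)` or the parameter count differs from what you compute by hand, a layer is likely mis-specified.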

In [3]:
# useful libraries
import torchvision
import torchvision.transforms as transforms

#############################################
# your code here
# specify preprocessing function
transform = transforms.Compose(
    
    (transforms.ToTensor(),)
    
)
transform_train = transforms.Compose(
    (
    
    transforms.RandomCrop((28,28),padding=2),
    #transforms.RandomHorizontalFlip(),
    #transforms.RandomVerticalFlip(),
    transforms.ToTensor(),
    #transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),
    
    #
    #transforms.ColorJitter(0.2,0,0)
    
    )
)

transform_val = transform
#############################################

## Step 2: Set up dataset and dataloader

### Question (c)
Set up the train/val datasets and dataloaders that are to be used during the training. Check out the [official API](https://pytorch.org/docs/stable/data.html) for more information about **torch.utils.data.DataLoader**.

Here, you need to:
1. Complete the code below.
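
To see how `batch_size` and `shuffle` affect what a `DataLoader` yields, it can help to try it on a tiny synthetic dataset first. This is a minimal sketch, not the assignment solution: the tensors, `toy_set`, and `toy_loader` are hypothetical names introduced only for illustration.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-in data: 10 blank "images" labelled 0..9 (hypothetical).
images = torch.zeros(10, 1, 28, 28)
labels = torch.arange(10)
toy_set = TensorDataset(images, labels)

# shuffle=False keeps dataset order, which is the usual choice for validation.
toy_loader = DataLoader(toy_set, batch_size=4, shuffle=False)

batch_sizes = [x.shape[0] for x, _ in toy_loader]
first_labels = next(iter(toy_loader))[1].tolist()
print(len(toy_loader), batch_sizes, first_labels)
```

Note that `len(toy_loader)` counts batches, not samples, and that the last batch is smaller when the dataset size is not a multiple of `batch_size`.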

In [4]:
# do NOT change these
from torchvision.datasets import MNIST
from torch.utils.data import DataLoader

# a few arguments, do NOT change these
DATA_ROOT = "./data"
TRAIN_BATCH_SIZE = 128
VAL_BATCH_SIZE = 100

#############################################
# your code here
# construct dataset
train_set = MNIST(
    root=DATA_ROOT, 
    train=True, 
    download=True,
    transform=transform_train    # your code
)

val_set = MNIST(
    root=DATA_ROOT, 
    train=False, 
    download=True,
    transform=transform_val    # your code
)

# construct dataloader
train_loader = DataLoader(
    train_set, 
    batch_size=TRAIN_BATCH_SIZE,  # your code
    shuffle=True,     # your code
    num_workers=2
)

val_loader = DataLoader(
    val_set, 
    batch_size=VAL_BATCH_SIZE,  # your code
    shuffle=False,     # your code
    num_workers=2
)
#############################################

## Step 3: Instantiate your SimpleNN model and deploy it to GPU devices.
### Question (d)
You may want to deploy your model to a GPU for efficient training. Assign your model to the GPU if one is available; if you are training on a machine without GPUs, deploy your model to the CPU.

Here, you need to:
1. Complete the code below.
2. **In the PDF report**, briefly describe how you verify that your model is indeed deployed on GPU. (Hint: check $\texttt{nvidia-smi}$.)
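
Besides `nvidia-smi`, the placement can also be checked from inside Python by inspecting the device of a model parameter. The following is a hedged sketch using a hypothetical small `nn.Linear` module in place of the real model; it falls back to the CPU when no GPU is present.

```python
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"
layer = nn.Linear(8, 2).to(device)  # hypothetical small module standing in for the model

# Every parameter of a module moved with .to(device) reports that device.
param_device = next(layer.parameters()).device
print(param_device.type)

if device == "cuda":
    # Nonzero once the weights live in GPU memory.
    print(torch.cuda.memory_allocated())
```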

In [14]:
# specify the device for computation
#############################################
# your code here
device = 'cuda' if torch.cuda.is_available() else 'cpu'
model=Teacher()
model=model.to(device)
#############################################

## Step 4: Set up the loss function and optimizer
The loss (objective) function provides "feedback" to the neural network. For classification models, we typically use multi-class cross-entropy as the loss function. As for the optimizer, we will use SGD with momentum.

### Question (e)
Here, you need to:
1. Set up the cross-entropy loss as the criterion. (Hint: there are implemented functions in **torch.nn**)
2. Specify an SGD optimizer with momentum. (Hint: there are implemented functions in **torch.optim**)
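
The L2 regularization strength enters `torch.optim.SGD` through its `weight_decay` argument, which adds `weight_decay * w` to each gradient before the update. The toy example below illustrates this on a single scalar parameter with a zero loss gradient; the values `lr=0.1` and `weight_decay=0.5` are chosen only to make the arithmetic easy to follow and are not the assignment's hyperparameters.

```python
import torch
import torch.optim as optim

w = torch.nn.Parameter(torch.tensor([1.0]))
opt = optim.SGD([w], lr=0.1, momentum=0.0, weight_decay=0.5)

loss = (w * 0.0).sum()  # the gradient of this loss w.r.t. w is exactly zero
opt.zero_grad()
loss.backward()
opt.step()

# Update rule: w <- w - lr * (grad + weight_decay * w) = 1 - 0.1 * 0.5
w_after = w.item()
print(w_after)
```

Even with a zero loss gradient the weight shrinks toward zero, which is the effect the regularization term is meant to have.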

In [15]:
import torch.nn as nn
import torch.optim as optim

# hyperparameters, do NOT change right now
# initial learning rate
INITIAL_LR = 0.10114514

# momentum for optimizer
MOMENTUM = 0.9

# L2 regularization strength
REG = 0.001

#############################################
# your code here
# create loss function
criterion = nn.CrossEntropyLoss()

# Add optimizer
optimizer = optim.SGD(model.parameters(),lr=INITIAL_LR,momentum=MOMENTUM,weight_decay=REG,nesterov=True)
#############################################

## Step 5: Start the training process.

### Question (f)/(g)
Congratulations! You have completed all of the previous steps and it is time to train our neural network.

Here you need to:
1. Complete the training code.
2. Actually perform the training.

Hint: Training a neural network usually repeats the following 4 steps: 

**i) Get a batch of data from the dataloader and copy it to your device (GPU).**

**ii) Do a forward pass to get the outputs from the neural network and compute the loss. Be careful about your inputs to the loss function: are they required to be logits or softmax probabilities?**

**iii) Do a backward pass (back-propagation) to compute gradients of all weights with respect to the loss.**

**iv) Update the model weights with the optimizer.**

You will also need to compute the accuracy of training/validation samples to track your model's performance over each epoch (the accuracy should be increasing as you train for more and more epochs).
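
On the question in step ii): `nn.CrossEntropyLoss` expects raw logits and applies log-softmax internally. The sketch below checks this on random data by comparing it against an explicit `log_softmax` followed by `nll_loss`, and also shows one way to count correct predictions in a batch. The tensor shapes and the seed are arbitrary illustrative choices.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(5, 10)            # raw network outputs for 5 samples, 10 classes
targets = torch.randint(0, 10, (5,))   # integer class labels

# CrossEntropyLoss on logits ...
ce = nn.CrossEntropyLoss()(logits, targets)
# ... equals negative log-likelihood on log-softmax outputs.
manual = F.nll_loss(F.log_softmax(logits, dim=1), targets)
matches = torch.allclose(ce, manual)

# Correct predictions in this batch: the arg-max class compared with the label.
batch_correct = (logits.argmax(dim=1) == targets).sum().item()
print(matches, batch_correct)
```

Passing softmax probabilities instead of logits would apply the softmax twice and silently distort the loss.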


In [16]:
# some hyperparameters
# total number of training epochs
EPOCHS = 150

# the folder where the trained model is saved
CHECKPOINT_FOLDER = "./saved_model"
DECAY_EPOCHS=1
DECAY=0.96
# start the training/validation process
# the process should take about 5 minutes on a GTX 1070-Ti
# if the code is written efficiently.
best_val_acc = 0
current_learning_rate = INITIAL_LR

print("==> Training starts!")
print("="*50)
for i in range(0, EPOCHS):
    # handle the learning rate scheduler.
    
    if i % DECAY_EPOCHS == 0 and i != 0 :
        current_learning_rate = current_learning_rate * DECAY
    
        for param_group in optimizer.param_groups:
            param_group['lr'] = current_learning_rate
        print("Current learning rate has decayed to %f" %current_learning_rate)
    
    #######################
    # your code here
    # switch to train mode
    model.train()
    
    #######################
    
    print("Epoch %d:" %i)
    # this helps you compute the training accuracy
    total_examples = 0
    correct_examples = 0

    train_loss = 0 # track training loss if you want
    loader=train_loader
    
    # Train the model for 1 epoch.
    for batch_idx, (inputs, targets) in enumerate(loader):
        ####################################
        # your code here
        # copy inputs to device
        inputs=inputs.to(device)
        targets=targets.to(device).long()

        
        # compute the output and loss
        out=model(inputs)
        loss=criterion(out,targets)
        
        # zero the gradient
        
        optimizer.zero_grad()
        # backpropagation
        loss.backward()

        
        # apply gradient and update the weights
        optimizer.step()
        train_loss+=loss.item()
        
        # count the number of correctly predicted samples in the current batch
        correct_examples+=torch.sum(out.argmax(-1)==targets).item()
        ####################################
    total_examples=len(train_loader.dataset)      
    avg_loss = train_loss / len(train_loader)
    avg_acc = correct_examples / total_examples
    print("Training loss: %.4f, Training accuracy: %.4f" %(avg_loss, avg_acc))

    # Validate on the validation dataset
    #######################
    # your code here
    # switch to eval mode
    model.eval()
    
    #######################

    # this helps you compute the validation accuracy
    total_examples = 0
    correct_examples = 0
    
    val_loss = 0 # again, track the validation loss if you want

    # disable gradient during validation, which can save GPU memory
    with torch.no_grad():
        for batch_idx, (inputs, targets) in enumerate(val_loader):
            ####################################
            # your code here
            # copy inputs to device
            inputs=inputs.to(device)
            targets=targets.to(device).long()
            # compute the output and loss
            out=model(inputs)
            loss=criterion(out,targets)
            # count the number of correctly predicted samples in the current batch
            val_loss+=loss.item()
            correct_examples+=torch.sum(out.argmax(-1)==targets).item()
            
            ####################################
    total_examples=len(val_loader.dataset)
    avg_loss = val_loss / len(val_loader)
    avg_acc = correct_examples / total_examples
    print("Validation loss: %.4f, Validation accuracy: %.4f" % (avg_loss, avg_acc))
    
    # save the model checkpoint
    if avg_acc > best_val_acc:
        best_val_acc = avg_acc
        if not os.path.exists(CHECKPOINT_FOLDER):
            os.makedirs(CHECKPOINT_FOLDER)
        print("Saving ...")
        state = {'state_dict': model.state_dict(),
                 'epoch': i,
                }
        torch.save(state, os.path.join(CHECKPOINT_FOLDER, 'mnist_teacher.pth'))
        
    print('')

print("="*50)
print(f"==> Optimization finished! Best validation accuracy: {best_val_acc:.4f}")

==> Training starts!
Epoch 0:


KeyboardInterrupt: 
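
The manual per-epoch learning-rate decay in the training loop above multiplies the rate by a constant factor each epoch. PyTorch offers `torch.optim.lr_scheduler.ExponentialLR` for the same kind of schedule. The sketch below is an alternative shown under assumptions, not the notebook's method: the dummy parameter and the initial rate of 0.1 are placeholders, and the training step itself is elided.

```python
import torch
import torch.optim as optim

w = torch.nn.Parameter(torch.zeros(1))  # placeholder parameter
opt = optim.SGD([w], lr=0.1, momentum=0.9)
sched = optim.lr_scheduler.ExponentialLR(opt, gamma=0.96)

lrs = []
for epoch in range(3):
    lrs.append(opt.param_groups[0]["lr"])  # learning rate used during this epoch
    # ... one epoch of training would run here ...
    opt.step()    # optimizer step comes before the scheduler step
    sched.step()  # multiply the learning rate by gamma once per epoch

print(lrs)
```

As in the manual version, epoch 0 runs at the initial rate and each later epoch uses the previous rate times the decay factor.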

In [5]:
def train(REG,name):
    # specify the device for computation
    #############################################
    # your code here
    device = 'cuda' if torch.cuda.is_available() else 'cpu'
    model=Teacher()
    model=model.to(device)
    #############################################
    import torch.nn as nn
    import torch.optim as optim

    # hyperparameters, do NOT change right now
    # initial learning rate
    INITIAL_LR = 0.10114514

    # momentum for optimizer
    MOMENTUM = 0.9

    # L2 regularization strength is passed in as the REG argument
    

    #############################################
    # your code here
    # create loss function
    criterion = nn.CrossEntropyLoss()

    # Add optimizer
    optimizer = optim.SGD(model.parameters(),lr=INITIAL_LR,momentum=MOMENTUM,weight_decay=REG,nesterov=True)
    #############################################
    # some hyperparameters
    # total number of training epochs
    EPOCHS = 140

    # the folder where the trained model is saved
    CHECKPOINT_FOLDER = "./tmp_model"
    DECAY_EPOCHS=1
    DECAY=0.96
    # start the training/validation process
    # the process should take about 5 minutes on a GTX 1070-Ti
    # if the code is written efficiently.
    best_val_acc = 0
    current_learning_rate = INITIAL_LR

    print("==> Training starts!")
    print("="*50)
    for i in range(0, EPOCHS):
        # handle the learning rate scheduler.
        
        if i % DECAY_EPOCHS == 0 and i != 0 :
            current_learning_rate = current_learning_rate * DECAY
        
            for param_group in optimizer.param_groups:
                param_group['lr'] = current_learning_rate
            #print("Current learning rate has decayed to %f" %current_learning_rate)
        
        #######################
        # your code here
        # switch to train mode
        model.train()
        
        #######################
        
        #print("Epoch %d:" %i)
        # this helps you compute the training accuracy
        total_examples = 0
        correct_examples = 0

        train_loss = 0 # track training loss if you want
        loader=train_loader
        
        # Train the model for 1 epoch.
        for batch_idx, (inputs, targets) in enumerate(loader):
            ####################################
            # your code here
            # copy inputs to device
            inputs=inputs.to(device)
            targets=targets.to(device).long()

            
            # compute the output and loss
            out=model(inputs)
            loss=criterion(out,targets)
            
            # zero the gradient
            
            optimizer.zero_grad()
            # backpropagation
            loss.backward()

            
            # apply gradient and update the weights
            optimizer.step()
            train_loss+=loss.item()
            
            # count the number of correctly predicted samples in the current batch
            correct_examples+=torch.sum(out.argmax(-1)==targets).item()
            ####################################
        total_examples=len(train_loader.dataset)      
        avg_loss = train_loss / len(train_loader)
        avg_acc = correct_examples / total_examples
        #print("Training loss: %.4f, Training accuracy: %.4f" %(avg_loss, avg_acc))

        # Validate on the validation dataset
        #######################
        # your code here
        # switch to eval mode
        model.eval()
        
        #######################

        # this helps you compute the validation accuracy
        total_examples = 0
        correct_examples = 0
        
        val_loss = 0 # again, track the validation loss if you want

        # disable gradient during validation, which can save GPU memory
        with torch.no_grad():
            for batch_idx, (inputs, targets) in enumerate(val_loader):
                ####################################
                # your code here
                # copy inputs to device
                inputs=inputs.to(device)
                targets=targets.to(device).long()
                # compute the output and loss
                out=model(inputs)
                loss=criterion(out,targets)
                # count the number of correctly predicted samples in the current batch
                val_loss+=loss.item()
                correct_examples+=torch.sum(out.argmax(-1)==targets).item()
                
                ####################################
        total_examples=len(val_loader.dataset)
        avg_loss = val_loss / len(val_loader)
        avg_acc = correct_examples / total_examples
        #print("Validation loss: %.4f, Validation accuracy: %.4f" % (avg_loss, avg_acc))
        
        # save the model checkpoint
        if avg_acc > best_val_acc:
            best_val_acc = avg_acc
            if not os.path.exists(CHECKPOINT_FOLDER):
                os.makedirs(CHECKPOINT_FOLDER)
            print("Saving ...",avg_acc)
            state = {'state_dict': model.state_dict(),
                    'epoch': i,
                    'accuracy': avg_acc
                    }
            torch.save(state, os.path.join(CHECKPOINT_FOLDER, name))
            
        #print('')

    print("="*50)
    print(f"==> Optimization finished! Best validation accuracy: {best_val_acc:.4f}")

In [6]:
# convert to plain floats so file names do not contain "tensor(...)"
regs=torch.linspace(0,7.3e-5,20).tolist()
for reg in regs:
    train(reg,"teacher"+str(reg)+".pth")
    print(reg)

==> Training starts!


KeyboardInterrupt: 

In [7]:
regs=[1e-3,1.2e-5,1e-5,8e-6,1.5e-05,1.6e-5,1.4e-5,1.3e-5,1.1e-5]
for reg in regs:
    train(reg,"teacher"+str(reg)+".pth")
    print(reg)

==> Training starts!
Saving ... 0.9658
Saving ... 0.9727
Saving ... 0.9742
Saving ... 0.9769
Saving ... 0.9772
Saving ... 0.9787
Saving ... 0.9799
Saving ... 0.9809
Saving ... 0.981
Saving ... 0.9836
Saving ... 0.984
Saving ... 0.9841
Saving ... 0.9845
Saving ... 0.9847
Saving ... 0.9849
Saving ... 0.985
Saving ... 0.9853
Saving ... 0.9854
Saving ... 0.9857
Saving ... 0.9858
==> Optimization finished! Best validation accuracy: 0.9858
0.001
==> Training starts!
Saving ... 0.9633
Saving ... 0.9711
Saving ... 0.9754
Saving ... 0.9801
Saving ... 0.9802
Saving ... 0.9831
Saving ... 0.985
Saving ... 0.9852
Saving ... 0.9873
Saving ... 0.9882
Saving ... 0.99
Saving ... 0.9903
Saving ... 0.9915
Saving ... 0.9916
Saving ... 0.9917
Saving ... 0.9918
Saving ... 0.992
Saving ... 0.9921
Saving ... 0.9923
==> Optimization finished! Best validation accuracy: 0.9923
1.2e-05
==> Training starts!
Saving ... 0.9653
Saving ... 0.9717
Saving ... 0.9755
Saving ... 0.9793
Saving ... 0.9822
Saving ... 0.9827
