Training CNNs
================
------
**Deep Learning for Computer Vision**<br>
(c) Research Group CAMMA, University of Strasbourg<br>
Website: http://camma.u-strasbg.fr/
-----

In this lab session we will go over the detailed implementation of a training loop for a CNN model.

These *highly repetitive* exercises are meant to train your fundamentals, without the comfort of high-level APIs.

You will have to identify overfitting scenarios and adjust your training process accordingly using the methods presented during the lecture.

**Instructions**

Import the RPS dataset (https://seafile.unistra.fr/f/2d58c54203e6435fbf22/?dl=1) into the 'datasets' folder of your Google Drive.

# GPU activation

Make sure CUDA is enabled on your machine or runtime: the code below places the model and the data on 'cuda:0'.

# Imports

In [1]:
import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader, Subset
import torchvision
from torchvision.transforms import ToTensor, ToPILImage, Resize


import matplotlib.pyplot as plt
import importlib as ipl
import numpy as np
import random
import pickle
import os
import urllib
from timeit import default_timer as timer
import gc
from zipfile import ZipFile

# check the PyTorch version
print("PyTorch version: ", torch.__version__)
# check the GPU support; should be True
print("Is GPU available?: ", torch.cuda.is_available())

PyTorch version:  1.13.1+cpu
Is GPU available?:  False


The next cells will prepare the dataset you will be working on.

In [None]:
from google.colab import drive
drive.mount('/content/drive')
filepath = 'RPS'  # folder the archive is extracted into; also used as the ImageFolder root
with ZipFile('/content/drive/MyDrive/datasets/RPS.zip', 'r') as zf:
  zf.extractall(filepath)
  print("Files extracted and folder {} created.".format(filepath))

# Dataset

We will be packaging the data into the following PyTorch datasets, loaded with a batch size of 32 for speed and convenience.

In [None]:
transform = torchvision.transforms.Compose([
    ToTensor(),
    Resize(size=(224,224)),
])

full_dataset = torchvision.datasets.ImageFolder(root=filepath, transform=transform)
class_names = full_dataset.classes
print(class_names)

items = np.random.permutation(len(full_dataset))
val_ratio = 0.1
test_ratio = 0.1
train_items = items[0:int((1.0-val_ratio-test_ratio)*len(full_dataset))]
val_items = items[int((1.0-val_ratio-test_ratio)*len(full_dataset)):int((1.0-test_ratio)*len(full_dataset))]
test_items = items[int((1.0-test_ratio)*len(full_dataset)):]  # open-ended slice so the last item is not dropped

train_dataset = Subset(full_dataset, train_items)
val_dataset = Subset(full_dataset, val_items)
test_dataset = Subset(full_dataset, test_items)

BATCH_SIZE = 32
full_dataloader = DataLoader(full_dataset, batch_size=BATCH_SIZE, shuffle=True)

train_dataloader = DataLoader(train_dataset, batch_size=BATCH_SIZE, shuffle=True)
val_dataloader = DataLoader(val_dataset, batch_size=BATCH_SIZE, shuffle=False)
test_dataloader = DataLoader(test_dataset, batch_size=BATCH_SIZE, shuffle=False)

train_small_indices = np.random.permutation(train_items[0:int(0.05*len(train_items))])
train_small_dataset = Subset(full_dataset, train_small_indices)
train_small_dataloader = DataLoader(train_small_dataset, batch_size=BATCH_SIZE, shuffle=True)
print(len(train_small_dataset))

Indexing a dataset returns an (image, label) pair: `x, y = dataset[index]`.

Labels (y) 0, 1, 2 correspond to paper, rock and scissors respectively.
Images (x) are `torch.Tensor` objects of shape (3, 224, 224).
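
As a sanity check on the split arithmetic used above, the same index computation can be run on a toy dataset size. This is only a minimal sketch: `n_items` is a made-up size, not the actual number of images in RPS, and the seeded generator is an assumption added for reproducibility.

```python
import numpy as np

n_items = 1000                      # hypothetical dataset size, for illustration only
val_ratio, test_ratio = 0.1, 0.1

rng = np.random.default_rng(0)      # fixed seed so the split is reproducible
items = rng.permutation(n_items)

train_end = int((1.0 - val_ratio - test_ratio) * n_items)
val_end = int((1.0 - test_ratio) * n_items)

train_items = items[:train_end]
val_items = items[train_end:val_end]
test_items = items[val_end:]        # open-ended slice keeps the last index

# The three subsets should be disjoint and together cover every index once.
all_items = np.concatenate([train_items, val_items, test_items])
covers_everything = np.array_equal(np.sort(all_items), np.arange(n_items))
print(len(train_items), len(val_items), len(test_items), covers_everything)
```

If the three index sets overlapped, or if some index were missing, the final check would print `False`.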

### TODO 1: Preview 8 images from the training set. Display the type of move (rock, paper or scissors) in the title. Any comments?

In [None]:
# Get a batch of training data
inputs, classes = next(iter(train_dataloader))

# Make a grid from batch
out = torchvision.utils.make_grid(inputs[0:8])

plt.imshow(ToPILImage()(out))
plt.title([class_names[x] for x in classes[0:8]])

# Model

The model is provided in the following cell. Pay attention to the syntax: in *__init__* we define the layers with their properties. In *forward* we establish the sequence of function calls that turns inputs (a batch of images) into predictions (a batch of class scores, or logits; `CrossEntropyLoss` applies the softmax internally). Each one of the layer objects behaves like a function.

In [None]:
class CNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv_1 = nn.Conv2d(in_channels=3, out_channels=8, kernel_size=5)
        self.max_pool_1 = nn.MaxPool2d(kernel_size=5)
        self.conv_2 = nn.Conv2d(in_channels=8, out_channels=16, kernel_size=3)
        self._flatten = nn.Flatten()
        self._dense_1 = nn.Linear(in_features=28224, out_features=2048)
        self._dense_2 = nn.Linear(in_features=2048, out_features=4096)
        self._dense_3 = nn.Linear(in_features=4096, out_features=3)
        self._relu = nn.ReLU()
        
    def forward(self, x):
        x = self._relu(self.conv_1(x))
        x = self.max_pool_1(x)
        x = self._relu(self.conv_2(x))
        x = self._flatten(x)
        x = self._relu(self._dense_1(x))
        x = self._relu(self._dense_2(x))
        x = self._dense_3(x)
        return x
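
The value `in_features=28224` of the first dense layer follows from the spatial sizes produced by the convolution and pooling layers. The sketch below recomputes it with the standard output-size formula; `conv_out` is a hypothetical helper written for this illustration, not a PyTorch function.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer (floor mode)."""
    return (size + 2 * padding - kernel) // stride + 1

size = 224                                  # input height and width
size = conv_out(size, kernel=5)             # conv_1: 224 -> 220
size = conv_out(size, kernel=5, stride=5)   # max_pool_1: stride defaults to the kernel size, 220 -> 44
size = conv_out(size, kernel=3)             # conv_2: 44 -> 42

flat_features = 16 * size * size            # 16 output channels from conv_2
print(size, flat_features)                  # 42 28224
```

Changing the input resolution or any kernel size changes this number, so the first `nn.Linear` layer would have to be adjusted accordingly.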

Here we create a CNN model. Its initial parameters will be saved further down, and you will restart from that checkpoint throughout the lab.

In [None]:
cnn_0 = CNN()
cnn_0 = cnn_0.to('cuda:0')
inp = torch.zeros([5,3,224,224]).to('cuda:0') # create "fake" input to run cnn_0 a first time, for allocation (starting point)
print(inp.type())
sample = cnn_0(inp)
print(sample)

Here is an overview of the model's architecture:

In [None]:
from torchsummary import summary
summary(cnn_0, (3, 224, 224))    # provide "summary" with model and input data size

This *incomplete* function will be used to complete one training step. For now it takes a batch of images *inputs*, the corresponding batch of labels *labels* and a model *cnn_model*, and returns the loss between the model's predictions and the labels.

Again, pay attention to the *cnn_model* object. It acts as a **function** here: it takes the batch of images and returns the batch of predictions. In PyTorch, the model parameters are created when the model is instantiated (in *__init__*), not on the first call.

In [None]:
def train_step(inputs, labels, cnn_model):
    cnn_model.train()
    data = inputs.to('cuda:0')
    outp = cnn_model(data)
    loss = torch.mean(torch.nn.CrossEntropyLoss()(outp.to('cpu'), labels)).item()
    del data
    del outp
    torch.cuda.empty_cache()
    return loss

Similarly, the following function computes the accuracy of the model on a batch.

In [None]:
def eval_step(inputs, labels, cnn_model):
    cnn_model.eval()
    data = inputs.to('cuda:0')
    with torch.no_grad():  # no gradients are needed for evaluation
        outp = cnn_model(data)
    pred = torch.argmax(outp, dim=1)  # predicted class = index of the largest score
    acc = torch.mean((labels == pred.to('cpu')).float())
    del data
    del outp
    del pred
    torch.cuda.empty_cache()
    return acc
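
To make the accuracy computation concrete, here is a small CPU-only sketch on made-up scores. The logits and labels are invented for illustration and are not outputs of the lab's model.

```python
import torch

# Toy scores for 4 samples and 3 classes (made-up values).
logits = torch.tensor([[2.0, 0.1, -1.0],
                       [0.3, 1.5, 0.2],
                       [0.0, 0.2, 3.0],
                       [1.0, 0.9, 0.8]])
labels = torch.tensor([0, 1, 2, 1])

pred = torch.argmax(logits, dim=1)            # index of the largest score per row
acc = (pred == labels).float().mean().item()  # fraction of correct predictions
print(pred.tolist(), acc)                     # [0, 1, 2, 0] 0.75
```

The last sample is misclassified (class 0 predicted, class 1 expected), hence 3 correct predictions out of 4.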

### TODO 2: Run the model on the training set for 100 iterations. Report the loss at each iteration. Plot and comment; what is missing?

In [None]:
losses = []
iteration = 0
max_iter = 100

for epoch in range(10):
    for (x_train_in, y_train_in) in train_dataloader:
        iteration += 1
        losses.append(train_step(x_train_in, y_train_in, cnn_0))
        if iteration >= max_iter:
            break
    if iteration >= max_iter:
        break  # also leave the epoch loop once max_iter is reached

fig = plt.figure()
ax = fig.add_subplot(111)
ax.plot(np.arange(iteration), losses)
ax.set_xlabel("iteration")
ax.set_ylabel("classification loss")

print("accuracy evaluation on last step : {}".format(eval_step(x_train_in, y_train_in, cnn_0)))
torch.cuda.empty_cache()


### TODO 3: Fix the *train_step* function so that it performs the parameter update.


In [None]:
LEARNING_RATE = 0.005
optimizer = torch.optim.SGD(cnn_0.parameters(), lr=LEARNING_RATE)

def train_step(inputs, labels, cnn_model, optim):
    cnn_model.train()
    data = inputs.to('cuda:0')
    #TODO
    outp = cnn_model(data)
    loss = torch.mean(torch.nn.CrossEntropyLoss()(outp.to('cpu'), labels))
    loss.backward()
    optim.step()
    del data
    del outp
    torch.cuda.empty_cache()
    return loss.item()
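
One detail that may be relevant when completing this function: PyTorch adds new gradients to the ones already stored on each parameter when `backward()` is called. The sketch below shows this on a toy tensor; it is an illustration of the mechanism, not a piece of the expected solution.

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)

(w ** 2).sum().backward()
first = w.grad.clone()          # gradient of sum(w^2) is 2 * w

(w ** 2).sum().backward()       # second backward pass without resetting
accumulated = w.grad.clone()    # gradients were added, not replaced

w.grad.zero_()                  # reset the stored gradient, similar in effect to optimizer.zero_grad()
(w ** 2).sum().backward()
after_reset = w.grad.clone()

print(first.tolist(), accumulated.tolist(), after_reset.tolist())
# [2.0, 4.0] [4.0, 8.0] [2.0, 4.0]
```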

In [None]:
torch.save({
            'model_state_dict': cnn_0.state_dict(),
            'optimizer_state_dict': optimizer.state_dict(),
            }, 'start')
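
The checkpoint mechanism can be tried out in isolation. The following sketch saves and restores a `state_dict` through an in-memory buffer; the tiny `nn.Linear` model and the `io.BytesIO` buffer are stand-ins chosen for this example, whereas the lab itself writes the CNN to the file 'start'.

```python
import io
import torch
import torch.nn as nn

model = nn.Linear(4, 2)                      # tiny stand-in model, not the lab's CNN
buffer = io.BytesIO()                        # in-memory file instead of 'start'
torch.save({'model_state_dict': model.state_dict()}, buffer)

saved_weight = model.weight.detach().clone()
with torch.no_grad():
    model.weight.add_(1.0)                   # simulate training by changing the weights

buffer.seek(0)                               # rewind before reading the checkpoint back
checkpoint = torch.load(buffer)
model.load_state_dict(checkpoint['model_state_dict'])
restored = torch.equal(model.weight, saved_weight)
print(restored)                              # True
```

Restarting every experiment from the same saved weights keeps the comparisons between training setups fair.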

### TODO 4: Train the model on the training set for 2000 iterations. Every 100th iteration, report the average loss over the 100 previous iterations. Plot it.

In [None]:
start_t = timer()

average_training_losses = []
training_losses = []
iteration = 0
nb_pts = 0 
max_iter = 2000
stop_epoch = False

for epoch in range(100): 
#TODO

end_t = timer()
print("time_elapsed: {}".format(end_t - start_t))

times = list(range(len(average_training_losses)))

fig = plt.figure()
ax = fig.add_subplot(111)
ax.plot(np.arange(nb_pts), average_training_losses)
ax.set_xlabel("iteration / 100")
ax.set_ylabel("classification loss")
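
The "average over the previous 100 iterations" bookkeeping asked for in TODO 4 can be sketched on a toy list of losses. The values, the window size of 5 and the variable names below are made up for readability; this is one possible way to organise the counters, not the reference solution.

```python
window = 5                                          # the lab uses 100; a small value keeps the example short
fake_losses = [1.0 / (i + 1) for i in range(20)]    # made-up decreasing losses

running = []                                        # losses collected since the last report
averages = []                                       # one entry per completed window

for iteration, loss in enumerate(fake_losses, start=1):
    running.append(loss)
    if iteration % window == 0:
        averages.append(sum(running) / len(running))
        running = []                                # start a fresh window

print(len(averages))                                # 4
```

Each entry of `averages` then corresponds to one point on the x-axis of the plot.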

### TODO 5: Evaluate (report the average accuracy) on the training set, then on the test set. Comment.

In [None]:
train_accs = []
#TODO

average_train_acc = sum(train_accs) / len(train_accs)
print("ACCURACY - TRAINING SET: {}".format(average_train_acc))

test_accs = []
#TODO

average_test_acc = sum(test_accs) / len(test_accs)
print("ACCURACY - TEST SET: {}".format(average_test_acc))
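
Note that averaging per-batch accuracies gives every batch the same weight, even when the last batch is smaller than the others. The sketch below compares that average with the per-sample accuracy on made-up counts; the numbers are chosen to exaggerate the effect.

```python
# (number of correct predictions, batch size) for three made-up batches;
# the last batch is smaller, as happens when the dataset size is not a
# multiple of the batch size.
batches = [(30, 32), (28, 32), (1, 4)]

batch_accs = [correct / size for correct, size in batches]
mean_of_batches = sum(batch_accs) / len(batch_accs)

total_correct = sum(correct for correct, _ in batches)
total_size = sum(size for _, size in batches)
per_sample = total_correct / total_size

print(round(mean_of_batches, 4), round(per_sample, 4))   # 0.6875 0.8676
```

On real data the gap is usually much smaller, but it is worth keeping in mind when comparing accuracies across datasets of different sizes.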

### TODO 6: Reset the model's weights, then remove 95% of the data from the training set. Repeat the training process from question 4 on this diminished dataset. Plot the loss, evaluate this model on its training set and the test set. Comment.

In [None]:
cnn_0 = CNN()
optimizer = torch.optim.SGD(cnn_0.parameters(), lr=LEARNING_RATE)

startpoint = torch.load('start')
cnn_0.load_state_dict(startpoint['model_state_dict'])
optimizer.load_state_dict(startpoint['optimizer_state_dict'])
cnn_0 = cnn_0.to('cuda:0')

In [None]:
start_t = timer()

average_training_losses = []
training_losses = []
iteration = 0
nb_pts = 0 
max_iter = 1000
stop_epoch = False

for epoch in range(1000000): 
#TODO


end_t = timer()
print("time_elapsed: {}".format(end_t - start_t))

times = list(range(len(average_training_losses)))

fig = plt.figure()
ax = fig.add_subplot(111)
ax.plot(times, average_training_losses)
ax.set_xlabel("iteration / 100")
ax.set_ylabel("classification loss")

In [None]:
train_accs = []
#TODO

average_train_acc = sum(train_accs) / len(train_accs)
print("ACCURACY - TRAINING SET: {}".format(average_train_acc))

test_accs = []
#TODO

average_test_acc = sum(test_accs) / len(test_accs)
print("ACCURACY - TEST SET: {}".format(average_test_acc))

### TODO 7: We are going to modify the training loop to incorporate validation; every 100th iteration, evaluate the model on the entire validation dataset and report the average accuracy (do this for 0 and 95% of data removed). Plot the validation accuracy over the course of the training process and interpret.

In [None]:
cnn_0 = CNN()
optimizer = torch.optim.SGD(cnn_0.parameters(), lr=LEARNING_RATE)

startpoint = torch.load('start')
cnn_0.load_state_dict(startpoint['model_state_dict'])
optimizer.load_state_dict(startpoint['optimizer_state_dict'])
cnn_0 = cnn_0.to('cuda:0')

In [None]:
start_t = timer()

average_training_losses = []
average_val_accs = []
training_losses = []
val_accs = []
iteration = 0
nb_pts = 0 
max_iter = 2000
stop_epoch = False

for epoch in range(100): 
#TODO
        
end_t = timer()
print("time_elapsed: {}".format(end_t - start_t))

times = list(range(len(average_training_losses)))

fig = plt.figure()

ax = fig.add_subplot(121)
ax.plot(np.arange(nb_pts), average_training_losses)
ax.set_xlabel("iteration / 100")
ax.set_ylabel("classification loss")

ax = fig.add_subplot(122)
ax.plot(np.arange(len(average_val_accs)), average_val_accs)
ax.set_xlabel("iteration / 100")
ax.set_ylabel("validation accuracy")

In [None]:
train_accs = []
#TODO

average_train_acc = sum(train_accs) / len(train_accs)
print("ACCURACY - TRAINING SET: {}".format(average_train_acc))

test_accs = []
#TODO

average_test_acc = sum(test_accs) / len(test_accs)
print("ACCURACY - TEST SET: {}".format(average_test_acc))

In [None]:
cnn_0 = CNN()
optimizer = torch.optim.SGD(cnn_0.parameters(), lr=LEARNING_RATE)

startpoint = torch.load('start')
cnn_0.load_state_dict(startpoint['model_state_dict'])
optimizer.load_state_dict(startpoint['optimizer_state_dict'])
cnn_0 = cnn_0.to('cuda:0')

In [None]:
start_t = timer()

average_training_losses = []
average_val_accs = []
training_losses = []
val_accs = []
iteration = 0
nb_pts = 0 
max_iter = 2000
stop_epoch = False

for epoch in range(1000000): 
#TODO
        
end_t = timer()
print("time_elapsed: {}".format(end_t - start_t))

times = list(range(len(average_training_losses)))

fig = plt.figure()

ax = fig.add_subplot(121)
ax.plot(np.arange(nb_pts), average_training_losses)
ax.set_xlabel("iteration / 100")
ax.set_ylabel("classification loss")

ax = fig.add_subplot(122)
ax.plot(np.arange(len(average_val_accs)), average_val_accs)
ax.set_xlabel("iteration / 100")
ax.set_ylabel("validation accuracy")

In [None]:
train_accs = []
#TODO

average_train_acc = sum(train_accs) / len(train_accs)
print("ACCURACY - TRAINING SET: {}".format(average_train_acc))

test_accs = []
#TODO

average_test_acc = sum(test_accs) / len(test_accs)
print("ACCURACY - TEST SET: {}".format(average_test_acc))

### TODO 8: We will now incorporate data augmentation into the training set. Choose random transformations so that the dataset returns randomly transformed images.

In [None]:
cnn_0 = CNN()
optimizer = torch.optim.SGD(cnn_0.parameters(), lr=LEARNING_RATE)

startpoint = torch.load('start')
cnn_0.load_state_dict(startpoint['model_state_dict'])
optimizer.load_state_dict(startpoint['optimizer_state_dict'])
cnn_0 = cnn_0.to('cuda:0')

In [None]:
#TODO

full_dataset = torchvision.datasets.ImageFolder(root='RPS', transform=transform_rand)

items = np.random.permutation(len(full_dataset))
val_ratio = 0.1
test_ratio = 0.1
train_items = items[0:int((1.0-val_ratio-test_ratio)*len(full_dataset))]
val_items = items[int((1.0-val_ratio-test_ratio)*len(full_dataset)):int((1.0-test_ratio)*len(full_dataset))]
test_items = items[int((1.0-test_ratio)*len(full_dataset)):]  # open-ended slice so the last item is not dropped

train_dataset = Subset(full_dataset, train_items)
val_dataset = Subset(full_dataset, val_items)
test_dataset = Subset(full_dataset, test_items)

BATCH_SIZE = 32
full_dataloader = DataLoader(full_dataset, batch_size=BATCH_SIZE, shuffle=True)

train_dataloader = DataLoader(train_dataset, batch_size=BATCH_SIZE, shuffle=True)
val_dataloader = DataLoader(val_dataset, batch_size=BATCH_SIZE, shuffle=False)
test_dataloader = DataLoader(test_dataset, batch_size=BATCH_SIZE, shuffle=False)

In [None]:
start_t = timer()

average_training_losses = []
average_val_accs = []
training_losses = []
val_accs = []
iteration = 0
nb_pts = 0 
max_iter = 2000
stop_epoch = False

for epoch in range(1000000): 
#TODO
        
end_t = timer()
print("time_elapsed: {}".format(end_t - start_t))

times = list(range(len(average_training_losses)))

fig = plt.figure()

ax = fig.add_subplot(121)
ax.plot(np.arange(nb_pts), average_training_losses)
ax.set_xlabel("iteration / 100")
ax.set_ylabel("classification loss")

ax = fig.add_subplot(122)
ax.plot(np.arange(len(average_val_accs)), average_val_accs)
ax.set_xlabel("iteration / 100")
ax.set_ylabel("validation accuracy")

In [None]:
train_accs = []
#TODO
average_train_acc = sum(train_accs) / len(train_accs)
print("ACCURACY - TRAINING SET: {}".format(average_train_acc))

test_accs = []
#TODO

average_test_acc = sum(test_accs) / len(test_accs)
print("ACCURACY - TEST SET: {}".format(average_test_acc))

### TODO 9: We will attempt to use L2 regularization to alleviate the overfitting issue. Use the "weight_decay" argument of your optimizer.

Train on the small training set; plot loss and validation accuracy. 
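
To see what `weight_decay` does in `torch.optim.SGD`, the sketch below performs a single step on a toy parameter whose data loss has zero gradient, so that only the decay term acts. The learning rate and decay values are illustrative, not recommendations for the lab.

```python
import torch

lr, wd = 0.1, 0.5                            # illustrative values
w = torch.nn.Parameter(torch.tensor([1.0, -2.0]))
opt = torch.optim.SGD([w], lr=lr, weight_decay=wd)

loss = (0.0 * w).sum()                       # data loss with zero gradient, to isolate the decay term
loss.backward()
opt.step()

# SGD adds wd * w to the gradient, so the update is w - lr * (0 + wd * w).
expected = torch.tensor([1.0, -2.0]) * (1 - lr * wd)
print(w.data.tolist(), expected.tolist())
```

Each step therefore shrinks the weights towards zero by a factor of `1 - lr * wd`, on top of the usual gradient update.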

In [None]:
cnn_0 = CNN()
optimizer = torch.optim.SGD(cnn_0.parameters(), lr=LEARNING_RATE, weight_decay=0.01)

startpoint = torch.load('start')
cnn_0.load_state_dict(startpoint['model_state_dict'])
# Do not restore the saved optimizer state here: its param_groups were saved
# without weight decay and would overwrite weight_decay=0.01.
cnn_0 = cnn_0.to('cuda:0')

In [None]:
start_t = timer()

average_training_losses = []
average_val_accs = []
training_losses = []
val_accs = []
iteration = 0
nb_pts = 0 
max_iter = 2000
stop_epoch = False

for epoch in range(1000000): 
#TODO
        
end_t = timer()
print("time_elapsed: {}".format(end_t - start_t))

times = list(range(len(average_training_losses)))

fig = plt.figure()

ax = fig.add_subplot(121)
ax.plot(np.arange(nb_pts), average_training_losses)
ax.set_xlabel("iteration / 100")
ax.set_ylabel("classification loss")

ax = fig.add_subplot(122)
ax.plot(np.arange(len(average_val_accs)), average_val_accs)
ax.set_xlabel("iteration / 100")
ax.set_ylabel("validation accuracy")

In [None]:
train_accs = []
#TODO
average_train_acc = sum(train_accs) / len(train_accs)
print("ACCURACY - TRAINING SET: {}".format(average_train_acc))

test_accs = []
#TODO

average_test_acc = sum(test_accs) / len(test_accs)
print("ACCURACY - TEST SET: {}".format(average_test_acc))

### EXTRA QUESTION 1: Change the code for *CNN* by adding dropout, for example between the flatten and dense_1 layers.

### EXTRA QUESTION 2: Repeat the previous experiments with different values for the learning rate and batch size.

### EXTRA QUESTION 3: On the small dataset, save the model when validation accuracy is at its highest, and check for test accuracy (early stopping).
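
The bookkeeping behind "keep the best model seen so far" can be sketched without any training. In the example below, the validation accuracies are made up and `fake_state` is a stand-in for `model.state_dict()`; a real implementation would snapshot the actual state dict at each evaluation.

```python
import copy

val_accs = [0.40, 0.55, 0.70, 0.68, 0.66]    # made-up validation accuracies, one per evaluation
fake_state = {"step": 0}                      # stand-in for model.state_dict()

best_acc = float("-inf")
best_state = None
best_eval = -1

for i, acc in enumerate(val_accs):
    fake_state["step"] = i                    # pretend the model was updated
    if acc > best_acc:
        best_acc = acc
        best_eval = i
        best_state = copy.deepcopy(fake_state)   # snapshot, so later updates do not overwrite it

print(best_eval, best_acc, best_state)        # 2 0.7 {'step': 2}
```

The deep copy matters: a state dict holds references to the live parameters, so without a copy the "best" snapshot would silently follow later training steps.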

### Theoretical questions to think about:
- What are the advantages and disadvantages of having a large validation set?
- Same question for a small one.
- Why do we even have a validation set at all? Why not directly use the test set?
- Take another look at the model we used. Can you point out any issues with it?
- Manufacturing small examples of overfitting was actually quite challenging. To do this, a portion of the labels (20% of the training set) had to be corrupted. How does that drive the model into overfitting?