## Lab 2
### Part 2: Dealing with overfitting

Today we work with the [Fashion-MNIST dataset](https://github.com/zalandoresearch/fashion-mnist) (*hint: it is available in `torchvision`*).

Your goal for today:
1. Train an FC (fully-connected) network that achieves >= 0.885 test accuracy.
2. Cause considerable overfitting by modifying the network (e.g. increasing the number of network parameters and/or layers) and demonstrate it in an appropriate way (e.g. plot loss and accuracy on the train and validation sets w.r.t. network complexity).
3. Try to deal with overfitting (at least partially) by using regularization techniques (Dropout/Batchnorm/...) and demonstrate the results.

__Please write a small report describing your ideas, attempts, and achieved results at the end of this file.__

*Note*: Tasks 2 and 3 are interrelated: in task 3 your goal is to make the network from task 2 less prone to overfitting. Task 1 is independent of tasks 2 and 3.

*Note 2*: We recommend using Google Colab or another machine with GPU acceleration.

In [None]:
import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as transforms
import torchsummary
import torch.utils.data
from IPython.display import clear_output
from matplotlib import pyplot as plt
from matplotlib.pyplot import figure
import numpy as np
from sklearn.metrics import accuracy_score
import os
import torch.nn.functional as F


device = 'cuda:0' if torch.cuda.is_available() else 'cpu'

In [None]:
# Helper: create the directory `path` if it does not exist yet
def mkdir(path):
    if not os.path.exists(path):
        os.mkdir(path)
        print('Directory', path, 'is created!')
    else:
        print('Directory', path, 'already exists!')
        
root_path = 'fmnist'
mkdir(root_path)

Directory fmnist is created!


In [None]:
download = True
# ToTensor converts PIL images to float tensors scaled to [0, 1]
train_transform = transforms.ToTensor()
test_transform = transforms.ToTensor()


fmnist_dataset_train = torchvision.datasets.FashionMNIST(root_path, 
                                                        train=True, 
                                                        transform=train_transform,
                                                        target_transform=None,
                                                        download=download)
fmnist_dataset_test = torchvision.datasets.FashionMNIST(root_path, 
                                                       train=False, 
                                                       transform=test_transform,
                                                       target_transform=None,
                                                       download=download)

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-images-idx3-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-images-idx3-ubyte.gz to fmnist/FashionMNIST/raw/train-images-idx3-ubyte.gz


  0%|          | 0/26421880 [00:00<?, ?it/s]

Extracting fmnist/FashionMNIST/raw/train-images-idx3-ubyte.gz to fmnist/FashionMNIST/raw

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-labels-idx1-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-labels-idx1-ubyte.gz to fmnist/FashionMNIST/raw/train-labels-idx1-ubyte.gz


  0%|          | 0/29515 [00:00<?, ?it/s]

Extracting fmnist/FashionMNIST/raw/train-labels-idx1-ubyte.gz to fmnist/FashionMNIST/raw

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-images-idx3-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-images-idx3-ubyte.gz to fmnist/FashionMNIST/raw/t10k-images-idx3-ubyte.gz


  0%|          | 0/4422102 [00:00<?, ?it/s]

Extracting fmnist/FashionMNIST/raw/t10k-images-idx3-ubyte.gz to fmnist/FashionMNIST/raw

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-labels-idx1-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-labels-idx1-ubyte.gz to fmnist/FashionMNIST/raw/t10k-labels-idx1-ubyte.gz


  0%|          | 0/5148 [00:00<?, ?it/s]

Extracting fmnist/FashionMNIST/raw/t10k-labels-idx1-ubyte.gz to fmnist/FashionMNIST/raw



In [None]:
train_loader = torch.utils.data.DataLoader(fmnist_dataset_train, 
                                           batch_size=128,
                                           shuffle=True,
                                           num_workers=2)
test_loader = torch.utils.data.DataLoader(fmnist_dataset_test,
                                          batch_size=256,
                                          shuffle=False,
                                          num_workers=2)

In [None]:
len(fmnist_dataset_test)

10000

### Task 1
Train a network that achieves $\geq 0.885$ test accuracy. It's fine to use only Linear (`nn.Linear`) layers and activations/dropout/batchnorm. Convolutional layers would be very useful here, but we will meet them a bit later.

In [None]:
class TinyNeuralNetwork(nn.Module):
    def __init__(self, input_shape=28*28, num_classes=10, input_channels=1):
        super().__init__()
        self.model = nn.Sequential(
            nn.Flatten(), # This layer converts image into a vector to use Linear layers afterwards
            # Your network structure comes here
            nn.BatchNorm1d(input_shape),
            nn.ReLU(),
            nn.Dropout(0.5),
            nn.Linear(input_shape, 512),
            nn.BatchNorm1d(512),
            nn.Dropout(0.2),
            nn.ReLU(),
            nn.Linear(512, 256),
            nn.BatchNorm1d(256),
            nn.Linear(256, 64),
            nn.BatchNorm1d(64),
            nn.ReLU(),
            nn.Linear(64, 10)
        )
        
    def forward(self, inp):    
        out = self.model(inp)
        return out

In [None]:
torchsummary.summary(TinyNeuralNetwork().to(device), (28*28,))

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
           Flatten-1                  [-1, 784]               0
       BatchNorm1d-2                  [-1, 784]           1,568
              ReLU-3                  [-1, 784]               0
           Dropout-4                  [-1, 784]               0
            Linear-5                  [-1, 512]         401,920
       BatchNorm1d-6                  [-1, 512]           1,024
           Dropout-7                  [-1, 512]               0
              ReLU-8                  [-1, 512]               0
            Linear-9                  [-1, 256]         131,328
      BatchNorm1d-10                  [-1, 256]             512
           Linear-11                   [-1, 64]          16,448
      BatchNorm1d-12                   [-1, 64]             128
             ReLU-13                   [-1, 64]               0
           Linear-14                   

Your experiments come here:

In [None]:
model = TinyNeuralNetwork().to(device)
opt = torch.optim.Adam(model.parameters(), lr = 0.05)
loss_func = nn.CrossEntropyLoss()

# Your experiments, training and validation loops here
epochs = 50

min_valid_loss = np.inf

for epoch in range(epochs):
    train_loss = 0.0
    model.train()    # Re-enable Dropout/BatchNorm training behaviour after the eval() call below
    for data, labels in train_loader:
        # Transfer data to the GPU if available
        data, labels = data.to(device), labels.to(device)

        opt.zero_grad()
        logits = model(data)
        loss = loss_func(logits, labels)
        loss.backward()

        opt.step()

        train_loss += loss.item()

    valid_loss = 0.0
    model.eval()     # Required here: the model contains Dropout and BatchNorm layers
    with torch.no_grad():    # No gradients are needed for validation
        for data, labels in test_loader:
            data, labels = data.to(device), labels.to(device)

            logits = model(data)
            loss = loss_func(logits, labels)
            valid_loss += loss.item()    # Sum of per-batch mean losses

    print(f'Epoch {epoch+1} \t\t Training Loss: {train_loss / len(train_loader)} \t\t Validation Loss: {valid_loss / len(test_loader)}')

    if min_valid_loss > valid_loss:
        print(f'Validation Loss Decreased({min_valid_loss:.6f}--->{valid_loss:.6f}) \t Saving The Model')
        min_valid_loss = valid_loss

        # Saving State Dict
        torch.save(model.state_dict(), 'saved_model.pth')

Epoch 1 		 Training Loss: 0.6108481383908277 		 Validation Loss: 0.43716992512345315
Validation Loss Decreased(inf--->17.486797) 	 Saving The Model
Epoch 1 has accurcy score of test data: 0.838
Epoch 2 		 Training Loss: 0.3794445709696711 		 Validation Loss: 0.42567205280065534
Validation Loss Decreased(17.486797--->17.026882) 	 Saving The Model
Epoch 2 has accurcy score of test data: 0.838
Epoch 3 		 Training Loss: 0.3225026708612564 		 Validation Loss: 0.3495135050266981
Validation Loss Decreased(17.026882--->13.980540) 	 Saving The Model
Epoch 3 has accurcy score of test data: 0.868
Epoch 4 		 Training Loss: 0.29075109888749845 		 Validation Loss: 0.35751201324164866
Epoch 4 has accurcy score of test data: 0.866
Epoch 5 		 Training Loss: 0.2651501383417959 		 Validation Loss: 0.33244835808873174
Validation Loss Decreased(13.980540--->13.297934) 	 Saving The Model
Epoch 5 has accurcy score of test data: 0.877
Epoch 6 		 Training Loss: 0.246120250714359 		 Validation Loss: 0.334176360

In [None]:
test_loss = 0.0
total = 0
correct = 0

model.eval()    # Inference behaviour for Dropout/BatchNorm
with torch.no_grad():
    for data, labels in test_loader:
        data, labels = data.to(device), labels.to(device)
        prediction = model(data)
        # Count correct predictions for the whole batch at once
        correct += (torch.argmax(prediction, dim=1) == labels).sum().item()
        total += labels.size(0)
        loss = loss_func(prediction, labels)
        test_loss += loss.item() * data.size(0)    # Undo the per-batch mean

test_loss = test_loss / total    # Average loss per sample
print('Testing Loss: ', test_loss)
print(f'Correct Predictions: {correct}/{total}')

Testing Loss:83.84315204620361
Correct Predictions: 8858/10000
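
A note on averaging: with 10,000 test images and a batch size of 256, the last batch holds only 16 images, so the plain mean of per-batch losses is not the same number as the mean loss per sample. The sketch below uses made-up batch losses (not values from this run) and a hypothetical helper, `mean_loss_per_sample`, to show the difference.

```python
def mean_loss_per_sample(batch_losses, batch_sizes):
    """Average loss per sample, given the mean loss and size of each batch."""
    weighted_sum = sum(loss * n for loss, n in zip(batch_losses, batch_sizes))
    return weighted_sum / sum(batch_sizes)


# Made-up per-batch mean losses; the last batch is small, like the
# 16-image tail batch of the test loader above.
batch_losses = [0.5, 0.5, 2.0]
batch_sizes = [256, 256, 16]

naive_mean = sum(batch_losses) / len(batch_losses)   # every batch counts equally
per_sample = mean_loss_per_sample(batch_losses, batch_sizes)
print(naive_mean, per_sample)
```

The small last batch pulls the naive mean upward, while the per-sample mean weights it by its 16 images only. Either convention is fine for comparing epochs as long as it is used consistently.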


### Task 2: Overfit it.
Build a network that will overfit to this dataset. Demonstrate the overfitting in an appropriate way (e.g. plot loss and accuracy on the train and test sets w.r.t. network complexity).

*Note:* you may also reduce the size of the `train` dataset to encourage overfitting and speed up the computations.
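
One possible way to demonstrate overfitting is to plot train and validation curves side by side. The sketch below assumes a hypothetical `history` dict filled in during training (one value per epoch) and uses made-up numbers rather than results from this notebook. It plots against epochs; the x-axis could equally be a measure of network complexity.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook
from matplotlib import pyplot as plt


def plot_learning_curves(history):
    """Plot train vs. validation loss and accuracy per epoch.

    `history` is a hypothetical dict of equal-length lists with keys
    'train_loss', 'val_loss', 'train_acc', 'val_acc'.
    """
    epochs = range(1, len(history["train_loss"]) + 1)
    fig, (ax_loss, ax_acc) = plt.subplots(1, 2, figsize=(10, 4))
    for ax, metric in ((ax_loss, "loss"), (ax_acc, "acc")):
        ax.plot(epochs, history[f"train_{metric}"], label="train")
        ax.plot(epochs, history[f"val_{metric}"], label="validation")
        ax.set_xlabel("epoch")
        ax.set_ylabel(metric)
        ax.legend()
    fig.tight_layout()
    return fig


# Made-up numbers, only to show the typical overfitting shape:
# training loss keeps falling while validation loss turns upward.
demo_history = {
    "train_loss": [1.2, 0.7, 0.4, 0.2, 0.1],
    "val_loss": [1.3, 0.9, 0.8, 0.9, 1.1],
    "train_acc": [0.60, 0.78, 0.88, 0.95, 0.98],
    "val_acc": [0.58, 0.70, 0.74, 0.75, 0.74],
}
fig = plot_learning_curves(demo_history)
```

A widening gap between the two curves in each panel is the usual visual signature of overfitting.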

In [None]:
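# Keep only a small subset (1,000 images, drawn from the test split) to make
# overfitting easy, then split it into 900 training and 100 validation samples.
_, small_ds = torch.utils.data.random_split(fmnist_dataset_test, [9000, 1000])
train_ds, val_ds = torch.utils.data.random_split(small_ds, [900, 100])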

In [None]:
train_loader = torch.utils.data.DataLoader(train_ds, 
                                           batch_size=512,
                                           shuffle=True,
                                           num_workers=2)
test_loader = torch.utils.data.DataLoader(val_ds,
                                          batch_size=128,
                                          shuffle=False,
                                          num_workers=2)

In [None]:
class OverfittingNeuralNetwork(nn.Module):
    def __init__(self, input_shape=28*28, num_classes=10, input_channels=1):
        super().__init__()
        self.model = nn.Sequential(
            nn.Flatten(), # This layer converts image into a vector to use Linear layers afterwards
            # Stack of Linear layers without regularization (sizes follow the summary below)
            nn.Linear(input_shape, 512),
            nn.Linear(512, 256),
            nn.Linear(256, 128),
            nn.Linear(128, num_classes)
        )

    def forward(self, inp):
        out = self.model(inp)
        return out

In [None]:
torchsummary.summary(OverfittingNeuralNetwork().to(device), (28*28,))

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
           Flatten-1                  [-1, 784]               0
            Linear-2                  [-1, 512]         401,920
            Linear-3                  [-1, 256]         131,328
            Linear-4                  [-1, 128]          32,896
            Linear-5                   [-1, 10]           1,290
Total params: 567,434
Trainable params: 567,434
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.00
Forward/backward pass size (MB): 0.01
Params size (MB): 2.16
Estimated Total Size (MB): 2.18
----------------------------------------------------------------
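
The parameter counts in this summary can be checked by hand: a fully-connected layer with `n_in` inputs and `n_out` outputs has `n_in * n_out` weights plus `n_out` biases. A minimal sketch, where `linear_params` is a hypothetical helper and the 900-image training subset from the split above is assumed:

```python
def linear_params(n_in, n_out):
    """Weights (n_in * n_out) plus biases (n_out) of a fully-connected layer."""
    return n_in * n_out + n_out


# Layer widths as listed in the summary: 784 -> 512 -> 256 -> 128 -> 10
layer_sizes = [784, 512, 256, 128, 10]
per_layer = [linear_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
total = sum(per_layer)

# Assumed training-set size: the 900-image subset created above
params_per_sample = total / 900

print(per_layer, total)  # [401920, 131328, 32896, 1290] 567434
```

With several hundred trainable parameters per training image, the network has ample capacity to memorize the training subset.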


In [None]:
model = OverfittingNeuralNetwork().to(device)
opt = torch.optim.Adam(model.parameters(), lr = 0.05)
loss_func = nn.CrossEntropyLoss()

# Your experiments, training and validation loops here
epochs = 50

min_valid_loss = np.inf

for epoch in range(epochs):
    train_loss = 0.0
    model.train()    # Switch back to training mode after the eval() call below
    for data, labels in train_loader:
        # Transfer data to the GPU if available
        data, labels = data.to(device), labels.to(device)

        opt.zero_grad()
        logits = model(data)
        loss = loss_func(logits, labels)
        loss.backward()

        opt.step()

        train_loss += loss.item()

    valid_loss = 0.0
    model.eval()     # No effect without Dropout/BatchNorm layers; kept for consistency
    with torch.no_grad():    # No gradients are needed for validation
        for data, labels in test_loader:
            data, labels = data.to(device), labels.to(device)

            logits = model(data)
            loss = loss_func(logits, labels)
            valid_loss += loss.item()    # Sum of per-batch mean losses

    print(f'Epoch {epoch+1} \t\t Training Loss: {train_loss / len(train_loader)} \t\t Validation Loss: {valid_loss / len(test_loader)}')

    if min_valid_loss > valid_loss:
        print(f'Validation Loss Decreased({min_valid_loss:.6f}--->{valid_loss:.6f}) \t Saving The Model')
        min_valid_loss = valid_loss

        # Saving State Dict
        torch.save(model.state_dict(), 'saved_model.pth')

Epoch 1 		 Training Loss: 2.4264168739318848 		 Validation Loss: 9.546398162841797
Validation Loss Decreased(inf--->9.546398) 	 Saving The Model
Epoch 0 has accurcy score of train data: 0.312
Epoch 2 		 Training Loss: 11.059917449951172 		 Validation Loss: 17.9643497467041
Epoch 1 has accurcy score of train data: 0.193
Epoch 3 		 Training Loss: 16.81727123260498 		 Validation Loss: 13.834010124206543
Epoch 2 has accurcy score of train data: 0.415
Epoch 4 		 Training Loss: 11.151932716369629 		 Validation Loss: 35.456565856933594
Epoch 3 has accurcy score of train data: 0.082
Epoch 5 		 Training Loss: 26.31991481781006 		 Validation Loss: 14.99642562866211
Epoch 4 has accurcy score of train data: 0.363
Epoch 6 		 Training Loss: 16.192634105682373 		 Validation Loss: 10.36890983581543
Epoch 5 has accurcy score of train data: 0.353
Epoch 7 		 Training Loss: 10.688169479370117 		 Validation Loss: 7.47598123550415
Validation Loss Decreased(9.546398--->7.475981) 	 Saving The Model
Epoch 6 ha

In [None]:
training_loss = 0.0
total = 0
correct = 0

model.eval()    # Evaluate without training-time randomness
with torch.no_grad():
    for data, labels in train_loader:
        data, labels = data.to(device), labels.to(device)
        prediction = model(data)
        # Count correct predictions for the whole batch at once
        correct += (torch.argmax(prediction, dim=1) == labels).sum().item()
        total += labels.size(0)
        loss = loss_func(prediction, labels)
        training_loss += loss.item() * data.size(0)    # Undo the per-batch mean

training_loss = training_loss / total    # Average loss per sample
print('Training Loss: ', training_loss)
print(f'Correct Predictions: {correct}/{total}')

Testing Loss:40.12359036505222
Correct Predictions: 883/900


In [None]:
test_loss = 0.0
total = 0
correct = 0

model.eval()    # Evaluate without training-time randomness
with torch.no_grad():
    for data, labels in test_loader:
        data, labels = data.to(device), labels.to(device)
        prediction = model(data)
        # Count correct predictions for the whole batch at once
        correct += (torch.argmax(prediction, dim=1) == labels).sum().item()
        total += labels.size(0)
        loss = loss_func(prediction, labels)
        test_loss += loss.item() * data.size(0)    # Undo the per-batch mean

test_loss = test_loss / total    # Average loss per sample
print('Testing Loss: ', test_loss)
print(f'Correct Predictions: {correct}/{total}')

Testing Loss:118.11268329620361
Correct Predictions: 75/100


### Task 3: Fix it.
Fix the overfitted network from the previous step (at least partially) by using regularization techniques (Dropout/Batchnorm/...) and demonstrate the results. 
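
Dropout and BatchNorm behave differently in training and evaluation mode: Dropout zeroes activations at random only while training, and BatchNorm switches from batch statistics to its running statistics. The sketch below illustrates the Dropout half of this on a toy tensor (not on the lab's model), which is why `model.train()` and `model.eval()` should be called before the respective loops.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # fixed seed so the random mask is reproducible

drop = nn.Dropout(p=0.5)
x = torch.ones(1000)

drop.train()
y_train = drop(x)  # each element is either 0 or scaled to 1 / (1 - p) = 2

drop.eval()
y_eval = drop(x)   # identity: nothing is dropped at evaluation time

kept_fraction = (y_train != 0).float().mean().item()
print(f"kept in train mode: {kept_fraction:.2f}")
print("eval output unchanged:", torch.equal(y_eval, x))
```

If a model stays in evaluation mode after the first validation pass, its Dropout layers silently stop regularizing for all later epochs.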

In [None]:
class FixedNeuralNetwork(nn.Module):
    def __init__(self, input_shape=28*28, num_classes=10, input_channels=1):
        super().__init__()
        self.model = nn.Sequential(
            nn.Flatten(), # This layer converts image into a vector to use Linear layers afterwards
            # Regularized network (layer order and sizes follow the summary below)
            nn.BatchNorm1d(input_shape),
            nn.ReLU(),
            nn.Dropout(0.5),
            nn.Linear(input_shape, 256),
            nn.ReLU(),
            nn.Linear(256, 64),
            nn.ReLU(),
            nn.Linear(64, num_classes)
        )

    def forward(self, inp):
        out = self.model(inp)
        return out

In [None]:
torchsummary.summary(FixedNeuralNetwork().to(device), (28*28,))

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
           Flatten-1                  [-1, 784]               0
       BatchNorm1d-2                  [-1, 784]           1,568
              ReLU-3                  [-1, 784]               0
           Dropout-4                  [-1, 784]               0
            Linear-5                  [-1, 256]         200,960
              ReLU-6                  [-1, 256]               0
            Linear-7                   [-1, 64]          16,448
              ReLU-8                   [-1, 64]               0
            Linear-9                   [-1, 10]             650
Total params: 219,626
Trainable params: 219,626
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.00
Forward/backward pass size (MB): 0.03
Params size (MB): 0.84
Estimated Total Size (MB): 0.87
-------------------------------------------

In [None]:
model = FixedNeuralNetwork().to(device)
opt = torch.optim.Adam(model.parameters(), lr = 0.05)
loss_func = nn.CrossEntropyLoss()

# Your experiments, training and validation loops here
epochs = 50

min_valid_loss = np.inf

for epoch in range(epochs):
    train_loss = 0.0
    model.train()    # Re-enable Dropout/BatchNorm training behaviour after the eval() call below
    for data, labels in train_loader:
        # Transfer data to the GPU if available
        data, labels = data.to(device), labels.to(device)

        opt.zero_grad()
        logits = model(data)
        loss = loss_func(logits, labels)
        loss.backward()

        opt.step()

        train_loss += loss.item()

    valid_loss = 0.0
    model.eval()     # Required here: the model contains Dropout and BatchNorm layers
    with torch.no_grad():    # No gradients are needed for validation
        for data, labels in test_loader:
            data, labels = data.to(device), labels.to(device)

            logits = model(data)
            loss = loss_func(logits, labels)
            valid_loss += loss.item()    # Sum of per-batch mean losses

    print(f'Epoch {epoch+1} \t\t Training Loss: {train_loss / len(train_loader)} \t\t Validation Loss: {valid_loss / len(test_loader)}')

    if min_valid_loss > valid_loss:
        print(f'Validation Loss Decreased({min_valid_loss:.6f}--->{valid_loss:.6f}) \t Saving The Model')
        min_valid_loss = valid_loss

        # Saving State Dict
        torch.save(model.state_dict(), 'saved_model.pth')

Epoch 1 		 Training Loss: 10.449195265769958 		 Validation Loss: 28.84555435180664
Validation Loss Decreased(inf--->28.845554) 	 Saving The Model
Epoch 2 		 Training Loss: 18.65770387649536 		 Validation Loss: 6.690579414367676
Validation Loss Decreased(28.845554--->6.690579) 	 Saving The Model
Epoch 3 		 Training Loss: 5.444067120552063 		 Validation Loss: 3.228569269180298
Validation Loss Decreased(6.690579--->3.228569) 	 Saving The Model
Epoch 4 		 Training Loss: 2.6024450063705444 		 Validation Loss: 1.8985002040863037
Validation Loss Decreased(3.228569--->1.898500) 	 Saving The Model
Epoch 5 		 Training Loss: 1.8171234726905823 		 Validation Loss: 1.7838352918624878
Validation Loss Decreased(1.898500--->1.783835) 	 Saving The Model
Epoch 6 		 Training Loss: 1.6241376996040344 		 Validation Loss: 1.4993287324905396
Validation Loss Decreased(1.783835--->1.499329) 	 Saving The Model
Epoch 7 		 Training Loss: 1.3789506554603577 		 Validation Loss: 1.4001154899597168
Validation Loss De

In [None]:
training_loss = 0.0
total = 0
correct = 0

model.eval()    # Inference behaviour for Dropout/BatchNorm
with torch.no_grad():
    for data, labels in train_loader:
        data, labels = data.to(device), labels.to(device)
        prediction = model(data)
        # Count correct predictions for the whole batch at once
        correct += (torch.argmax(prediction, dim=1) == labels).sum().item()
        total += labels.size(0)
        loss = loss_func(prediction, labels)
        training_loss += loss.item() * data.size(0)    # Undo the per-batch mean

training_loss = training_loss / total    # Average loss per sample
print('Training Loss: ', training_loss)
print(f'Correct Predictions: {correct}/{total}')

Testing Loss:22.912328884005547
Correct Predictions: 883/900


In [None]:
test_loss = 0.0
total = 0
correct = 0

model.eval()    # Inference behaviour for Dropout/BatchNorm
with torch.no_grad():
    for data, labels in test_loader:
        data, labels = data.to(device), labels.to(device)
        prediction = model(data)
        # Count correct predictions for the whole batch at once
        correct += (torch.argmax(prediction, dim=1) == labels).sum().item()
        total += labels.size(0)
        loss = loss_func(prediction, labels)
        test_loss += loss.item() * data.size(0)    # Undo the per-batch mean

test_loss = test_loss / total    # Average loss per sample
print('Testing Loss: ', test_loss)
print(f'Correct Predictions: {correct}/{total}')

Testing Loss:153.8218379020691
Correct Predictions: 76/100


### Conclusions:
_Write down small report with your conclusions and your ideas._

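In the first part, several layers were needed to get a good result, together with activations and batch normalization; the network reached 8858/10000 correct predictions on the test set.
The easiest way to overfit is to train for a long time on a small dataset. I addressed this by adding regularization techniques (batch normalization and dropout); with them, validation accuracy was 76/100 versus 75/100 for the overfitted network.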