# Homework 2.2: The Quest For A Better Network

In this assignment you will build a monster network to solve Tiny ImageNet image classification.

This notebook is a sequel to seminar 3; please work through that seminar first if you haven't done so yet.

(please at least skim through it)

* The ultimate quest is to create a network with as high an __accuracy__ as you can reach.
* There is a __mini-report__ at the end that you will have to fill in. We recommend reading it first and filling it while you iterate.
 
## Grading
* starting at zero points
* +20% for describing your iteration path in the report below.
* +20% for building a network that gets above 20% accuracy
* +10% for beating each of these milestones on the __TEST__ dataset:
    * 25% (50% points)
    * 30% (60% points)
    * 32.5% (70% points)
    * 35% (80% points)
    * 37.5% (90% points)
    * 40% (full points)
    
## Restrictions
* Please do NOT use pre-trained networks for this assignment until you reach 40%.
 * In other words, the base milestones must be beaten without pre-trained nets (and such a net must be present in the anytask attachments). After that, you can use whatever you want.
* You __can't__ do anything with the validation data apart from running the evaluation procedure. Please split the train images into train and validation parts.

## Tips on what can be done:


 * __Network size__
   * MOAR neurons, 
   * MOAR layers, ([torch.nn docs](http://pytorch.org/docs/master/nn.html))

   * Nonlinearities in the hidden layers
     * tanh, relu, leaky relu, etc
   * Larger networks may take more epochs to train, so don't discard your net just because it didn't beat the baseline in 5 epochs.

   * Ph'nglui mglw'nafh Cthulhu R'lyeh wgah'nagl fhtagn!


### The main rule of prototyping: one change at a time
   * By now you probably have several ideas on what to change. By all means, try them out! But there's a catch: __never test several new things at once__.


### Optimization
   * Training for 100 epochs regardless of anything is probably a bad idea.
   * Some networks converge over 5 epochs, others - over 500.
   * Way to go: stop once the validation score has gone 10 iterations without beating its maximum
   * You should certainly use adaptive optimizers
     * rmsprop, nesterov_momentum, adam, adagrad and so on.
     * They converge faster and sometimes reach better optima
     * It might make sense to tweak the learning rate/momentum, other learning parameters, batch size and number of epochs
   * __BatchNormalization__ (nn.BatchNorm2d) for the win!
     * Sometimes more batch normalization is better.
   * __Regularize__ to prevent overfitting
     * Add an L2 penalty on the weights to the loss function; PyTorch will do the rest
       * Can be done manually or like [this](https://discuss.pytorch.org/t/simple-l2-regularization/139/2).
     * Dropout (`nn.Dropout`) - to prevent overfitting
       * Don't overdo it. Check if it actually makes your network better
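
The stopping rule above ("stop when the validation score is 10 iterations past its maximum") can be sketched as a small helper. This is only an illustration: the class name `EarlyStopping` and its `patience` parameter are hypothetical names, not part of PyTorch or of the seminar code.

```python
class EarlyStopping:
    """Signals a stop once `patience` evaluations have passed since the best score."""

    def __init__(self, patience=10):
        self.patience = patience
        self.best_score = float("-inf")
        self.best_epoch = -1
        self.epoch = -1

    def step(self, score):
        """Record one validation score; return True if training should stop."""
        self.epoch += 1
        if score > self.best_score:
            self.best_score = score
            self.best_epoch = self.epoch
        return self.epoch - self.best_epoch >= self.patience


# Usage with made-up validation accuracies and a short patience
stopper = EarlyStopping(patience=2)
for epoch, val_acc in enumerate([0.10, 0.20, 0.15, 0.18, 0.30]):
    if stopper.step(val_acc):
        print("stopping at epoch", epoch)
        break
```

In a real training loop you would call `stopper.step(...)` once per epoch with the mean validation accuracy, and ideally also save the model weights whenever a new best score is reached.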
   
### Convolution architectures
   * This task __can__ be solved by a sequence of convolutions and poolings with batch_norm and ReLU seasoning, but you shouldn't necessarily stop there.
   * [Inception family](https://hacktilldawn.com/2016/09/25/inception-modules-explained-and-implemented/), [ResNet family](https://towardsdatascience.com/an-overview-of-resnet-and-its-variants-5281e2f56035?gi=9018057983ca), [Densely-connected convolutions (exotic)](https://arxiv.org/abs/1608.06993), [Capsule networks (exotic)](https://arxiv.org/abs/1710.09829)
   * Please do try a few simple architectures before you go for resnet-152.
   * Warning! Training convolutional networks can take a long time without a GPU. That's okay.
     * If you are CPU-only, we still recommend that you try a simple convolutional architecture
     * A perfect option is to set it up to run overnight and check on it in the morning.
     * Make reasonable layer size estimates. A 128-neuron first convolution is likely overkill.
     * __To reduce computation__ time by a constant factor in exchange for some accuracy drop, try using the __stride__ parameter. A stride=2 convolution should take roughly 1/4 of the time of the default (stride=1) one.
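
To see where the "roughly 1/4" figure comes from, here is a rough cost estimate that counts multiply-accumulate operations for a single convolution. The helper names and the example layer (3 input channels, 32 output channels, `padding=1`) are assumptions chosen for illustration, and the count ignores biases and memory traffic.

```python
def conv_out_size(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def conv_macs(in_size, c_in, c_out, kernel, stride=1, padding=0):
    """Approximate multiply-accumulate count for a square input and square kernel."""
    out = conv_out_size(in_size, kernel, stride, padding)
    return out * out * kernel * kernel * c_in * c_out


full = conv_macs(64, c_in=3, c_out=32, kernel=3, stride=1, padding=1)
strided = conv_macs(64, c_in=3, c_out=32, kernel=3, stride=2, padding=1)
print(strided / full)  # 0.25
```

Stride 2 halves the output height and width, so the layer produces a quarter of the output positions, and every later layer also sees a 4x smaller feature map.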
 
   
### Data augmentation
   * Getting a 5x larger dataset for free is a great deal
     * Zoom-in+slice = move
     * Rotate+zoom (to remove black stripes)
     * Add noise (Gaussian or Bernoulli)
   * Simple way to do that (if you have PIL/Image): 
     * ```from scipy.misc import imrotate,imresize``` (these functions were removed in SciPy 1.3; on newer versions use PIL/Pillow or skimage instead)
     * and a bit of slicing
     * Other cool libraries: cv2, skimage, PIL/Pillow
   * A more advanced way is to use torchvision transforms:
    ```
    transform_train = transforms.Compose([
        transforms.RandomCrop(64, padding=4),  # Tiny ImageNet images are 64x64
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        # commonly used CIFAR-10 channel statistics; ideally recompute them for your data
        transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),
    ])
    # ImageFolder takes no `train`/`download` arguments; point `root` at the train folder
    trainset = torchvision.datasets.ImageFolder(root=path_to_tiny_imagenet, transform=transform_train)
    trainloader = torch.utils.data.DataLoader(trainset, batch_size=128, shuffle=True, num_workers=2)
    ```
   * Or use this tool from Keras (requires theano/tensorflow): [tutorial](https://blog.keras.io/building-powerful-image-classification-models-using-very-little-data.html), [docs](https://keras.io/preprocessing/image/)
   * Stay realistic. There's usually no point in flipping dogs upside down as that is not the way you usually see them.
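
The manual augmentations listed above (shift, flip, noise) can also be sketched with plain NumPy. This is a minimal illustration under assumed conventions: the function name `augment`, its parameters, and the `(H, W, C)` float layout in `[0, 1]` are choices made here, not part of the assignment code.

```python
import numpy as np


def augment(image, rng, max_shift=4, noise_std=0.05):
    """Randomly shift, flip and add noise to a float image of shape (H, W, C) in [0, 1]."""
    h, w, _ = image.shape
    # Random shift: zero-pad the borders, then crop a window of the original size
    padded = np.pad(image, ((max_shift, max_shift), (max_shift, max_shift), (0, 0)))
    top = rng.integers(0, 2 * max_shift + 1)
    left = rng.integers(0, 2 * max_shift + 1)
    out = padded[top:top + h, left:left + w]
    # Horizontal flip with probability 0.5
    if rng.random() < 0.5:
        out = out[:, ::-1]
    # Additive Gaussian noise, clipped back to the valid range
    out = out + rng.normal(0.0, noise_std, size=out.shape)
    return np.clip(out, 0.0, 1.0)


rng = np.random.default_rng(0)
image = rng.random((64, 64, 3))  # stand-in for a real 64x64 RGB image
augmented = augment(image, rng)
print(augmented.shape)  # (64, 64, 3)
```

Because the augmentation is random, it should be applied anew every time an image is loaded rather than once up front.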
   


In [1]:
import numpy as np
import matplotlib.pyplot as plt
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision
from torchvision import transforms
from torch.autograd import Variable  # legacy wrapper; a no-op since PyTorch 0.4

%matplotlib inline

In [2]:
# !git clone https://github.com/seshuad/IMagenet

In [3]:
# Channel statistics used for normalization (these are the commonly used CIFAR-10 values)
means = np.array((0.4914, 0.4822, 0.4465))
stds = np.array((0.2023, 0.1994, 0.2010))

# Augmentation is applied to every image of `dataset`, so the validation split below is augmented too
transform_train_val = transforms.Compose([
    transforms.RandomRotation(degrees=30),
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.CenterCrop(size=64),
    transforms.ToTensor(),
    transforms.Normalize(means, stds),
])
# Test images get no augmentation but must be normalized the same way as the training images
transform_test = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(means, stds),
])
dataset = torchvision.datasets.ImageFolder('IMagenet/tiny-imagenet-200/train', transform=transform_train_val)
test_dataset = torchvision.datasets.ImageFolder('IMagenet/tiny-imagenet-200/val', transform=transform_test)
train_dataset, val_dataset = torch.utils.data.random_split(dataset, [80000, 20000])

batch_size = 32
train_batch = torch.utils.data.DataLoader(train_dataset, batch_size=batch_size, shuffle=True, num_workers=2)
# Shuffling is only useful for training; evaluation order does not affect accuracy
val_batch = torch.utils.data.DataLoader(val_dataset, batch_size=batch_size, shuffle=False, num_workers=2)
test_batch = torch.utils.data.DataLoader(test_dataset, batch_size=batch_size, shuffle=False, num_workers=2)

In [4]:
# feel free to copypaste code from seminar03 as a basic template for training

In [5]:
class Flatten(nn.Module):
    def forward(self, input):
        return input.view(input.size(0), -1)
    
np.random.seed(42)
torch.manual_seed(42)
torch.backends.cudnn.deterministic = True 
torch.backends.cudnn.benchmark = False 


model =  nn.Sequential(
            nn.Conv2d(in_channels=3, out_channels=128, kernel_size=3),  # 3x64x64 -> 128x62x62
            nn.MaxPool2d(2), 
            nn.ReLU(),
            nn.BatchNorm2d(128),
            nn.Dropout(p = 0.3),
    
            nn.Conv2d(in_channels=128, out_channels=128, kernel_size=3),
            nn.MaxPool2d(2),
            nn.ReLU(),
            nn.BatchNorm2d(128),
            nn.Dropout(p = 0.3),
    
            nn.Conv2d(in_channels=128, out_channels=256, kernel_size=3),
            nn.MaxPool2d(2),   
            nn.ReLU(),
            nn.BatchNorm2d(256),
            nn.Dropout(p = 0.3),
    
            nn.Conv2d(in_channels=256, out_channels=512, kernel_size=3),
            nn.MaxPool2d(2),
            nn.ReLU(),
            nn.BatchNorm2d(512),
            Flatten(),
            nn.Dropout(p = 0.3),
    
            nn.Linear(2048, 512),  # 512 channels x 2 x 2 spatial positions = 2048 features
            nn.ReLU(),
            nn.Dropout(p = 0.3),
            
            nn.Linear(512,200)

            )
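
The input size of the first linear layer (2048) follows from the spatial sizes after each convolution and pooling step. The arithmetic can be checked without building the network; the helper names below are made up for this sketch, and it assumes the layer settings used in the model above (3x3 convolutions without padding, 2x2 max pooling).

```python
def conv_out_size(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out_size(size, kernel):
    """Output size of max pooling with stride equal to the kernel size (floor mode)."""
    return (size - kernel) // kernel + 1


size = 64  # Tiny ImageNet images are 64x64
channels = [128, 128, 256, 512]
for c in channels:
    size = pool_out_size(conv_out_size(size, kernel=3), kernel=2)
    print(f"{c} x {size} x {size}")

flat_features = channels[-1] * size * size
print(flat_features)  # 2048
```

The spatial size goes 64 -> 31 -> 14 -> 6 -> 2, so the flattened tensor has 512 * 2 * 2 = 2048 features.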

In [6]:
def compute_loss(X_batch, y_batch):
    """Mean cross-entropy loss of the global `model` on one batch."""
    # The DataLoader already yields float images and integer labels as tensors
    logits = model(X_batch)
    # F.cross_entropy averages over the batch by default
    return F.cross_entropy(logits, y_batch)
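
A useful sanity check for this loss: a network that has learned nothing and assigns equal scores to all 200 classes should have a cross-entropy of ln(200). The pure-Python function below is a sketch written for this check (it is not the PyTorch implementation) and computes the loss for a single example.

```python
import math


def cross_entropy(logits, target):
    """Cross-entropy of one example: -log softmax(logits)[target]."""
    m = max(logits)  # subtract the maximum for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


num_classes = 200
uniform_loss = cross_entropy([0.0] * num_classes, target=0)
print(round(uniform_loss, 3))  # 5.298
```

A training loss well above roughly 5.3 early on usually points to a bug or a too-large learning rate, while values below it mean the network has started to learn.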

In [7]:
import time

from torch.optim import Adam
from tqdm.auto import tqdm

opt = Adam(model.parameters(),
           lr=1e-3,
           weight_decay=1e-4)  # weight_decay adds L2 regularization

train_loss = []
val_accuracy = []

num_epochs = 10

for epoch in range(num_epochs):
    start_time = time.time()
    model.train(True) 
    for (X_batch, y_batch) in tqdm(train_batch):
        loss = compute_loss(X_batch, y_batch)
        loss.backward()
        opt.step()
        opt.zero_grad()
        train_loss.append(loss.item())
        
    model.eval()
    with torch.no_grad():  # no gradients are needed for evaluation
        for X_batch, y_batch in val_batch:
            logits = model(X_batch)
            y_pred = logits.argmax(dim=1)
            val_accuracy.append((y_batch == y_pred).float().mean().item())

    print("Epoch {} of {} took {:.3f}s".format(
        epoch + 1, num_epochs, time.time() - start_time))
    print("  training loss (in-iteration): \t{:.6f}".format(
        np.mean(train_loss[-len(train_dataset) // batch_size :])))
    print("  validation accuracy: \t\t\t{:.2f} %".format(
        np.mean(val_accuracy[-len(val_dataset) // batch_size :]) * 100))

Epoch 1 of 10 took 639.987s
  training loss (in-iteration): 	4.626562
  validation accuracy: 			13.76 %


Exception ignored in: <function _MultiProcessingDataLoaderIter.__del__ at 0x7faa1ef3e160>
Traceback (most recent call last):
  File "/opt/conda/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 1358, in __del__
    self._shutdown_workers()
  File "/opt/conda/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 1341, in _shutdown_workers
    if w.is_alive():
  File "/opt/conda/lib/python3.9/multiprocessing/process.py", line 160, in is_alive
    assert self._parent_pid == os.getpid(), 'can only test a child process'
AssertionError: can only test a child process

Epoch 2 of 10 took 631.085s
  training loss (in-iteration): 	4.075722
  validation accuracy: 			17.57 %


Epoch 3 of 10 took 610.682s
  training loss (in-iteration): 	3.844534
  validation accuracy: 			20.78 %


Epoch 4 of 10 took 637.916s
  training loss (in-iteration): 	3.657720
  validation accuracy: 			23.22 %


Epoch 5 of 10 took 627.226s
  training loss (in-iteration): 	3.536037
  validation accuracy: 			24.65 %


Epoch 6 of 10 took 630.333s
  training loss (in-iteration): 	3.442004
  validation accuracy: 			25.64 %


Epoch 7 of 10 took 690.343s
  training loss (in-iteration): 	3.357197
  validation accuracy: 			27.81 %


Epoch 8 of 10 took 728.043s
  training loss (in-iteration): 	3.308090
  validation accuracy: 			28.25 %


Epoch 9 of 10 took 683.633s
  training loss (in-iteration): 	3.249937
  validation accuracy: 			29.51 %


Epoch 10 of 10 took 684.212s
  training loss (in-iteration): 	3.202363
  validation accuracy: 			30.70 %


When everything is done, please calculate accuracy on `tiny-imagenet-200/val`

In [None]:
model.eval()
correct, total = 0, 0
with torch.no_grad():
    for X_batch, y_batch in test_batch:
        # the model was trained on the CPU above, so the batch stays on the CPU too
        logits = model(X_batch)
        y_pred = logits.argmax(dim=1)
        correct += (y_pred == y_batch).sum().item()
        total += y_batch.size(0)
# count correct predictions over all images, so a smaller last batch is not over-weighted
test_accuracy = correct / total

In [None]:
print("Final results:")
print("  test accuracy:\t\t{:.2f} %".format(
    test_accuracy * 100))

if test_accuracy * 100 > 40:
    print("Achievement unlocked: 110lvl Warlock!")
elif test_accuracy * 100 > 35:
    print("Achievement unlocked: 80lvl Warlock!")
elif test_accuracy * 100 > 30:
    print("Achievement unlocked: 70lvl Warlock!")
elif test_accuracy * 100 > 25:
    print("Achievement unlocked: 60lvl Warlock!")
else:
    print("We need more magic! Follow instructions below")

Report

- For the first training iteration I tried a standard convolution-pooling-activation model: accuracy was 10% after 10 epochs
- Then I deepened the network with three such blocks: 18% after 15 epochs
- Adding normalization and dropout reduced overfitting: 25% after 15 epochs
- After that I added image transforms (rotations, flips), which reduced overfitting further: 28%
- Fine-tuned the lr and the sizes of the hidden layers: 30%