# Homework 2.2: The Quest For A Better Network

In this assignment you will build a monster network to solve Tiny ImageNet image classification.

This notebook is intended as a sequel to seminar 3; please give that one a try first if you haven't done so yet.

(please read it at least diagonally)

* The ultimate quest is to create a network that has as high __accuracy__ as you can push it.
* There is a __mini-report__ at the end that you will have to fill in. We recommend reading it first and filling it while you iterate.
 
## Grading
* starting at zero points
* +20% for describing your iteration path in a report below.
* +20% for building a network that gets above 20% accuracy
* +10% for beating each of these milestones on __TEST__ dataset:
    * 25% (50% points)
    * 30% (60% points)
    * 32.5% (70% points)
    * 35% (80% points)
    * 37.5% (90% points)
    * 40% (full points)
    
## Restrictions
* Please do NOT use pre-trained networks for this assignment until you reach 40%.
 * In other words, base milestones must be beaten without pre-trained nets (and such a net must be present in the anytask attachments). After that, you can use whatever you want.
* You __can't__ do anything with validation data apart from running the evaluation procedure. Please split the train images into train and validation parts.

## Tips on what can be done:


 * __Network size__
   * MOAR neurons, 
   * MOAR layers, ([torch.nn docs](http://pytorch.org/docs/master/nn.html))

   * Nonlinearities in the hidden layers
     * tanh, relu, leaky relu, etc
   * Larger networks may take more epochs to train, so don't discard your net just because it didn't beat the baseline in 5 epochs.

   * Ph'nglui mglw'nafh Cthulhu R'lyeh wgah'nagl fhtagn!


### The main rule of prototyping: one change at a time
   * By now you probably have several ideas on what to change. By all means, try them out! But there's a catch: __never test several new things at once__.


### Optimization
   * Training for 100 epochs regardless of anything is probably a bad idea.
   * Some networks converge over 5 epochs, others - over 500.
   * Way to go: stop when validation score is 10 iterations past maximum
   * You should certainly use adaptive optimizers
     * rmsprop, nesterov_momentum, adam, adagrad and so on.
     * Converge faster and sometimes reach better optima
     * It might make sense to tweak learning rate/momentum, other learning parameters, batch size and number of epochs
   * __BatchNormalization__ (nn.BatchNorm2d) for the win!
     * Sometimes more batch normalization is better.
   * __Regularize__ to prevent overfitting
     * Add some L2 weight norm to the loss function, PyTorch will do the rest
       * Can be done manually or like [this](https://discuss.pytorch.org/t/simple-l2-regularization/139/2).
     * Dropout (`nn.Dropout`) - to prevent overfitting
       * Don't overdo it. Check if it actually makes your network better
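
The stopping rule from the list above ("stop when validation score is 10 iterations past maximum") can be sketched as a small helper. This is only an illustration: the class name `EarlyStopping`, its `patience` argument and the toy scores are assumptions made for the example, not part of the assignment code.

```python
class EarlyStopping:
    """Signal a stop once the validation score has not improved for `patience` checks."""

    def __init__(self, patience=10):
        self.patience = patience
        self.best_score = float("-inf")
        self.best_epoch = -1
        self.epoch = -1

    def step(self, val_score):
        """Record one validation score; return True when training should stop."""
        self.epoch += 1
        if val_score > self.best_score:
            self.best_score = val_score
            self.best_epoch = self.epoch
        return self.epoch - self.best_epoch >= self.patience


# Toy usage with made-up validation accuracies (not real results).
stopper = EarlyStopping(patience=3)
fake_scores = [0.10, 0.15, 0.18, 0.17, 0.18, 0.16, 0.19, 0.20]
stopped_at = None
for epoch, score in enumerate(fake_scores):
    if stopper.step(score):
        stopped_at = epoch
        break
```

In a real training loop you would call `step` once per validation pass and keep a checkpoint of the weights from `best_epoch`, since the last epoch is by construction not the best one.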
   
### Convolution architectures
   * This task __can__ be solved by a sequence of convolutions and poolings with batch_norm and ReLU seasoning, but you shouldn't necessarily stop there.
   * [Inception family](https://hacktilldawn.com/2016/09/25/inception-modules-explained-and-implemented/), [ResNet family](https://towardsdatascience.com/an-overview-of-resnet-and-its-variants-5281e2f56035?gi=9018057983ca), [Densely-connected convolutions (exotic)](https://arxiv.org/abs/1608.06993), [Capsule networks (exotic)](https://arxiv.org/abs/1710.09829)
   * Please do try a few simple architectures before you go for resnet-152.
   * Warning! Training convolutional networks can take long without GPU. That's okay.
     * If you are CPU-only, we still recommend that you try a simple convolutional architecture
     * A perfect option is to set it up to run at nighttime and check on it in the morning.
     * Make reasonable layer size estimates. A 128-neuron first convolution is likely an overkill.
     * __To reduce computation__ time by a factor in exchange for some accuracy drop, try using the __stride__ parameter. A stride=2 convolution should take roughly 1/4 of the time of the default (stride=1) one.
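
The "roughly 1/4" figure for stride=2 follows from the output size: halving both spatial dimensions leaves a quarter of the output positions to compute. A rough count of multiply-accumulate operations shows this; the helper names and layer sizes below are made up for the example.

```python
def conv_output_size(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution (standard floor formula)."""
    return (size + 2 * padding - kernel) // stride + 1


def conv_macs(size, in_ch, out_ch, kernel, stride=1, padding=0):
    """Multiply-accumulate count for one square image through one conv layer."""
    out = conv_output_size(size, kernel, stride, padding)
    return out * out * in_ch * out_ch * kernel * kernel


# Same 3x3 convolution on a 64x64 RGB image, stride 1 versus stride 2.
macs_s1 = conv_macs(64, 3, 16, kernel=3, stride=1, padding=1)
macs_s2 = conv_macs(64, 3, 16, kernel=3, stride=2, padding=1)
ratio = macs_s2 / macs_s1
```

This counts arithmetic only. Measured speedups depend on the hardware and the library kernels, so treat the ratio as an estimate.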
 
   
### Data augmentation
   * Getting a 5x larger dataset for free is a great deal
     * Zoom-in+slice = move
     * Rotate+zoom (to remove black stripes)
     * Add noise (Gaussian or Bernoulli)
   * Simple way to do that (if you have PIL/Image): 
     * ```from scipy.misc import imrotate,imresize``` (these functions have been removed from recent SciPy releases, so this only works with an old SciPy)
     * and a bit of slicing
     * Other cool libraries: cv2, skimage, PIL/Pillow
   * A more advanced way is to use torchvision transforms:
    ```
    transform_train = transforms.Compose([
        transforms.RandomCrop(32, padding=4),  # CIFAR-sized crop; Tiny ImageNet images are 64x64
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        # these mean/std values are CIFAR-10 statistics; recompute them for your data
        transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),
    ])
    # ImageFolder accepts no train/download arguments; point root at the folder with class subdirectories
    trainset = torchvision.datasets.ImageFolder(root=path_to_tiny_imagenet, transform=transform_train)
    trainloader = torch.utils.data.DataLoader(trainset, batch_size=128, shuffle=True, num_workers=2)
    ```
   * Or use this tool from Keras (requires theano/tensorflow): [tutorial](https://blog.keras.io/building-powerful-image-classification-models-using-very-little-data.html), [docs](https://keras.io/preprocessing/image/)
   * Stay realistic. There's usually no point in flipping dogs upside down as that is not the way you usually see them.
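
The "move" and "add noise" ideas from the list above can also be sketched with plain NumPy. This is an illustration, not the recommended pipeline: the function name `augment` and the `max_shift` and `noise_std` defaults are assumptions, and the image is expected as an H x W x C float array in [0, 1].

```python
import numpy as np


def augment(image, rng, max_shift=4, noise_std=0.02):
    """Randomly flip, shift and add Gaussian noise to an H x W x C float image."""
    h, w, _ = image.shape
    out = image
    if rng.random() < 0.5:
        out = out[:, ::-1, :]  # horizontal flip
    # Reflect-pad, then crop a random window of the original size: a random shift.
    pad = ((max_shift, max_shift), (max_shift, max_shift), (0, 0))
    padded = np.pad(out, pad, mode="reflect")
    top = rng.integers(0, 2 * max_shift + 1)
    left = rng.integers(0, 2 * max_shift + 1)
    out = padded[top:top + h, left:left + w, :]
    # Additive Gaussian noise, clipped back to the valid range.
    out = out + rng.normal(0.0, noise_std, size=out.shape)
    return np.clip(out, 0.0, 1.0)


rng = np.random.default_rng(0)
image = rng.random((64, 64, 3))  # stand-in for a real 64x64 RGB image
augmented = augment(image, rng)
```

Apply such a function only to training images; validation and test images should go through the deterministic preprocessing only.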
   


In [5]:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

import torch
import torch.nn as nn
import torch.nn.functional as F

import torchvision
from torchvision import transforms, datasets

from torchsummary import summary

In [6]:
from tiny_img import download_tinyImg200

data_path = '.'
download_tinyImg200(data_path)

./tiny-imagenet-200.zip


## 1. Data Loading and Augmentation

In [182]:
class MapDataset(torch.utils.data.Dataset):
    def __init__(self, dataset, map_fn):
        self.dataset = dataset
        self.map = map_fn

    def __getitem__(self, index):
        # Fetch the sample once (indexing twice would load the image twice)
        image, label = self.dataset[index]
        return self.map(image), label

    def __len__(self):
        return len(self.dataset)


transforms_train = transforms.Compose([
   transforms.ColorJitter(hue=.05, saturation=.05),
   transforms.RandomHorizontalFlip(),
   transforms.RandomResizedCrop(64, scale=(0.6, 1.0)),
   transforms.ToTensor(),
   transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

transforms_val = transforms.Compose([
   transforms.ToTensor(),
   transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

full_set = datasets.ImageFolder(root='tiny-imagenet-200/train')
train_set, val_set = torch.utils.data.random_split(full_set, (len(full_set)-int(1e4), int(1e4)))
train_set = MapDataset(train_set, transforms_train)
val_set = MapDataset(val_set, transforms_val)

train_loader = torch.utils.data.DataLoader(train_set, batch_size=128, shuffle=True, num_workers=8)
val_loader = torch.utils.data.DataLoader(val_set, batch_size=128, shuffle=False, num_workers=8)  # evaluation does not need shuffling

## 2. Model definition

In [234]:
class BasicModule(nn.Module):
    """Basic 2 layer 3x3 convnet block
    
    Contains 2 3*3 convolution layers. If downsampling, the first convolution layer has a stride of 2,
    and the input is passed through a 1*1 convolution layer with stride 2 before adding at the end.
    """
    
    def __init__(self, in_ch, out_ch, downsample=False):
        super(BasicModule, self).__init__()

        if downsample:
            stride = 2
            self.downsample = nn.Conv2d(in_ch, out_ch, 1, stride=2)
        else:
            stride = 1
            self.downsample = nn.Identity()
            
        self.conv1 = nn.Conv2d(in_ch, out_ch, 3, padding=1, stride=stride)
        self.conv2 = nn.Conv2d(out_ch, out_ch, 3, padding=1)
        
        self.bn1 = nn.BatchNorm2d(out_ch)
        self.bn2 = nn.BatchNorm2d(out_ch)
        
        self.relu = nn.ReLU(inplace=True)       
        
    def forward(self, input):
        
        out = self.conv1(input)
        out = self.bn1(out)
        out = self.relu(out)
        
        out = self.conv2(out)
        out = self.bn2(out)
        out = out + self.downsample(input)
        out = self.relu(out)
        
        return out

class MyNet(nn.Module):
    """Baby ResNet model
    
    This version uses 3 residual layers and 2 fully connected layers.
    
    Input: 3*64*64 image
    Layer 0: 5*5 convolution with 16 channels and stride 2
    Layer 1: 4 residual blocks of 2 3*3 convolutions with 32 channels
    Layer 2: 4 residual blocks of 2 3*3 convolutions with 64 channels
    Layer 3: 4 residual blocks of 2 3*3 convolutions with 128 channels
    
    FC1: Layer with 500 neurons (and ReLU activation)
    FC2: Layer with 200 neurons
    
    The output of the last layer is passed through LogSoftmax and returned.
    """
    
    def __init__(self):
        super(MyNet, self).__init__()

        self.layer_0 = nn.Conv2d(3, 16, 5, padding=2, stride=2)
        self.layer_1 = self._make_layer(4, 16, 32)
        self.layer_2 = self._make_layer(4, 32, 64)
        self.layer_3 = self._make_layer(4, 64, 128)
        # NOTE: layer_4 is created but never called in forward(); fc1 is sized for the layer_3 output
        self.layer_4 = self._make_layer(4, 128, 256)
        
        self.fc1 = nn.Linear(16*128, 500)
        self.fc2 = nn.Linear(500, 200)
        
        self.relu = nn.ReLU(inplace=True)

    def forward(self, input):
        
        out = self.layer_0(input)
        out = self.layer_1(out)
        out = self.layer_2(out)
        out = self.layer_3(out)

        out = torch.flatten(out, start_dim=1)
        out = self.fc1(out)
        out = self.relu(out)
        out = self.fc2(out)
        
        out = F.log_softmax(out, dim=1)
        
        return out

    def _make_layer(self, n_blocks, in_ch, out_ch):
        
        blocks = [BasicModule(in_ch, out_ch, downsample=True)]        
        for i in range(n_blocks - 1):
            blocks.append(BasicModule(out_ch, out_ch))
            
        return nn.Sequential(*blocks)
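
The `16*128` input size of `fc1` can be checked by tracing the spatial size through the layers that `forward` actually calls (`layer_0` to `layer_3`). The helper `conv_output_size` below is introduced just for this check; it uses the usual floor formula for convolution output size.

```python
def conv_output_size(size, kernel, stride, padding):
    """Spatial output size of a convolution (standard floor formula)."""
    return (size + 2 * padding - kernel) // stride + 1


size = 64  # input images are 3*64*64
size = conv_output_size(size, kernel=5, stride=2, padding=2)  # layer_0
sizes = [size]
for _ in range(3):  # layer_1..layer_3: the first block of each uses a stride-2 3x3 conv
    size = conv_output_size(size, kernel=3, stride=2, padding=1)
    sizes.append(size)

# layer_3 outputs 128 channels, flattened before fc1
flat_features = 128 * size * size
```

The spatial size halves at every stage (32, 16, 8, 4), so the flattened vector has 128 * 4 * 4 = 2048 = 16 * 128 features. Routing the data through `layer_4` as well would change this to 256 * 2 * 2 = 1024.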
    

## Initialization and training

In [249]:
# Create the net
net = MyNet()

# Initialize all weights (He initialization)
def init_fn(m):
    if isinstance(m, (nn.Conv2d, nn.Linear)):
        nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
    elif isinstance(m, nn.BatchNorm2d):
        nn.init.constant_(m.weight, 1)
        nn.init.constant_(m.bias, 0)    
    
net.apply(init_fn)

Conv2d(3, 16, kernel_size=(5, 5), stride=(2, 2), padding=(2, 2))
Conv2d(16, 32, kernel_size=(1, 1), stride=(2, 2))
Conv2d(16, 32, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
ReLU(inplace=True)
BasicModule(
  (downsample): Conv2d(16, 32, kernel_size=(1, 1), stride=(2, 2))
  (conv1): Conv2d(16, 32, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
  (conv2): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (bn1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (bn2): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU(inplace=True)
)
Identity()
Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), paddi

MyNet(
  (layer_0): Conv2d(3, 16, kernel_size=(5, 5), stride=(2, 2), padding=(2, 2))
  (layer_1): Sequential(
    (0): BasicModule(
      (downsample): Conv2d(16, 32, kernel_size=(1, 1), stride=(2, 2))
      (conv1): Conv2d(16, 32, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
      (conv2): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (bn1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (bn2): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
    )
    (1): BasicModule(
      (downsample): Identity()
      (conv1): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (conv2): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (bn1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (bn2): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (rel

In [233]:
for batch_X, batch_y in train_loader:
    print(net(batch_X))

    break

torch.Size([128, 200])
tensor([[-5.3048, -5.2321, -5.3094,  ..., -5.3323, -5.2882, -5.2816],
        [-5.3048, -5.2253, -5.3111,  ..., -5.3309, -5.2885, -5.2727],
        [-5.3010, -5.2210, -5.3167,  ..., -5.3290, -5.2842, -5.2863],
        ...,
        [-5.3076, -5.2277, -5.3090,  ..., -5.3336, -5.2843, -5.2914],
        [-5.2993, -5.2339, -5.3077,  ..., -5.3245, -5.2821, -5.2817],
        [-5.3150, -5.2281, -5.3133,  ..., -5.3415, -5.2837, -5.2731]],
       grad_fn=<LogSoftmaxBackward>)


In [207]:
%debug

> /home/tadej/miniconda3/envs/DL/lib/python3.8/site-packages/torch/nn/functional.py(1372)linear()
   1370         ret = torch.addmm(bias, input, weight.t())
   1371     else:
-> 1372         output = input.matmul(weight.t())
   1373         if bias is not None:
   1374             output += bias


ipdb>  q


When everything is done, please calculate accuracy on `tiny-imagenet-200/val`
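
One possible shape for the accuracy computation is sketched below. It assumes you have already built a `DataLoader` over the test images; how to do that is left open here. The function name `evaluate_accuracy` and the toy model and data are illustrative stand-ins, not the homework network.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


def evaluate_accuracy(model, loader, device="cpu"):
    """Return the fraction of correctly classified samples in `loader`."""
    model.eval()  # use running BatchNorm statistics and disable Dropout
    correct, total = 0, 0
    with torch.no_grad():
        for batch_x, batch_y in loader:
            batch_x, batch_y = batch_x.to(device), batch_y.to(device)
            predictions = model(batch_x).argmax(dim=1)
            correct += (predictions == batch_y).sum().item()
            total += batch_y.size(0)
    return correct / total


# Toy check on random data with a stand-in linear model.
toy_model = nn.Linear(8, 200)
toy_data = TensorDataset(torch.randn(32, 8), torch.randint(0, 200, (32,)))
toy_loader = DataLoader(toy_data, batch_size=16)
toy_accuracy = evaluate_accuracy(toy_model, toy_loader)
```

The returned value is a fraction between 0 and 1, which matches how `test_accuracy` is multiplied by 100 in the reporting cell. Remember to call `model.train()` again if you continue training afterwards.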

In [None]:
test_accuracy = .... # YOUR CODE

In [None]:
print("Final results:")
print("  test accuracy:\t\t{:.2f} %".format(
    test_accuracy * 100))

if test_accuracy * 100 > 40:
    print("Achievement unlocked: 110lvl Warlock!")
elif test_accuracy * 100 > 35:
    print("Achievement unlocked: 80lvl Warlock!")
elif test_accuracy * 100 > 30:
    print("Achievement unlocked: 70lvl Warlock!")
elif test_accuracy * 100 > 25:
    print("Achievement unlocked: 60lvl Warlock!")
else:
    print("We need more magic! Follow instructions below")


# Report

All creative approaches are highly welcome, but at the very least it would be great to mention
* the idea;
* brief history of tweaks and improvements;
* what is the final architecture and why?
* what is the training method and, again, why?
* Any regularizations and other techniques applied and their effects;


There is no need to write strict mathematical proofs (unless you want to).
 * "I tried this, this and this, and the second one turned out to be better. And I just didn't like the name of that one" - OK, but can be better
 * "I have analyzed these and these articles|sources|blog posts, tried that and that to adapt them to my problem and the conclusions are such and such" - the ideal one
 * "I took that code from that demo without understanding it, but I'll never confess that and instead I'll make up some pseudoscientific explanation" - __not_ok__

### Hi, my name is `___ ___`, and here's my story

A long time ago in a galaxy far far away, when it was still more than an hour before the deadline, I got an idea:

##### I'm gonna build a neural network that
* brief text on what was
* the original idea
* and why it was so

How could I be so naive?!

##### One day, with no signs of warning,
This thing has finally converged and
* Some explanation about what the results were,
* what worked and what didn't
* most importantly - what next steps were taken, if any
* and what were their respective outcomes

##### Finally, after __  iterations, __ mugs of [tea/coffee]
* what was the final architecture
* as well as training method and tricks

That, having wasted ____ [minutes, hours or days] of my life training, got

* accuracy on training: __
* accuracy on validation: __
* accuracy on test: __


[an optional afterword and mortal curses on assignment authors]