<a href="https://colab.research.google.com/github/tanyaoley/ml_course_homework/blob/master/Copy_of_homework2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Homework 2.2: The Quest For A Better Network

In this assignment you will build a monster network to solve Tiny ImageNet image classification.

This notebook is intended as a sequel to seminar 3, please give it a try if you haven't done so yet.

(please read it at least diagonally)

* The ultimate quest is to create a network that has as high __accuracy__ as you can push it.
* There is a __mini-report__ at the end that you will have to fill in. We recommend reading it first and filling it while you iterate.
 
## Grading
* starting at zero points
* +20% for describing your iteration path in a report below.
* +20% for building a network that gets above 20% accuracy
* +10% for beating each of these milestones on __TEST__ dataset:
    * 25% (50% points)
    * 30% (60% points)
    * 32.5% (70% points)
    * 35% (80% points)
    * 37.5% (90% points)
    * 40% (full points)
    
## Restrictions
* Please do NOT use pre-trained networks for this assignment until you reach 40%.
 * In other words, base milestones must be beaten without pre-trained nets (and such net must be present in the anytask atttachments). After that, you can use whatever you want.
* you __can't__ do anything with validation data apart from running the evaluation procedure. Please, split train images on train and validation parts

## Tips on what can be done:


 * __Network size__
   * MOAR neurons, 
   * MOAR layers, ([torch.nn docs](http://pytorch.org/docs/master/nn.html))

   * Nonlinearities in the hidden layers
     * tanh, relu, leaky relu, etc
   * Larger networks may take more epochs to train, so don't discard your net just because it could didn't beat the baseline in 5 epochs.

   * Ph'nglui mglw'nafh Cthulhu R'lyeh wgah'nagl fhtagn!


### The main rule of prototyping: one change at a time
   * By now you probably have several ideas on what to change. By all means, try them out! But there's a catch: __never test several new things at once__.


### Optimization
   * Training for 100 epochs regardless of anything is probably a bad idea.
   * Some networks converge over 5 epochs, others - over 500.
   * Way to go: stop when validation score is 10 iterations past maximum
   * You should certainly use adaptive optimizers
     * rmsprop, nesterov_momentum, adam, adagrad and so on.
     * Converge faster and sometimes reach better optima
     * It might make sense to tweak learning rate/momentum, other learning parameters, batch size and number of epochs
   * __BatchNormalization__ (nn.BatchNorm2d) for the win!
     * Sometimes more batch normalization is better.
   * __Regularize__ to prevent overfitting
     * Add some L2 weight norm to the loss function, PyTorch will do the rest
       * Can be done manually or like [this](https://discuss.pytorch.org/t/simple-l2-regularization/139/2).
     * Dropout (`nn.Dropout`) - to prevent overfitting
       * Don't overdo it. Check if it actually makes your network better
   
### Convolution architectures
   * This task __can__ be solved by a sequence of convolutions and poolings with batch_norm and ReLU seasoning, but you shouldn't necessarily stop there.
   * [Inception family](https://hacktilldawn.com/2016/09/25/inception-modules-explained-and-implemented/), [ResNet family](https://towardsdatascience.com/an-overview-of-resnet-and-its-variants-5281e2f56035?gi=9018057983ca), [Densely-connected convolutions (exotic)](https://arxiv.org/abs/1608.06993), [Capsule networks (exotic)](https://arxiv.org/abs/1710.09829)
   * Please do try a few simple architectures before you go for resnet-152.
   * Warning! Training convolutional networks can take long without GPU. That's okay.
     * If you are CPU-only, we still recomment that you try a simple convolutional architecture
     * a perfect option is if you can set it up to run at nighttime and check it up at the morning.
     * Make reasonable layer size estimates. A 128-neuron first convolution is likely an overkill.
     * __To reduce computation__ time by a factor in exchange for some accuracy drop, try using __stride__ parameter. A stride=2 convolution should take roughly 1/4 of the default (stride=1) one.
 
   
### Data augmemntation
   * getting 5x as large dataset for free is a great 
     * Zoom-in+slice = move
     * Rotate+zoom(to remove black stripes)
     * Add Noize (gaussian or bernoulli)
   * Simple way to do that (if you have PIL/Image): 
     * ```from scipy.misc import imrotate,imresize```
     * and a few slicing
     * Other cool libraries: cv2, skimake, PIL/Pillow
   * A more advanced way is to use torchvision transforms:
    ```
    transform_train = transforms.Compose([
        transforms.RandomCrop(32, padding=4),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),
    ])
    trainset = torchvision.datasets.ImageFolder(root=path_to_tiny_imagenet, train=True, download=True, transform=transform_train)
    trainloader = torch.utils.data.DataLoader(trainset, batch_size=128, shuffle=True, num_workers=2)

    ```
   * Or use this tool from Keras (requires theano/tensorflow): [tutorial](https://blog.keras.io/building-powerful-image-classification-models-using-very-little-data.html), [docs](https://keras.io/preprocessing/image/)
   * Stay realistic. There's usually no point in flipping dogs upside down as that is not the way you usually see them.
   


In [0]:
import numpy as np
import torchvision
import torch
from torchvision import transforms
import torch, torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable


%matplotlib inline


In [0]:
import torch
import torchvision
from torchvision import transforms
means = np.array((0.4914, 0.4822, 0.4465))
stds = np.array((0.2023, 0.1994, 0.2010))

transform_train_val = transforms.Compose([
    transforms.RandomRotation(degrees = 30),
    transforms.RandomHorizontalFlip(p = 0.5),
    transforms.CenterCrop(size = 64), 
    transforms.ToTensor(), 
    transforms.Normalize(means, stds)
])





In [13]:
!git clone https://github.com/seshuad/IMagenet

dataset = torchvision.datasets.ImageFolder('IMagenet/tiny-imagenet-200/train', transform = transform_train_val)

test_dataset = torchvision.datasets.ImageFolder('IMagenet/tiny-imagenet-200/val', transform=transforms.ToTensor())

train_dataset, val_dataset = torch.utils.data.random_split(dataset, [80000, 20000])


batch_size = 32

train_batch = torch.utils.data.DataLoader(train_dataset, 
                                              batch_size=batch_size,
                                              shuffle=True,
                                              num_workers=2)
val_batch = torch.utils.data.DataLoader(val_dataset, 
                                              batch_size=batch_size,
                                              shuffle=True,
                                              num_workers=2)
test_batch = torch.utils.data.DataLoader(test_dataset, 
                                              batch_size=batch_size,
                                              shuffle=True,
                                              num_workers=2)

fatal: destination path 'IMagenet' already exists and is not an empty directory.


In [0]:
class Flatten(nn.Module):
    def forward(self, input):
        return input.view(input.size(0), -1)

In [0]:
np.random.seed(42)
torch.manual_seed(42)
torch.backends.cudnn.deterministic = True 
torch.backends.cudnn.benchmark = False 


model =  nn.Sequential(
            nn.Conv2d(in_channels=3, out_channels=128, kernel_size=3), #3 64 64 - 8 62 62

            nn.MaxPool2d(2), 
            nn.ReLU(),
            nn.BatchNorm2d(128),
            nn.Dropout(p = 0.3),
            nn.Conv2d(in_channels=128, out_channels=128, kernel_size=3),
            nn.MaxPool2d(2),
            nn.ReLU(),
            nn.BatchNorm2d(128),
            nn.Dropout(p = 0.3),
            nn.Conv2d(in_channels=128, out_channels=256, kernel_size=3),
            nn.MaxPool2d(2),   
            nn.ReLU(),
            nn.BatchNorm2d(256),
            nn.Dropout(p = 0.3),
            nn.Conv2d(in_channels=256, out_channels=512, kernel_size=3),
            #nn.Conv2d(in_channels=16, out_channels=32, kernel_size=3),
            nn.MaxPool2d(2),
           
            nn.ReLU(),
            nn.BatchNorm2d(512),
            Flatten(),
            nn.Dropout(p = 0.3),
            nn.Linear(2048, 512),
            nn.ReLU(),
            nn.Dropout(p = 0.3),
            
            nn.Linear(512,200)

            )

In [0]:
def compute_loss(X_batch, y_batch):
    X_batch = Variable(torch.FloatTensor(X_batch)).cuda()
    y_batch = Variable(torch.LongTensor(y_batch)).cuda()
    logits = model.cuda()(X_batch)
    
    loss = F.cross_entropy(logits, y_batch).mean()
    

    return loss

In [31]:
from torch.optim import Adam
opt = Adam(model.parameters(),
           lr = 1e-3, 
           weight_decay = 1e-4) 

train_loss = []
val_accuracy = []

import numpy as np



train_loss = []
val_accuracy = []

num_epochs = 40 

import time

for epoch in range(num_epochs):
    start_time = time.time()
    model.train(True) 
    for (X_batch, y_batch) in train_batch:
     
        loss = compute_loss(X_batch, y_batch)

        loss.backward()
        opt.step()
        opt.zero_grad()
        train_loss.append(loss.cpu().data.numpy())
    model.eval()
  
    for X_batch, y_batch in val_batch:
        logits = model(Variable(torch.FloatTensor(X_batch)).cuda())
        y_pred = logits.max(1)[1].data
        val_accuracy.append(np.mean( (y_batch.cpu() == y_pred.cpu()).numpy() ))

    print("Epoch {} of {} took {:.3f}s".format(
        epoch + 1, num_epochs, time.time() - start_time))
    print("  training loss (in-iteration): \t{:.6f}".format(
        np.mean(train_loss[-len(train_dataset) // batch_size :])))
    print("  validation accuracy: \t\t\t{:.2f} %".format(
        np.mean(val_accuracy[-len(val_dataset) // batch_size :]) * 100))

Epoch 1 of 40 took 62.333s
  training loss (in-iteration): 	4.624020
  validation accuracy: 			13.37 %
Epoch 2 of 40 took 61.654s
  training loss (in-iteration): 	4.080802
  validation accuracy: 			17.91 %
Epoch 3 of 40 took 63.099s
  training loss (in-iteration): 	3.847533
  validation accuracy: 			20.77 %
Epoch 4 of 40 took 62.322s
  training loss (in-iteration): 	3.670362
  validation accuracy: 			23.62 %
Epoch 5 of 40 took 61.247s
  training loss (in-iteration): 	3.551527
  validation accuracy: 			24.38 %
Epoch 6 of 40 took 61.027s
  training loss (in-iteration): 	3.450377
  validation accuracy: 			26.80 %
Epoch 7 of 40 took 61.006s
  training loss (in-iteration): 	3.375511
  validation accuracy: 			27.79 %
Epoch 8 of 40 took 61.022s
  training loss (in-iteration): 	3.307302
  validation accuracy: 			29.18 %
Epoch 9 of 40 took 63.470s
  training loss (in-iteration): 	3.262775
  validation accuracy: 			30.21 %
Epoch 10 of 40 took 62.686s
  training loss (in-iteration): 	3.217117
  v

In [35]:
num_epochs  = 2
opt = Adam(model.parameters(),
           lr = 1e-5,  
           weight_decay = 1e-4)
for epoch in range(num_epochs):
    start_time = time.time()
    model.train(True) 
    for (X_batch, y_batch) in train_batch:
        # train on batch
        loss = compute_loss(X_batch, y_batch)
        loss.backward()
        opt.step()
        opt.zero_grad()
        train_loss.append(loss.cpu().data.numpy())
    

    model.eval()
    for X_batch, y_batch in val_batch:
        logits = model(Variable(torch.FloatTensor(X_batch)).cuda())
        y_pred = logits.max(1)[1].data
        val_accuracy.append(np.mean( (y_batch.cpu() == y_pred.cpu()).numpy() ))

    
 
    print("Epoch {} of {} took {:.3f}s".format(
        epoch + 1, num_epochs, time.time() - start_time))
    print("  training loss (in-iteration): \t{:.6f}".format(
        np.mean(train_loss[-len(train_dataset) // batch_size :])))
    print("  validation accuracy: \t\t\t{:.2f} %".format(
        np.mean(val_accuracy[-len(val_dataset) // batch_size :]) * 100))
    


Epoch 1 of 2 took 61.366s
  training loss (in-iteration): 	2.437027
  validation accuracy: 			40.47 %
Epoch 2 of 2 took 61.898s
  training loss (in-iteration): 	2.439510
  validation accuracy: 			40.18 %


In [0]:
import torch
import torch.nn as nn
import torchvision.models as models
from torch.autograd import Variable
model = models.resnet152(pretrained=True)

In [14]:
from torch.optim import Adam
opt = Adam(model.parameters(),
           lr = 1e-3, 
           weight_decay = 1e-4) 

train_loss = []
val_accuracy = []

import numpy as np



train_loss = []
val_accuracy = []

num_epochs = 10 

import time

for epoch in range(num_epochs):
    start_time = time.time()
    model.train(True) 
    for (X_batch, y_batch) in train_batch:
     
        loss = compute_loss(X_batch, y_batch)

        loss.backward()
        opt.step()
        opt.zero_grad()
        train_loss.append(loss.cpu().data.numpy())
    model.eval()
  
    for X_batch, y_batch in val_batch:
        logits = model(Variable(torch.FloatTensor(X_batch)).cuda())
        y_pred = logits.max(1)[1].data
        val_accuracy.append(np.mean( (y_batch.cpu() == y_pred.cpu()).numpy() ))

    print("Epoch {} of {} took {:.3f}s".format(
        epoch + 1, num_epochs, time.time() - start_time))
    print("  training loss (in-iteration): \t{:.6f}".format(
        np.mean(train_loss[-len(train_dataset) // batch_size :])))
    print("  validation accuracy: \t\t\t{:.2f} %".format(
        np.mean(val_accuracy[-len(val_dataset) // batch_size :]) * 100))

Epoch 1 of 10 took 434.115s
  training loss (in-iteration): 	3.903686
  validation accuracy: 			18.48 %
Epoch 2 of 10 took 431.525s
  training loss (in-iteration): 	3.548187
  validation accuracy: 			22.62 %
Epoch 3 of 10 took 428.178s
  training loss (in-iteration): 	3.338513
  validation accuracy: 			25.34 %
Epoch 4 of 10 took 428.084s
  training loss (in-iteration): 	3.187649
  validation accuracy: 			25.02 %


KeyboardInterrupt: ignored

When everything is done, please calculate accuracy on `tiny-imagenet-200/val`

In [0]:
test_acc=[]
for X_batch, y_batch in test_batch:
        logits = model(Variable(torch.FloatTensor(X_batch)).cuda())
        y_pred = logits.max(1)[1].data
        test_acc.append(np.mean( (y_batch.cpu() == y_pred.cpu()).numpy() ))
        #print(np.mean( (y_batch.cpu() == y_pred.cpu()).numpy()))
test_accuracy = np.mean(test_acc)

In [16]:
print("Final results:")
print("  test accuracy:\t\t{:.2f} %".format(
    test_accuracy * 100))

if test_accuracy * 100 > 40:
    print("Achievement unlocked: 110lvl Warlock!")
elif test_accuracy * 100 > 35:
    print("Achievement unlocked: 80lvl Warlock!")
elif test_accuracy * 100 > 30:
    print("Achievement unlocked: 70lvl Warlock!")
elif test_accuracy * 100 > 25:
    print("Achievement unlocked: 60lvl Warlock!")
else:
    print("We need more magic! Follow instructons below")

Final results:
  test accuracy:		0.39 %
We need more magic! Follow instructons below


**Отчет**

Начала обучение с сети из 3 линейных слоев, после 13 эпох точность была 3 процента        

С добавлением conv слоя стало 14 процентов на 10 эпохе 
заменяла relu на leakyrelu - результат не стал сильно лучше(хотя часто отмечают, что leakyrelu лучше чем relu, так как он не зануляет градиент)

добавление батч нормализации сильно улучшило результаты, так как она предотвращает уменьшение градиента (все слои получают градиент в одинаковой степени)

с трансформацией картинок качество стало сильно лучше - так как трансформация помогает избежать переобучения

30,61 with maxpool было при обучении, затем я заменила это на avgpool - результаты стали немного хуже(тк maxpool извлекает более важные признаки, что подходит для изображений). 

Кроме этого добавляла больше conv слоев - качество повышалось, после этого увеличивала значения выходных и входных значений (каналов) - увеличивала число фильтров, за счет этого точность на валидации стала выше 

в итоге обучила 40 эпох с полученной сетью, затем уменьшила learning rate и обучила еще 17 эпох(40% accuracy на валидации как результат)(Это уменьшение сильно изменяет веса в начале обучения, но делает небольшие изменения весов в конце обучения)


```

```

```

```

```

```


# Report

All creative approaches are highly welcome, but at the very least it would be great to mention
* the idea;
* brief history of tweaks and improvements;
* what is the final architecture and why?
* what is the training method and, again, why?
* Any regularizations and other techniques applied and their effects;


There is no need to write strict mathematical proofs (unless you want to).
 * "I tried this, this and this, and the second one turned out to be better. And i just didn't like the name of that one" - OK, but can be better
 * "I have analized these and these articles|sources|blog posts, tried that and that to adapt them to my problem and the conclusions are such and such" - the ideal one
 * "I took that code that demo without understanding it, but i'll never confess that and instead i'll make up some pseudoscientific explaination" - __not_ok__

### Hi, my name is `___ ___`, and here's my story

A long time ago in a galaxy far far away, when it was still more than an hour before the deadline, i got an idea:

##### I gonna build a neural network, that
* brief text on what was
* the original idea
* and why it was so

How could i be so naive?!

##### One day, with no signs of warning,
This thing has finally converged and
* Some explaination about what were the results,
* what worked and what didn't
* most importantly - what next steps were taken, if any
* and what were their respective outcomes

##### Finally, after __  iterations, __ mugs of [tea/coffee]
* what was the final architecture
* as well as training method and tricks

That, having wasted ____ [minutes, hours or days] of my life training, got

* accuracy on training: __
* accuracy on validation: __
* accuracy on test: __


[an optional afterword and mortal curses on assignment authors]