# Deep learning for computer vision


This notebook will teach you to build and train convolutional networks for image recognition. Brace yourselves.

# Tiny ImageNet dataset
This week, we shall focus on the image recognition problem on the Tiny ImageNet dataset:
* 100k images of shape 3x64x64
* 200 different classes: snakes, spiders, cats, trucks, grasshoppers, gulls, etc.


In [1]:
import torchvision
import torch
from torchvision import transforms

# if you're running in colab,
# 1. go to Runtime -> Change Runtime Type -> GPU
# 2. uncomment this:
# !wget https://raw.githubusercontent.com/yandexdataschool/Practical_DL/spring2019/week03_convnets/tiny_img.py -O tiny_img.py


In [2]:
from tiny_img import download_tinyImg200
data_path = '.'
download_tinyImg200(data_path)

./tiny-imagenet-200.zip


In [3]:
dataset = torchvision.datasets.ImageFolder('tiny-imagenet-200/train', transform=transforms.ToTensor())
test_dataset = torchvision.datasets.ImageFolder('tiny-imagenet-200/val', transform=transforms.ToTensor())
train_dataset, val_dataset = torch.utils.data.random_split(dataset, [80000, 20000])
# test_dataset, val_dataset = torch.utils.data.random_split(val_dataset, [10000, 10000])

In [4]:
batch_size = 256
train_batch_gen = torch.utils.data.DataLoader(train_dataset, 
                                              batch_size=batch_size,
                                              shuffle=True,
                                              num_workers=1)

In [5]:
val_batch_gen = torch.utils.data.DataLoader(val_dataset, 
                                              batch_size=batch_size,
                                              shuffle=True,
                                              num_workers=1)

## Image examples ##



<tr>
    <td> <img src="tinyim3.png" alt="Drawing" style="width:90%"/> </td>
    <td> <img src="tinyim2.png" alt="Drawing" style="width:90%"/> </td>
</tr>


<tr>
    <td> <img src="tiniim.png" alt="Drawing" style="width:90%"/> </td>
</tr>

# Building a network

Simple neural networks with layers applied on top of one another can be implemented as `torch.nn.Sequential` - just add a list of pre-built modules and let it train.
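As a minimal sketch of that idea, the snippet below stacks a few pre-built modules and pushes a dummy batch through them. The hidden size of 16 is an arbitrary choice for illustration, and `nn.Flatten` is assumed to be available (it ships with recent PyTorch releases; older ones need a hand-written module instead).

```python
import torch
import torch.nn as nn

# Layers are applied in the order they are listed.
tiny = nn.Sequential(
    nn.Flatten(),                # [batch, 3, 64, 64] -> [batch, 3*64*64]
    nn.Linear(3 * 64 * 64, 16),  # 16 hidden units: arbitrary, for illustration only
    nn.ReLU(),
    nn.Linear(16, 200),          # one logit per class
)

out = tiny(torch.zeros(2, 3, 64, 64))  # dummy batch of two "images"
print(out.shape)
```

The output has one row per image and one column per class, which is the shape a classification loss expects.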

In [6]:
import torch, torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable

# a special module that converts [batch, channel, h, w] to [batch, units]
class Flatten(nn.Module):
    def forward(self, input):
        return input.view(input.size(0), -1)

Let's start with a dense network for our baseline:

In [23]:
model = nn.Sequential()

# reshape from "images" to flat vectors
model.add_module('flatten', Flatten())

# dense "head"
model.add_module('dense1', nn.Linear(3 * 64 * 64, 1064))
model.add_module('dense1_relu', nn.ReLU())
model.add_module('dense2', nn.Linear(1064, 512))
model.add_module('dense2_relu', nn.ReLU())
model.add_module('dropout0', nn.Dropout(0.05)) 
model.add_module('dense3', nn.Linear(512, 256))
model.add_module('dense3_relu', nn.ReLU())
model.add_module('dropout1', nn.Dropout(0.05))
model.add_module('dense4', nn.Linear(256, 64))
model.add_module('dense4_relu', nn.ReLU())
model.add_module('dropout2', nn.Dropout(0.05))
model.add_module('dense5_logits', nn.Linear(64, 200)) # logits for 200 classes

As in our basic tutorial, we train our model with negative log-likelihood, a.k.a. cross-entropy.
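To make the "negative log-likelihood" wording concrete, here is a small sketch showing that `F.cross_entropy` on raw logits matches the mean negative log-probability of the correct class. The logits and targets are random stand-ins, not data from this notebook.

```python
import math
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 200)              # 4 samples, 200 classes (stand-in values)
targets = torch.tensor([3, 17, 42, 199])  # stand-in labels

# cross_entropy takes raw logits and applies log-softmax internally
ce = F.cross_entropy(logits, targets)

# the same quantity, spelled out step by step
log_probs = F.log_softmax(logits, dim=1)
nll = F.nll_loss(log_probs, targets)
manual = -log_probs[torch.arange(4), targets].mean()

# identical scores for every class give the chance-level loss ln(200)
chance = F.cross_entropy(torch.zeros(4, 200), targets)
print(ce.item(), nll.item(), manual.item(), chance.item(), math.log(200))
```

A network that outputs the same score for every class gets a loss of ln(200) ≈ 5.298, so a training loss sitting near that value means the model is doing no better than chance.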

In [24]:
with torch.cuda.device(1):
    model = model.cuda()

In [20]:
def compute_loss(X_batch, y_batch):
    with torch.cuda.device(1):
        X_batch = Variable(torch.FloatTensor(X_batch)).cuda()
        y_batch = Variable(torch.LongTensor(y_batch)).cuda()
    logits = model(X_batch)
    return F.cross_entropy(logits, y_batch)

### Training on minibatches
* We have 100k images, which is way too many for full-batch SGD. Let's train on minibatches instead.
* The `DataLoader` objects created above (`train_batch_gen`, `val_batch_gen`) split the data into minibatches for us.
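For intuition, here is a minimal sketch of how `torch.utils.data.DataLoader` cuts a dataset into minibatches. It uses ten all-zero stand-in "images" instead of Tiny ImageNet so that it runs instantly.

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

# Stand-in data: 10 fake images of shape 3x64x64 with integer labels.
X = torch.zeros(10, 3, 64, 64)
y = torch.arange(10)
loader = DataLoader(TensorDataset(X, y), batch_size=4, shuffle=True)

batch_shapes = [(tuple(xb.shape), tuple(yb.shape)) for xb, yb in loader]
for x_shape, y_shape in batch_shapes:
    print(x_shape, y_shape)
```

With 10 samples and `batch_size=4` the loader yields batches of 4, 4 and 2 samples; the last batch is smaller unless `drop_last=True` is passed.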

In [17]:
opt = torch.optim.Adam(model.parameters(), lr=0.001)

train_loss = []
val_accuracy = []

In [25]:
import numpy as np

opt = torch.optim.Adam(model.parameters(), lr=0.001)

train_loss = []
val_accuracy = []

In [26]:
num_epochs = 50 # total amount of full passes over training data

import time

for epoch in range(num_epochs):
    start_time = time.time()
    model.train(True) # enable dropout / batch_norm training behavior
    for (X_batch, y_batch) in train_batch_gen:
        # train on batch
        with torch.cuda.device(1):
            X = X_batch.cuda()
            y = y_batch.cuda()
            logits = model(X)
        loss = F.cross_entropy(logits, y)
        loss.backward()
        opt.step()
        opt.zero_grad()
        train_loss.append(loss.detach().cpu().numpy())
    
    model.train(False) # disable dropout / use averages for batch_norm
    for X_batch, y_batch in val_batch_gen:
        with torch.cuda.device(1):
            X = X_batch.cuda()
            logits = model(X)
        y_pred = logits.detach().cpu().numpy().argmax(axis=1)
        val_accuracy.append(np.mean((y_batch.cpu().numpy() == y_pred)))

    
    # Then we print the results for this epoch:
    print("Epoch {} of {} took {:.3f}s".format(
        epoch + 1, num_epochs, time.time() - start_time))
    print("  training loss (in-iteration): \t{:.6f}".format(
        np.mean(train_loss[-len(train_dataset) // batch_size :])))
    print("  validation accuracy: \t\t\t{:.2f} %".format(
        np.mean(val_accuracy[-len(val_dataset) // batch_size :]) * 100))

Epoch 1 of 50 took 33.290s
  training loss (in-iteration): 	5.301506
  validation accuracy: 			0.41 %
Epoch 2 of 50 took 33.080s
  training loss (in-iteration): 	5.299457
  validation accuracy: 			0.41 %
Epoch 3 of 50 took 33.399s
  training loss (in-iteration): 	5.298910
  validation accuracy: 			0.41 %
Epoch 4 of 50 took 33.263s
  training loss (in-iteration): 	5.298678
  validation accuracy: 			0.41 %
Epoch 5 of 50 took 33.160s
  training loss (in-iteration): 	5.298499
  validation accuracy: 			0.44 %
Epoch 6 of 50 took 32.949s
  training loss (in-iteration): 	5.298489
  validation accuracy: 			0.41 %
Epoch 7 of 50 took 33.334s
  training loss (in-iteration): 	5.298474
  validation accuracy: 			0.41 %
Epoch 8 of 50 took 33.465s
  training loss (in-iteration): 	5.298746
  validation accuracy: 			0.39 %


KeyboardInterrupt: 

Don't wait for all the epochs to finish. You can interrupt training after 5-20 epochs once validation accuracy stops going up.
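Instead of interrupting by hand, the same rule can be automated. The helper below is a hypothetical sketch (neither `should_stop` nor `patience` appears in the notebook): it expects one validation accuracy per epoch and reports when the best value is at least `patience` epochs old. The example history reuses the per-epoch accuracies printed above.

```python
def should_stop(val_history, patience=5):
    """Return True when the best value in val_history is at least `patience` epochs old."""
    if len(val_history) <= patience:
        return False
    # index of the first occurrence of the best accuracy so far
    best_epoch = max(range(len(val_history)), key=val_history.__getitem__)
    return len(val_history) - 1 - best_epoch >= patience


# per-epoch validation accuracies (in %) from the dense baseline log
history = [0.41, 0.41, 0.41, 0.41, 0.44, 0.41, 0.41, 0.39]
print(should_stop(history, patience=3))
print(should_stop(history, patience=5))
```

Inside a training loop one would append the epoch's mean validation accuracy to the history and `break` when the helper returns `True`. Note that `val_accuracy` in the loop above stores per-batch values, so it would have to be averaged per epoch first.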

### Final test

In [29]:
model.train(False) # disable dropout / use averages for batch_norm
test_batch_acc = []
# note: no loader is built for test_dataset above, so this pass reuses val_batch_gen
with torch.no_grad(): # gradients are not needed for evaluation
    for X_batch, y_batch in val_batch_gen:
        with torch.cuda.device(1):
            logits = model(X_batch.cuda())
        y_pred = logits.argmax(dim=1)
        test_batch_acc.append(np.mean((y_batch == y_pred.cpu()).numpy()))


test_accuracy = np.mean(test_batch_acc)
    
print("Final results:")
print("  test accuracy:\t\t{:.2f} %".format(
    test_accuracy * 100))

if test_accuracy * 100 > 70:
    print("U'r freakin' amazin'!")
elif test_accuracy * 100 > 50:
    print("Achievement unlocked: 110lvl Warlock!")
elif test_accuracy * 100 > 40:
    print("Achievement unlocked: 80lvl Warlock!")
elif test_accuracy * 100 > 30:
    print("Achievement unlocked: 70lvl Warlock!")
elif test_accuracy * 100 > 20:
    print("Achievement unlocked: 60lvl Warlock!")
else:
    print("We need more magic! Follow instructions below")

Exception ignored in: <bound method _DataLoaderIter.__del__ of <torch.utils.data.dataloader._DataLoaderIter object at 0x7f34abbc6be0>>
Traceback (most recent call last):
  File "/home/ya-philya/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 717, in __del__
    self._shutdown_workers()
  File "/home/ya-philya/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 713, in _shutdown_workers
    w.join()
  File "/home/ya-philya/anaconda3/lib/python3.6/multiprocessing/process.py", line 122, in join
    assert self._parent_pid == os.getpid(), 'can only join a child process'
AssertionError: can only join a child process


Final results:
  test accuracy:		0.39 %
We need more magic! Follow instructions below


## Task 1: small convolution net
### First step

Let's create a mini-convolutional network with roughly the following architecture:
* Input layer
* 3x3 convolution with 128 filters and _ReLU_ activation
* 2x2 pooling (or set previous convolution stride to 3)
* Flatten
* Dense layer with 1024 neurons and _ReLU_ activation
* 30% dropout
* Output dense layer.


__Convolutional layers__ in torch are just like all other layers, but with a specific set of parameters:

__`...`__

__`model.add_module('conv1', nn.Conv2d(in_channels=3, out_channels=128, kernel_size=3)) # convolution`__

__`model.add_module('pool1', nn.MaxPool2d(2)) # max pooling 2x2`__

__`...`__
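The size of the first dense layer depends on what comes out of the convolution and pooling. Below is a small sketch of that arithmetic for a 64x64 input, using the standard output-size formula for dilation 1. The helper name `conv_out_size` is made up for this example.

```python
def conv_out_size(size, kernel_size, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer (dilation 1)."""
    return (size + 2 * padding - kernel_size) // stride + 1


side = 64  # Tiny ImageNet images are 64x64

# 3x3 convolution without padding, then 2x2 max pooling (stride 2)
after_conv = conv_out_size(side, kernel_size=3)
after_pool = conv_out_size(after_conv, kernel_size=2, stride=2)
flat_features = 128 * after_pool * after_pool

# the same, but with padding=1 so the convolution keeps the 64x64 size
after_conv_padded = conv_out_size(side, kernel_size=3, padding=1)
after_pool_padded = conv_out_size(after_conv_padded, kernel_size=2, stride=2)
flat_features_padded = 128 * after_pool_padded * after_pool_padded

print(flat_features, flat_features_padded)
```

Without padding the flattened vector has 128 * 31 * 31 = 123008 features; with `padding=1` it has 128 * 32 * 32 = 131072.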


Once you're done (and compute_loss no longer raises errors), train it with __Adam__ optimizer with default params (feel free to modify the code above).

If everything is right, you should get at least __16%__ validation accuracy.

__HACK_OF_THE_DAY__: the number of channels should be on the order of the number of class labels.

In [16]:
model = nn.Sequential()

# describe convnet here
model.add_module('conv1', nn.Conv2d(3, 128, [3, 3], padding=1))
model.add_module('conv_relu', nn.ReLU())
model.add_module('padd1', nn.MaxPool2d([2, 2], stride=2))
model.add_module('flatten', Flatten())
model.add_module('dense', nn.Linear(128 * 32 * 32, 1024))
model.add_module('dense_relu', nn.ReLU())
model.add_module('drop', nn.Dropout(p=0.3))
model.add_module('dense1_logits', nn.Linear(1024, 200)) # logits for 200 classes

In [17]:
with torch.cuda.device(1):
    model = model.cuda()
opt = torch.optim.Adam(model.parameters(), lr=0.001)

train_loss = []
val_accuracy = []

In [12]:
import torchsummary

ModuleNotFoundError: No module named 'torchsummary'

In [11]:
from torch import summary

summary(model, (3, 64, 64))

ImportError: cannot import name 'summary'
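Neither import works in this environment: `torchsummary` is a separate third-party package that is not installed, and `torch` itself does not export a `summary` function. A rough substitute, sketched below with a hypothetical helper `count_parameters`, is to count trainable parameters with plain PyTorch. The large hidden dense layer is not instantiated here to keep the example light; its size follows from the same arithmetic.

```python
import torch.nn as nn


def count_parameters(module):
    """Number of trainable parameters in a module."""
    return sum(p.numel() for p in module.parameters() if p.requires_grad)


conv = nn.Conv2d(3, 128, kernel_size=3, padding=1)
head = nn.Linear(1024, 200)

conv_params = count_parameters(conv)  # 3*128*3*3 weights + 128 biases
head_params = count_parameters(head)  # 1024*200 weights + 200 biases

# Linear(128 * 32 * 32, 1024), computed by hand instead of allocating it
dense_params = 128 * 32 * 32 * 1024 + 1024

print(conv_params, dense_params, head_params)
```

Nearly all of the parameters sit in the first dense layer (about 134 million), which is worth keeping in mind when GPU memory is tight.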

## retrain it ##

In [14]:
import numpy as np

In [18]:
import time
num_epochs = 100 # total amount of full passes over training data


for epoch in range(num_epochs):
    print (num_epochs)
    # In each epoch, we do a full pass over the training data:
    start_time = time.time()
    model.train(True) # enable dropout / batch_norm training behavior
    for (X_batch, y_batch) in train_batch_gen:
        # train on batch
        with torch.cuda.device(1):
            X = X_batch.cuda()
            y = y_batch.cuda()
            logits = model(X)
        loss = F.cross_entropy(logits, y)
        loss.backward()
        opt.step()
        opt.zero_grad()
        train_loss.append(loss.detach().cpu().numpy())
    print (num_epochs)    
    model.train(False) # disable dropout / use averages for batch_norm
    for X_batch, y_batch in val_batch_gen:
        with torch.cuda.device(1):
            logits = model(Variable(torch.FloatTensor(X_batch)).cuda())
        y_pred = logits.max(1)[1].data
        val_accuracy.append(np.mean( (y_batch.cpu() == y_pred.cpu()).numpy() ))

    print (num_epochs)
    # Then we print the results for this epoch:
    print("Epoch {} of {} took {:.3f}s".format(
        epoch + 1, num_epochs, time.time() - start_time))
    print("  training loss (in-iteration): \t{:.6f}".format(
        np.mean(train_loss[-len(train_dataset) // batch_size :])))
    print("  validation accuracy: \t\t\t{:.2f} %".format(
        np.mean(val_accuracy[-len(val_dataset) // batch_size :]) * 100))

100
100
100
Epoch 1 of 100 took 34.571s
  training loss (in-iteration): 	5.481599
  validation accuracy: 			2.48 %
100
100
100
Epoch 2 of 100 took 34.532s
  training loss (in-iteration): 	4.969859
  validation accuracy: 			4.65 %
100
100
100
Epoch 3 of 100 took 34.652s
  training loss (in-iteration): 	4.801440
  validation accuracy: 			6.71 %
100
100
Epoch 7 of 100 took 34.589s
  training loss (in-iteration): 	4.412679
  validation accuracy: 			10.06 %
100
100
100
Epoch 8 of 100 took 34.664s
  training loss (in-iteration): 	4.324869
  validation accuracy: 			11.30 %
100
100
100
Epoch 9 of 100 took 34.677s
  training loss (in-iteration): 	4.232429
  validation accuracy: 			11.57 %
100
100
100
Epoch 10 of 100 took 34.726s
  training loss (in-iteration): 	4.128664
  validation accuracy: 			11.72 %
100
100
100
Epoch 11 of 100 took 34.469s
  training loss (in-iteration): 	4.038498
  validation accuracy: 			11.83 %
100
100
100
Epoch 12 of 100 took 34.859s
  training loss (in-iteration): 	3.9

Traceback (most recent call last):
  File "/home/ya-philya/anaconda3/lib/python3.6/multiprocessing/queues.py", line 240, in _feed
    send_bytes(obj)
  File "/home/ya-philya/anaconda3/lib/python3.6/multiprocessing/connection.py", line 200, in send_bytes
    self._send_bytes(m[offset:offset + size])
  File "/home/ya-philya/anaconda3/lib/python3.6/multiprocessing/connection.py", line 404, in _send_bytes
    self._send(header + buf)
  File "/home/ya-philya/anaconda3/lib/python3.6/multiprocessing/connection.py", line 368, in _send
    n = write(self._handle, buf)
BrokenPipeError: [Errno 32] Broken pipe


KeyboardInterrupt: 


__Hint:__ If you don't want to compute shapes by hand, just plug in any shape (e.g. 1 unit) and run compute_loss. You will see something like this:

__`RuntimeError: size mismatch, m1: [5 x 1960], m2: [1 x 64] at /some/long/path/to/torch/operation`__

See the __1960__ there? That's your actual input shape.
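Another way to avoid computing shapes by hand, sketched here under the assumption that `nn.Flatten` is available in your PyTorch version, is to push a dummy batch through the convolutional part and read the size off the result.

```python
import torch
import torch.nn as nn

# Convolutional part only; the dense layers are added once the size is known.
features = nn.Sequential(
    nn.Conv2d(3, 128, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Flatten(),
)

with torch.no_grad():
    dummy = torch.zeros(1, 3, 64, 64)  # one fake 64x64 RGB image
    n_features = features(dummy).shape[1]

print(n_features)
```

The printed number is the `in_features` value for the first `nn.Linear` layer.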

## Task 2: adding normalization

* Add batch norm (with default params) between convolution and ReLU
  * nn.BatchNorm*d (1d for dense, 2d for conv)
  * usually better to put them after linear/conv but before nonlinearity
* Re-train the network with the same optimizer, it should get at least 20% validation accuracy at peak.
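A minimal sketch of the conv, batch norm, ReLU ordering described above. The 8 output channels and the random input are stand-ins chosen to keep the example small. In training mode `nn.BatchNorm2d` normalizes every channel over the batch and spatial dimensions, which the snippet checks directly.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
block = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.BatchNorm2d(8),  # after the convolution, before the nonlinearity
    nn.ReLU(),
)
block.train()  # batch statistics are used in training mode

x = torch.randn(16, 3, 64, 64)  # stand-in batch
pre_relu = block[1](block[0](x))

# per-channel statistics over batch, height and width
channel_mean = pre_relu.mean(dim=(0, 2, 3))
centered = pre_relu - channel_mean.view(1, -1, 1, 1)
channel_std = (centered ** 2).mean(dim=(0, 2, 3)).sqrt()

print(channel_mean, channel_std)
```

Each channel comes out with mean close to 0 and standard deviation close to 1 before ReLU is applied. After `model.train(False)` the layer switches to its running averages instead of batch statistics.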

To learn more about **batch norm** and **covariate shift**, see:

https://towardsdatascience.com/batch-normalization-in-neural-networks-1ac91516821c

https://www.youtube.com/watch?v=nUUqwaxLnWs

In [22]:
model = nn.Sequential()

# describe convnet here
model.add_module('conv1', nn.Conv2d(3, 128, [3, 3], padding=1))
model.add_module('conv_relu', nn.ReLU())
# note: batch norm is applied after ReLU here; the task text asks for it between conv and ReLU
model.add_module('batch_norm', nn.BatchNorm2d(128))
model.add_module('padd1', nn.MaxPool2d([2, 2], stride=2))
model.add_module('flatten', Flatten())
model.add_module('dense', nn.Linear(128 * 32 * 32, 1024))
model.add_module('dense_relu', nn.ReLU())
model.add_module('drop', nn.Dropout(p=0.2))
model.add_module('dense1_logits', nn.Linear(1024, 200)) # logits for 200 classes

In [23]:
with torch.cuda.device(1):
    model = model.cuda()
opt = torch.optim.Adam(model.parameters(), lr=0.001)

train_loss = []
val_accuracy = []

In [24]:
import time
num_epochs = 100 # total amount of full passes over training data


for epoch in range(num_epochs):
    print (num_epochs)
    # In each epoch, we do a full pass over the training data:
    start_time = time.time()
    model.train(True) # enable dropout / batch_norm training behavior
    for (X_batch, y_batch) in train_batch_gen:
        # train on batch
        with torch.cuda.device(1):
            X = X_batch.cuda()
            y = y_batch.cuda()
            logits = model(X)
        loss = F.cross_entropy(logits, y)
        loss.backward()
        opt.step()
        opt.zero_grad()
        train_loss.append(loss.detach().cpu().numpy())
    print (num_epochs)    
    model.train(False) # disable dropout / use averages for batch_norm
    for X_batch, y_batch in val_batch_gen:
        with torch.cuda.device(1):
            logits = model(Variable(torch.FloatTensor(X_batch)).cuda())
        y_pred = logits.max(1)[1].data
        val_accuracy.append(np.mean( (y_batch.cpu() == y_pred.cpu()).numpy() ))

    print (num_epochs)
    # Then we print the results for this epoch:
    print("Epoch {} of {} took {:.3f}s".format(
        epoch + 1, num_epochs, time.time() - start_time))
    print("  training loss (in-iteration): \t{:.6f}".format(
        np.mean(train_loss[-len(train_dataset) // batch_size :])))
    print("  validation accuracy: \t\t\t{:.2f} %".format(
        np.mean(val_accuracy[-len(val_dataset) // batch_size :]) * 100))

100
100
100
Epoch 1 of 100 took 38.755s
  training loss (in-iteration): 	5.446692
  validation accuracy: 			7.24 %
100
100
100
Epoch 2 of 100 took 38.697s
  training loss (in-iteration): 	4.451503
  validation accuracy: 			9.40 %
100
100
100
Epoch 3 of 100 took 38.812s
  training loss (in-iteration): 	4.096172
  validation accuracy: 			11.65 %
100
100
100
Epoch 4 of 100 took 38.931s
  training loss (in-iteration): 	3.723799
  validation accuracy: 			11.82 %
100
100
100
Epoch 5 of 100 took 38.896s
  training loss (in-iteration): 	3.339520
  validation accuracy: 			11.49 %
100
100
100
Epoch 6 of 100 took 38.948s
  training loss (in-iteration): 	2.958177
  validation accuracy: 			11.06 %
100
100
100
Epoch 7 of 100 took 38.851s
  training loss (in-iteration): 	2.607243
  validation accuracy: 			11.20 %
100
100
100
Epoch 8 of 100 took 38.890s
  training loss (in-iteration): 	2.295814
  validation accuracy: 			10.58 %
100
100
100
Epoch 9 of 100 took 38.897s
  training loss (in-iteration): 	2

Traceback (most recent call last):
  File "/home/ya-philya/anaconda3/lib/python3.6/multiprocessing/queues.py", line 240, in _feed
    send_bytes(obj)
  File "/home/ya-philya/anaconda3/lib/python3.6/multiprocessing/connection.py", line 200, in send_bytes
    self._send_bytes(m[offset:offset + size])
  File "/home/ya-philya/anaconda3/lib/python3.6/multiprocessing/connection.py", line 404, in _send_bytes
    self._send(header + buf)
  File "/home/ya-philya/anaconda3/lib/python3.6/multiprocessing/connection.py", line 368, in _send
    n = write(self._handle, buf)
BrokenPipeError: [Errno 32] Broken pipe


KeyboardInterrupt: 


## Task 3: Data Augmentation

**Augmenti - A spell used to produce water from a wand (Harry Potter Wiki)**

<img src="HagridsHut_PM_B6C28_Hagrid_sHutFireHarryFang.jpg" style="width:80%">

`torchvision.transforms` is a powerful tool for image preprocessing and data augmentation.

Here's how it works: we define a pipeline that
* makes random crops of data (augmentation)
* randomly flips image horizontally (augmentation)
* then normalizes it (preprocessing)

When testing, we don't need random crops or flips; we just normalize with the same statistics.
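To show what the three steps do to a single image, here is a hand-rolled sketch on plain tensors. It is a stand-in for illustration, not the torchvision implementation, and the function name `random_crop_flip_normalize` is made up.

```python
import torch


def random_crop_flip_normalize(img, crop, mean, std, generator=None):
    """img: float tensor [C, H, W] with values in [0, 1]."""
    _, h, w = img.shape
    # random crop: pick the top-left corner uniformly
    top = torch.randint(0, h - crop + 1, (1,), generator=generator).item()
    left = torch.randint(0, w - crop + 1, (1,), generator=generator).item()
    out = img[:, top:top + crop, left:left + crop]
    # random horizontal flip with probability 0.5
    if torch.rand(1, generator=generator).item() < 0.5:
        out = out.flip(-1)
    # per-channel normalization: (x - mean) / std
    mean = torch.as_tensor(mean, dtype=out.dtype).view(-1, 1, 1)
    std = torch.as_tensor(std, dtype=out.dtype).view(-1, 1, 1)
    return (out - mean) / std


g = torch.Generator().manual_seed(0)
mean, std = (0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)
augmented = random_crop_flip_normalize(torch.rand(3, 64, 64), 32, mean, std, generator=g)
print(augmented.shape)
```

A 32x32 crop of a 64x64 image changes the input size of the network, so the first dense layer has to be sized for 32x32 inputs, and test images need to end up at the same size.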

In [25]:
import torchvision
from torchvision import transforms
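# note: these are the widely used CIFAR-10 channel statistics, not values computed on Tiny ImageNet
means = np.array((0.4914, 0.4822, 0.4465))
stds = np.array((0.2023, 0.1994, 0.2010))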

transform_augment = transforms.Compose([
    transforms.RandomCrop(32),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize(means, stds)
])

In [26]:
dataset = torchvision.datasets.ImageFolder('tiny-imagenet-200/train', transform=transform_augment)

In [27]:
train_dataset, val_dataset = torch.utils.data.random_split(dataset, [90000, 10000])

In [28]:
batch_size = 1024
train_batch_gen = torch.utils.data.DataLoader(train_dataset, 
                                              batch_size=batch_size,
                                              shuffle=True,
                                              num_workers=1)

In [29]:
val_batch_gen = torch.utils.data.DataLoader(val_dataset, 
                                              batch_size=batch_size,
                                              shuffle=True,
                                              num_workers=1)

In [33]:
model = nn.Sequential()

# describe convnet here
model.add_module('conv1', nn.Conv2d(3, 128, [3, 3], padding=1))
model.add_module('conv_relu', nn.ReLU())
model.add_module('batch_norm', nn.BatchNorm2d(128))
model.add_module('padd1', nn.MaxPool2d([2, 2], stride=2))
model.add_module('flatten', Flatten())
model.add_module('dense', nn.Linear(32768, 1024))
model.add_module('dense_relu', nn.ReLU())
model.add_module('drop', nn.Dropout(p=0.2))
model.add_module('dense1_logits', nn.Linear(1024, 200)) # logits for 200 classes

In [34]:
with torch.cuda.device(1):
    model = model.cuda()
opt = torch.optim.Adam(model.parameters(), lr=0.001)

train_loss = []
val_accuracy = []

In [46]:
import time
num_epochs = 100 # total amount of full passes over training data


for epoch in range(num_epochs):
    print (num_epochs)
    # In each epoch, we do a full pass over the training data:
    start_time = time.time()
    model.train(True) # enable dropout / batch_norm training behavior
    for (X_batch, y_batch) in train_batch_gen:
        # train on batch
        with torch.cuda.device(1):
            X = X_batch.cuda()
            y = y_batch.cuda()
            logits = model(X)
        loss = F.cross_entropy(logits, y)
        loss.backward()
        opt.step()
        opt.zero_grad()
        train_loss.append(loss.detach().cpu().numpy())
    print (num_epochs)    
    model.train(False) # disable dropout / use averages for batch_norm
    for X_batch, y_batch in val_batch_gen:
        with torch.cuda.device(1):
            logits = model(Variable(torch.FloatTensor(X_batch)).cuda())
        y_pred = logits.max(1)[1].data
        val_accuracy.append(np.mean( (y_batch.cpu() == y_pred.cpu()).numpy() ))

    print (num_epochs)
    # Then we print the results for this epoch:
    print("Epoch {} of {} took {:.3f}s".format(
        epoch + 1, num_epochs, time.time() - start_time))
    print("  training loss (in-iteration): \t{:.6f}".format(
        np.mean(train_loss[-len(train_dataset) // batch_size :])))
    print("  validation accuracy: \t\t\t{:.2f} %".format(
        np.mean(val_accuracy[-len(val_dataset) // batch_size :]) * 100))

100


RuntimeError: CUDA out of memory. Tried to allocate 2.00 GiB (GPU 1; 10.92 GiB total capacity; 8.99 GiB already allocated; 1015.50 MiB free; 405.37 MiB cached)

For test data we need __only normalization__, not random cropping or flipping.

In [40]:
val_dataset.dataset.transform = transform_augment
val_dataset.dataset.transform

Compose(
    RandomCrop(size=(32, 32), padding=0)
    RandomHorizontalFlip(p=0.5)
    ToTensor()
    Normalize(mean=[0.4914 0.4822 0.4465], std=[0.2023 0.1994 0.201 ])
)

In [43]:
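# note: without a crop, images stay 64x64, while the model above was built for
# 32x32 inputs (its dense layer takes 32768 features)
transform_test = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(means, stds),
])
val_dataset.dataset.transform = transform_test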

In [44]:
val_dataset.dataset.transform, test_dataset.dataset.transform

(Compose(
     ToTensor()
     Normalize(mean=[0.4914 0.4822 0.4465], std=[0.2023 0.1994 0.201 ])
 ), Compose(
     ToTensor()
     Normalize(mean=[0.4914 0.4822 0.4465], std=[0.2023 0.1994 0.201 ])
 ))
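One pitfall with this pattern: the subsets returned by `random_split` wrap the same underlying dataset object, so assigning `val_dataset.dataset.transform` also changes the transform used for training. The sketch below demonstrates this with a hypothetical `ToyDataset` standing in for `ImageFolder`, and shows one possible workaround: two dataset objects with different transforms, sharing the same index split.

```python
from torch.utils.data import Dataset, Subset, random_split


class ToyDataset(Dataset):
    """Hypothetical stand-in for ImageFolder: returns transform(index)."""

    def __init__(self, n, transform=None):
        self.n = n
        self.transform = transform

    def __len__(self):
        return self.n

    def __getitem__(self, i):
        return self.transform(i) if self.transform else i


base = ToyDataset(10, transform=lambda i: ("augmented", i))
train_part, val_part = random_split(base, [8, 2])

# Both subsets point at the same underlying object...
shares_base = train_part.dataset is val_part.dataset
# ...so this assignment silently changes the training transform as well.
val_part.dataset.transform = lambda i: ("plain", i)
train_tag_after = train_part[0][0]

# Workaround: separate dataset objects, reusing the same indices.
train_idx = [int(i) for i in train_part.indices]
val_idx = [int(i) for i in val_part.indices]
train_fixed = Subset(ToyDataset(10, transform=lambda i: ("augmented", i)), train_idx)
val_fixed = Subset(ToyDataset(10, transform=lambda i: ("plain", i)), val_idx)

print(shares_base, train_tag_after, train_fixed[0][0], val_fixed[0][0])
```

With real data the same idea would mean creating two `ImageFolder` instances over the same directory, one per transform, and wrapping each in a `Subset` with the matching indices.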

## The Quest For A Better Network

See `practical_dl/homework02` for a full-scale assignment.