<a href="https://colab.research.google.com/github/telecombcn-dl/2018-dlai-team10/blob/master/CNN.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Convolutional Neural Network**
The problem we are trying to solve here is to classify grayscale images of hand-drawn objects (28 by 28 pixels) into 10 categories (apple, banana, fork...). The dataset we use is taken from the Kaggle competition **Quick Draw! Doodle Recognition Challenge**.

In this notebook, we approach this task by implementing a **Convolutional Neural Network**. For this project we have also implemented two other approaches (Multilayer Perceptron and Long Short-Term Memory Network), each of which has its own self-contained notebook.

We chose a CNN (Convolutional Neural Network) for this image classification problem because it is a kind of neural network specialized for processing data with a known grid-like topology. It leverages the ideas of local connectivity, parameter sharing and pooling/subsampling of hidden units.
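To make the benefit of local connectivity and parameter sharing concrete, the sketch below counts the learnable parameters of a 3x3 convolution with 6 filters applied to a single-channel 28x28 image, and compares it with a fully connected layer that produces an output of the same size. The helper names `conv_params` and `dense_params` are illustrative assumptions, not part of the notebook.

```python
def conv_params(in_channels, out_channels, kernel_size):
    """Weights plus one bias per filter; independent of the image size."""
    return out_channels * (in_channels * kernel_size * kernel_size) + out_channels


def dense_params(in_features, out_features):
    """One weight per input/output pair plus one bias per output."""
    return in_features * out_features + out_features


height = width = 28
conv = conv_params(in_channels=1, out_channels=6, kernel_size=3)
# A dense layer mapping the flattened image to an output as large as the
# convolution's (6 feature maps of 28x28, assuming the size is preserved).
dense = dense_params(height * width, 6 * height * width)

print("convolution:", conv)
print("fully connected:", dense)
```

Because the same small kernel is reused at every image position, the convolution needs several orders of magnitude fewer parameters than the dense alternative.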

*The basic idea behind a CNN is that the network learns hierarchical representations of the data with increasing levels of abstraction.*



*For more details about our project please visit: https://telecombcn-dl.github.io/2018-dlai-team10/ *

# **1. Notebook Setting**

In this section we import PyTorch and some relevant Python libraries (NumPy, Matplotlib...) that will be used later. Additionally, we set up the notebook environment to train on the GPU, when one is available, to obtain faster results.

In [0]:
from os.path import exists
from wheel.pep425tags import get_abbr_impl, get_impl_ver, get_abi_tag
platform = '{}{}-{}'.format(get_abbr_impl(), get_impl_ver(), get_abi_tag())
cuda_output = !ldconfig -p|grep cudart.so|sed -e 's/.*\.\([0-9]*\)\.\([0-9]*\)$/cu\1\2/'
accelerator = cuda_output[0] if exists('/dev/nvidia0') else 'cpu'
!pip install -q http://download.pytorch.org/whl/{accelerator}/torch-0.4.1-{platform}-linux_x86_64.whl torchvision
  
import numpy as np
import os
import sys
import torch
import torchvision
import random
import codecs
import torch.utils.data
import torch.optim as optim
import torchvision.datasets as datasets
import torch.nn as nn
import torch.nn.functional as F
from matplotlib import pyplot as plt
from PIL import Image
from tqdm import tqdm
from torchvision import datasets, transforms

#Training on the GPU
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print(device)

print('Done!')

# **2. Dataset Preparation**

In this section we will download part of the original dataset, reduce the number of samples, split them into training, validation and test sets, reshape them into images and organize them in a structured way.

## **2.1 Dataset Download**

The dataset is downloaded from the Google APIs as a set of NumPy arrays. The Quick, Draw! dataset actually contains more than 300 classes; however, for simplicity we will only use 10 of them in our project. We have manually selected the classes in order to have some interesting inter-class variability (wheel and pizza are very similar while apple is very different...).

The classes we will try to classify are the following ones:


![clases](https://user-images.githubusercontent.com/43316350/50059288-43b93480-0185-11e9-8d9f-76695e781a8b.JPG) 

In [0]:
  urls = [
        'https://storage.googleapis.com/quickdraw_dataset/full/numpy_bitmap/key.npy',
        'https://storage.googleapis.com/quickdraw_dataset/full/numpy_bitmap/banana.npy',
        'https://storage.googleapis.com/quickdraw_dataset/full/numpy_bitmap/ladder.npy',
        'https://storage.googleapis.com/quickdraw_dataset/full/numpy_bitmap/tennis%20racquet.npy',
        'https://storage.googleapis.com/quickdraw_dataset/full/numpy_bitmap/pizza.npy',
        'https://storage.googleapis.com/quickdraw_dataset/full/numpy_bitmap/stop%20sign.npy',
        'https://storage.googleapis.com/quickdraw_dataset/full/numpy_bitmap/wheel.npy',
        'https://storage.googleapis.com/quickdraw_dataset/full/numpy_bitmap/fork.npy',
        'https://storage.googleapis.com/quickdraw_dataset/full/numpy_bitmap/book.npy',
        'https://storage.googleapis.com/quickdraw_dataset/full/numpy_bitmap/apple.npy',
    ]
  
  class_name = ['key', 'banana', 'ladder', 'tennis_racquet', 'pizza', 'stop_sign', 'wheel', 'fork', 'book', 'apple']
   
  def createDir(path):
    if not os.path.exists(path):
        os.makedirs(path)
    
  def gen_bar_updater(pbar):
    def bar_update(count, block_size, total_size):
        if pbar.total is None and total_size:
            pbar.total = total_size
        progress_bytes = count * block_size
        pbar.update(progress_bytes - pbar.n)
    return bar_update   
    
  def download_url(url, root, filename):
      from six.moves import urllib
      root = os.path.expanduser(root)
      fpath = os.path.join(root, filename + ".npy")

      createDir(root)

      # downloads file
      if os.path.isfile(fpath):
          # File already downloaded: nothing to do
          pass
      else:
          try:
              print('Downloading ' + url + ' to ' + fpath)
              urllib.request.urlretrieve(
                  url, fpath,
                  reporthook = gen_bar_updater(tqdm(unit='B', unit_scale=True))
              )
          except OSError:
              if url[:5] == 'https':
                  url = url.replace('https:', 'http:')
                  print('Failed download. Trying https -> http instead.'
                        ' Downloading ' + url + ' to ' + fpath)
                  urllib.request.urlretrieve(
                      url, fpath,
                      reporthook = gen_bar_updater(tqdm(unit='B', unit_scale=True))
                  )
                  
                  
                  
  for i in range(0, len(urls)):
    download_url(urls[i], "data", class_name[i])
    
    
  print("Done!")   

## **2.2 Dataset Reduction, Reshaping and Reorganization**
As we are implementing a CNN (we want to exploit the local connectivity of the data), we want to have the data as images. Furthermore, we have decided to work with a reduced dataset, so the number of samples per class is limited to *max_length*. We also split the data into training, validation and test sets according to the fractions defined by *percen*, and place each sample in its corresponding folder.
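As an illustration of how the split boundaries can be derived from the fractions, here is a minimal sketch. The function `split_bounds` is a hypothetical helper, not the notebook's code: it uses `round()` to guard against floating-point error in the cumulative fractions and, unlike the cell below, it keeps every sample in the last split.

```python
def split_bounds(n_samples, fractions):
    """Return one (begin, end) index pair per split, covering range(n_samples)."""
    bounds = []
    begin = 0
    cumulative = 0.0
    for k, fraction in enumerate(fractions):
        cumulative += fraction
        # round() guards against float error such as 0.6 + 0.3 != 0.9;
        # the last split always ends at n_samples.
        is_last = k == len(fractions) - 1
        end = n_samples if is_last else round(n_samples * cumulative)
        bounds.append((begin, end))
        begin = end
    return bounds


bounds = split_bounds(10000, [0.6, 0.3, 0.1])
for name, (begin, end) in zip(["train", "validation", "test"], bounds):
    print(name, begin, end, end - begin)
```

Each sample index falls in exactly one half-open interval `[begin, end)`, so no drawing is shared between the training, validation and test sets.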

![datapercentage](https://user-images.githubusercontent.com/43316350/50059513-20dc4f80-0188-11e9-9c09-5e798d97cd96.JPG)


In [0]:
class_name = ['apple', 'banana', 'book', 'fork', 'key', 'ladder', 'pizza', 'stop_sign', 'tennis_racquet', 'wheel']
step = ['train', 'validation', 'test']

dire = r'data/'

max_length = 10000 # Maximum number of files (drawings) per class
percen=[0.6, 0.3, 0.1] # Percentage of training, validation and testing

begin = [0, int(max_length * percen[0]), int(max_length * (percen[0] + percen[1])) ]
end = [int(max_length * (percen[0])), int(max_length * (percen[0] + percen[1])) , max_length-10]

for c in range(0, len(class_name)):
  print('Class ' + str(c+1) + ' out of ' + str(len(class_name)))
  filename = dire + str(class_name[c]) + '.npy'
  data = np.load(filename)
  
  for s in range(0, len(step)):
    dire_step = str(dire) + str(step[s])
    if not os.path.exists(dire_step):
      os.makedirs(dire_step)
    
    for i in range(begin[s], end[s]):
      dire_class = str(dire_step) + '/' + str(class_name[c])
      if not os.path.exists(dire_class):
        os.makedirs(dire_class)
      
      # Reshape the raw data into 28x28 images
      data_sample = data[i,:].reshape((28, 28))
      sample_name = class_name[c] + '_' + str(step[s]) + '_' + str(i)
      np.save(os.path.join(dire_class, sample_name), data_sample)

print('Done!')

## **2.3 Data Visualization**

A useful sanity check is to display a randomly chosen image from the training set of a selected class.

In [0]:
drawing_class = 0  # 0-apple, 1-banana, 2-book, 3-fork, 4-key, 5-ladder, 6-pizza, 7-stop_sign, 8-tennis_racquet, 9-wheel
image_number=random.randint(0, int(max_length*percen[0]) - 1)  # training samples are numbered from 0
dire = r'data/train/' + str(class_name[drawing_class]) + '/' + str(class_name[drawing_class]) + '_' + 'train' + '_' + str(image_number) +'.npy'
data = np.load(dire)
plt.imshow(data)
plt.show()

# **3. Network Definition**

In this section we will define the mini-batches, set the architecture of the network and its forward pass, and define the loss function and the optimizer.

## **3.1 Mini-Batch Definition**

We define mini-batches of size *bs*. These subsets of the data are what is forward propagated through the network. We use mini-batches instead of the whole training set because processing the complete set at every step would be very expensive.
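The idea of mini-batching can be sketched without any framework: shuffle the sample indices and cut them into consecutive chunks. The generator `iterate_minibatches` below is a hypothetical stand-in for what a data loader does with `shuffle = True`; it is only meant to illustrate the principle.

```python
import random


def iterate_minibatches(n_samples, batch_size, seed=0):
    """Yield lists of sample indices; every index appears exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # randomize the sample order
    for start in range(0, n_samples, batch_size):
        yield indices[start:start + batch_size]


batches = list(iterate_minibatches(n_samples=600, batch_size=30))
print(len(batches), "batches of", len(batches[0]), "samples")
```

When the number of samples is a multiple of the batch size, as here, every batch has the same length, which is convenient for code that reshapes inputs with a fixed batch dimension.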

In [0]:
def load_sample(x):
	return np.load(x)

bs = 30 # Batch size, chosen so that it divides the size of every split evenly
train_dir = r"data/train"
val_dir = r"data/validation"
test_dir = r"data/test"

# Note: this transform is defined but not passed to the datasets below
transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5,), (0.5,))])  # single-channel (grayscale) statistics

train_dataset = datasets.DatasetFolder(train_dir, extensions = ['.npy'], loader = load_sample)
train_loader = torch.utils.data.DataLoader(train_dataset, batch_size = bs, shuffle = True, num_workers = 2)
train_iter = iter(train_loader)

valid_dataset = datasets.DatasetFolder(val_dir, extensions = ['.npy'], loader = load_sample)
valid_loader = torch.utils.data.DataLoader(valid_dataset, batch_size = bs, shuffle = True, num_workers = 2)
valid_iter = iter(valid_loader)

test_dataset = datasets.DatasetFolder(test_dir, extensions = ['.npy'], loader = load_sample)
test_loader = torch.utils.data.DataLoader(test_dataset, batch_size = bs, shuffle = True, num_workers = 2)
test_iter = iter(test_loader)

batch, labels = next(train_iter)  # built-in next() works on any Python 3 iterator

print('Done!')

## **3.2 CNN Definition and Forward Pass**
We started with basic CNN architectures and tested how they performed but, as they were very shallow, they gave very poor results. In fact, most of the time training got stuck very early in a local minimum.

To improve the performance of the CNN we deepened the network, which decreased the probability of ending in a *bad* local minimum. We arrived at the following structure, which performed very well.

This final architecture, explained below, alternates 5 convolutional layers (each followed by batch normalization and a non-linearity) with 2 max-pooling layers, and ends with 3 fully connected layers, the first two of which are also followed by a non-linearity.

![arquitecturacnn3](https://user-images.githubusercontent.com/43316350/50046302-c963b400-00a1-11e9-90e4-769db06d6ec9.JPG)

The **Convolutional Layers** transform a 3D input volume into a 3D output volume of neuron activations by performing convolutions on a 2D grid. The final architecture uses 5 convolutional layers, each with a 3x3 kernel and stride 1. They differ in the number of filters, going from 6 filters in the first layer to 16 and ending with 32. These characteristics (filter spatial extent, stride and number of filters) are hyperparameters: the values chosen are the ones that gave the best performance among those we tried.
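To illustrate what a single filter of such a layer computes, the following NumPy sketch slides one 3x3 kernel over one 2D image with stride 1 and zero padding of 1. The function `conv2d_single` is a hypothetical helper written for clarity, not for speed, and it handles only one input channel and one filter.

```python
import numpy as np


def conv2d_single(image, kernel, padding=1):
    """Slide one kernel over one 2D image with stride 1 (cross-correlation,
    which is what deep learning libraries call convolution)."""
    padded = np.pad(image, padding, mode="constant")
    k = kernel.shape[0]
    out_h = padded.shape[0] - k + 1
    out_w = padded.shape[1] - k + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # The same kernel weights are reused at every position.
            out[i, j] = np.sum(padded[i:i + k, j:j + k] * kernel)
    return out


image = np.arange(28 * 28, dtype=float).reshape(28, 28)
kernel = np.zeros((3, 3))
kernel[1, 1] = 1.0  # identity kernel: the output should equal the input
feature_map = conv2d_single(image, kernel)
print(feature_map.shape)
```

With a 3x3 kernel, stride 1 and a padding of 1, the output keeps the 28x28 size of the input, which is why only the pooling layers reduce the spatial resolution in this architecture.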

The **Non-linearity Layers** we use are ReLU (Rectified Linear Unit) layers, which perform a simple element-wise mapping that sets the negative values of their input to zero.

We also introduced **Batch Normalization** layers, which normalize the activations of each channel by subtracting the mean and dividing by the standard deviation, with the aim of simplifying and speeding up training and reducing the sensitivity to network initialization.
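The normalization step can be sketched in NumPy as follows. This is a simplified illustration that assumes a batch of shape (N, C, H, W) and leaves out the learnable scale and shift as well as the running statistics that a full batch normalization layer keeps; `batch_norm_2d` is a hypothetical name.

```python
import numpy as np


def batch_norm_2d(x, eps=1e-5):
    """Normalize a batch of shape (N, C, H, W) per channel.

    The learnable scale and shift that usually follow are omitted here.
    """
    mean = x.mean(axis=(0, 2, 3), keepdims=True)
    var = x.var(axis=(0, 2, 3), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


rng = np.random.default_rng(0)
activations = rng.normal(loc=3.0, scale=2.0, size=(30, 6, 28, 28))
normalized = batch_norm_2d(activations)
print(normalized.mean(axis=(0, 2, 3)))  # close to 0 for every channel
print(normalized.std(axis=(0, 2, 3)))   # close to 1 for every channel
```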

The network also contains two **Pooling Layers**, which down-sample the image and therefore reduce the number of activations, as well as providing invariance to small local changes. For our architecture we take the maximum value of each 2x2 pixel region (that is, max-pooling with a 2x2 window and stride 2). Note that we use only two of these layers because our input is already quite small (28x28 pixel images): in a deep network, adding a pooling layer after every convolutional layer would discard too much information about the precise position of things.
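A minimal NumPy sketch of 2x2 max-pooling with stride 2 on a single feature map is shown below. The helper `max_pool_2x2` is an assumption made for illustration and requires even height and width.

```python
import numpy as np


def max_pool_2x2(feature_map):
    """2x2 max-pooling with stride 2 on a 2D array with even height and width."""
    h, w = feature_map.shape
    # Group the pixels into non-overlapping 2x2 blocks, then reduce each block.
    blocks = feature_map.reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3))


feature_map = np.array([[1, 2, 5, 6],
                        [3, 4, 7, 8],
                        [9, 8, 1, 0],
                        [7, 6, 3, 2]])
pooled = max_pool_2x2(feature_map)
print(pooled)
```

Each output value is the largest of four neighbouring inputs, so both spatial dimensions are halved: a 28x28 map becomes 14x14, and a second pooling layer brings it to 7x7.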

The **Fully-connected Layers** are the classic layers in which every neuron in the previous layer is connected to every neuron in the next layer and activation is computed as matrix multiplication plus bias. Here, the output of the last convolutional layer is flattened to a single vector which is input to a fully connected layer. 
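The size of the flattened vector follows from the standard output-size formulas for convolution and pooling. The sketch below traces the spatial size through the layer order used in the cell that follows (two convolutions, pooling, two convolutions, pooling, one convolution), assuming 3x3 kernels with stride 1 and padding 1 and 2x2 pooling with stride 2. The helpers `conv_out` and `pool_out` are illustrative names.

```python
def conv_out(size, kernel=3, stride=1, padding=1):
    """Spatial output size of a convolution: floor((size + 2p - k) / s) + 1."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, kernel=2, stride=2):
    """Spatial output size of a pooling layer without padding."""
    return (size - kernel) // stride + 1


size = 28
trace = [size]
for layer in ["conv", "conv", "pool", "conv", "conv", "pool", "conv"]:
    size = conv_out(size) if layer == "conv" else pool_out(size)
    trace.append(size)

flattened = 32 * size * size  # 32 feature maps after the last convolution
print(trace)
print(flattened)
```

The result matches the `32 * 7 * 7` input size of the first fully connected layer.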





In [0]:
def weight_init(m):
  if isinstance(m,nn.Conv2d):
    torch.nn.init.xavier_uniform_(m.weight.data)        
  
class Net(nn.Module):
  
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 6, 3, padding = 1)
        torch.nn.init.xavier_uniform_(self.conv1.weight)
        
        self.bn1 = nn.BatchNorm2d(6)
        
        self.conv2 = nn.Conv2d(6, 16, 3, padding = 1)
        torch.nn.init.xavier_uniform_(self.conv2.weight)
        
        self.pool = nn.MaxPool2d(2, 2)
        
        self.bn2 = nn.BatchNorm2d(16)
        
        self.conv3 = nn.Conv2d(16, 16, 3, padding = 1)
        torch.nn.init.xavier_uniform_(self.conv3.weight)
        
        self.bn3 = nn.BatchNorm2d(16)
        
        self.conv4 = nn.Conv2d(16, 32, 3, padding = 1)
        torch.nn.init.xavier_uniform_(self.conv4.weight)
        
        self.bn4 = nn.BatchNorm2d(32)
        
        self.conv5 = nn.Conv2d(32, 32, 3, padding = 1)
        torch.nn.init.xavier_uniform_(self.conv5.weight)
        
        self.bn5 = nn.BatchNorm2d(32)
        
        self.fc1 = nn.Linear(32 * 7 * 7, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = F.relu(self.bn1(self.conv1(x)))
        x = F.relu(self.bn2(self.conv2(x)))
        x = self.pool(x)
        x = F.relu(self.bn3(self.conv3(x)))              
        x = F.relu(self.bn4(self.conv4(x)))
        x = self.pool(x)
        x = F.relu(self.bn5(self.conv5(x)))
        x = x.view(-1, 32 * 7 * 7)
        x = self.fc1(x)
        x = F.relu(x)
        x = self.fc2(x)
        x = F.relu(x)
        x = self.fc3(x)
        return x

net = Net()
net.apply(weight_init)
net.to(device)
print(net)

print('Done!')

## **3.3 Loss Function and Optimizer Definition**

As we are working on a classification task, we have chosen the cross-entropy loss. For the optimizer we use **Adam** (Adaptive Moment Estimation), having previously seen that it gives better results than plain gradient descent, with which the network got stuck very often.
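For reference, the cross-entropy of raw network outputs (logits) against integer class labels can be written in a few lines of NumPy. This is a sketch of the mathematical definition, with the hypothetical name `cross_entropy`; the notebook itself relies on `nn.CrossEntropyLoss`, which likewise takes unnormalized scores.

```python
import numpy as np


def cross_entropy(logits, labels):
    """Mean cross-entropy of raw scores (N, C) against integer labels (N,)."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()


logits = np.zeros((4, 10))          # a network that has learned nothing yet
labels = np.array([0, 3, 5, 9])
loss = cross_entropy(logits, labels)
print(loss)  # uniform predictions over 10 classes give log(10)
```

A loss close to log(10), roughly 2.3, at the start of training is therefore a sign that the network is still guessing uniformly among the 10 classes.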

As a hyperparameter, we experimented with different **learning rates**; the cell below uses 0.00001.

To reduce the overfitting that appeared in our model, we decided to use **loss regularization**, although other techniques such as early stopping, dropout or data augmentation could also have been used. We added L2 regularization (weight decay) to our cross-entropy loss. The L2 term penalizes the complexity of the classifier by measuring the squared magnitude of the weights, which pushes them towards small values. The resulting total loss is the following.

![loss](https://user-images.githubusercontent.com/43316350/50059425-ee7e2280-0186-11e9-8973-6bcbf4670a88.JPG)

Where *lambda* is the regularization hyperparameter (experimentally decided value).
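As a rough numerical illustration, the sketch below adds an L2 penalty to a data loss. It assumes the penalty is *lambda* times the sum of squared weights; the exact scaling convention is an assumption here, and the weight values and the data loss are made up. Note that the `weight_decay` argument of PyTorch optimizers adds `weight_decay * w` to each gradient, which corresponds to a penalty of `weight_decay / 2` times the squared norm.

```python
import numpy as np


def l2_penalty(weights, lam):
    """lam times the sum of squared entries over all weight arrays."""
    return lam * sum(float(np.sum(w ** 2)) for w in weights)


weights = [np.array([[1.0, -2.0], [0.0, 3.0]]), np.array([0.5, -0.5])]
data_loss = 0.7          # hypothetical cross-entropy value
lam = 0.001              # same value as weight_decay in the cell below
total_loss = data_loss + l2_penalty(weights, lam)
print(total_loss)
```

Large weights increase the total loss even when the data loss is unchanged, so the optimizer is encouraged to keep them small.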


In [0]:
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(net.parameters(), lr=0.00001,weight_decay=0.001)
print('Done!')

# **4. Network Training**

In this section we will train our model and validate it with the validation data. At the end of training, we will plot the losses and accuracies obtained for each epoch, for both the training and the validation data.

## 4.1 Training and Validation

Now that the model is built, let's train it. The process performed in each epoch is the following (steps 1 to 5 are repeated for every mini-batch):

1. Get a batch from the DataLoader

2. Forward pass that batch through the network

3. Get the outputs of the propagated batch

4. Compute the loss and accuracy with respect to these outputs

5. Propagate backwards to compute the gradients and update the parameters

6. At the end of each epoch, run a forward pass over the entire validation set and compute its loss and accuracy

In [0]:
# To plot the results
training_loss_list = []
training_accuracy_list = []
validation_loss_list = []
validation_accuracy_list = []

for epoch in range(3):  # loop over the dataset multiple times

    running_loss = 0.0
    training_accuracy = 0.0
    training_total = 0.0
    training_correct = 0.0
    
    for i, data in enumerate(train_loader, 0):
      
        # get the inputs
        inputs, labels = data
        inputs, labels = inputs.to(device), labels.to(device)
        
        # zero the parameter gradients
        optimizer.zero_grad()

        # forward + backward + optimize
        inputs = inputs.view(bs,1,28,28).float()
        inputs = inputs.to(device)
        outputs = net(inputs)
        outputs = outputs.to(device)
        labels = labels.to(device)
        
        _, predicted = torch.max(outputs.data, 1) #gets the index of the maximum predicted value
        training_total = training_total + labels.size(0)
        training_correct = training_correct + (predicted == labels).sum().item() #accumulate correct
        
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        # print statistics
        running_loss += loss.item()
        if i % 200 == 199:    # print every 200 mini-batches
            training_accuracy=training_correct/training_total
            training_accuracy_list.append(training_accuracy)
            print('[%d, %5d] Training Loss: %.3f - Training Accuracy: %.3f' %
                  (epoch + 1, i + 1, running_loss / 200,training_accuracy))
            training_loss_list.append(running_loss/200)
            # reset the running statistics for the next 200 mini-batches
            running_loss = 0.0
            training_total=0.0
            training_correct=0.0
            
    # Evaluation mode: batch normalization uses its running statistics
    net.eval()
    with torch.no_grad():
      
      running_validation_loss=0.0
      validation_accuracy=0.0
      validation_total=0.0
      validation_correct=0.0
      
      for j, valid_data in enumerate(valid_loader,0):     
        valid_inputs, valid_labels = valid_data
        valid_inputs = valid_inputs.view(bs, 1, 28, 28).float()
        valid_inputs = valid_inputs.to(device)
        valid_labels = valid_labels.to(device)
        valid_outputs = net(valid_inputs)
        valid_loss = criterion(valid_outputs, valid_labels)
        running_validation_loss += valid_loss.item()
        
        _,predicted=torch.max(valid_outputs.data,1)
        validation_total=validation_total+valid_labels.size(0)
        validation_correct=validation_correct + (predicted == valid_labels).sum().item()
        
      validation_accuracy=validation_correct/validation_total
      validation_accuracy_list.append(validation_accuracy)
      print('[%d] Validation Loss: %.3f - Validation Accuracy: %.3f' %
          (epoch + 1, running_validation_loss/len(valid_loader), validation_accuracy))
      validation_loss_list.append(running_validation_loss/len(valid_loader))
    # Back to training mode for the next epoch
    net.train()


print('Finished Training')

## 4.2 Results Visualization

We will now visualize how the training went. For both the training and validation sets we plot the progress of the loss and the accuracy throughout the epochs. Remember that we have been using the cross-entropy loss with weight decay regularization. The accuracy is computed as the proportion of correct outputs over the total number of outputs.
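The accuracy computation described above amounts to taking the highest-scoring class of each sample and comparing it with the label. A minimal NumPy sketch, with made-up scores for a 3-class problem and the hypothetical helper name `accuracy`:

```python
import numpy as np


def accuracy(outputs, labels):
    """Fraction of samples whose highest-scoring class equals the label."""
    predicted = outputs.argmax(axis=1)
    return float((predicted == labels).mean())


outputs = np.array([[2.0, 0.1, 0.3],
                    [0.2, 1.5, 0.1],
                    [0.9, 0.8, 0.1],
                    [0.1, 0.2, 3.0]])
labels = np.array([0, 1, 1, 2])
acc = accuracy(outputs, labels)
print(acc)
```

Here three of the four samples are classified correctly; the third one is assigned to class 0 although its label is 1.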

*As future work, it would be interesting to have the notebook draw the loss and accuracy plots in real time.*

In [0]:
training_examples = 6e4
plot_every = 200 #batches
training_loss_np = np.asarray(training_loss_list)
validation_loss_np = np.asarray(validation_loss_list)
training_accuracy_np = np.asarray(training_accuracy_list)
validation_accuracy_np = np.asarray(validation_accuracy_list)

x_axis_train = np.arange(1, len(training_loss_list)+1)
x_axis_validation = np.arange(1, len(validation_loss_list)+1)

p1=plt.plot(x_axis_train * bs * plot_every / training_examples, training_loss_np)
p2=plt.plot(x_axis_validation, validation_loss_np,color="r")

plt.xlabel('epochs')
plt.title('Loss')
plt.legend((p1[0], p2[0]), ('Training Loss', 'Validation Loss'))
plt.show()

p3=plt.plot(x_axis_train * bs * plot_every / training_examples, training_accuracy_np)
p4=plt.plot(x_axis_validation, validation_accuracy_np,color="r")
plt.xlabel('epochs')
plt.title('Accuracy')
plt.legend((p3[0], p4[0]), ('Training Accuracy', 'Validation Accuracy'))
plt.show()


# **5. Network Testing**
In this section we will compute the test accuracy and the test loss, plot the accuracy for each class to see which classes performed better, and run a small performance demo.

## 5.1 Test Accuracy and Loss Computation
Let's evaluate the model on the test data. To do so, we will pass to the network mini-batches of test data and compare their results with the ground truth to compute its loss and accuracy.

Additionally, to see how well the network performs on different categories, we have created a plot that shows the accuracy for each class. It can be noted that classes that were very similar (wheel and pizza for example) have lower accuracy than the others, while very different and clear objects such as apple, have a very high accuracy. 
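Per-class accuracy tells us how often each class is recognized, but not which classes it is confused with. A confusion matrix gives both. The sketch below is a hypothetical illustration with made-up labels for 3 classes and is not part of the notebook's evaluation; rows are true classes and columns are predicted classes.

```python
import numpy as np


def confusion_matrix(labels, predicted, n_classes):
    """Rows are true classes, columns are predicted classes."""
    matrix = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(labels, predicted):
        matrix[t, p] += 1
    return matrix


labels = np.array([0, 0, 1, 1, 2, 2, 2])
predicted = np.array([0, 1, 1, 1, 2, 0, 2])
matrix = confusion_matrix(labels, predicted, n_classes=3)
# The diagonal holds the correct predictions; dividing it by the row sums
# gives the accuracy of each class.
per_class_accuracy = matrix.diagonal() / matrix.sum(axis=1)
print(matrix)
print(per_class_accuracy)
```

Off-diagonal entries show the confusions directly: a pair of similar classes such as wheel and pizza would be expected to show up as large counts in the two cells that link them.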

In [0]:
class_correct = list(0. for i in range(10))
class_total = list(0. for i in range(10))
running_test_loss=0.0
test_total=0.0
test_correct=0.0
net.eval()  # evaluation mode: batch normalization uses its running statistics
with torch.no_grad():
    for data in test_loader:
        test_inputs, test_labels = data
        test_inputs = test_inputs.view(bs, 1, 28, 28).float()
        test_inputs = test_inputs.to(device)
        test_labels = test_labels.to(device)
        test_outputs = net(test_inputs)
        test_loss = criterion(test_outputs, test_labels)
        running_test_loss += test_loss.item()
        _, predicted = torch.max(test_outputs.data,1)
        c = (predicted == test_labels).squeeze()
        test_total=test_total+test_labels.size(0)
        test_correct=test_correct + (predicted == test_labels).sum().item()
        for i in range(bs):
            label = test_labels[i]
            class_correct[label] += c[i].item()
            class_total[label] += 1
            
test_accuracy=test_correct/test_total

print('Test Loss: %.3f - Test Accuracy: %.3f' %
         (running_test_loss/len(test_loader), test_accuracy))
print()
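x=np.arange(len(class_name))
# Divide the correct counts by the totals so the bars show accuracy, not counts
class_accuracy = [c / t if t > 0 else 0.0 for c, t in zip(class_correct, class_total)]
plt.barh(x, class_accuracy, align='center', alpha=0.5)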
plt.yticks(x, class_name)
plt.xlabel('Accuracy')
plt.title('Accuracy by Class')
 
plt.show()

torch.save(net.state_dict(), 'data/weights_state_at_epoch_100')

## 5.2 Performance Demo

Finally, in this little demo we can see how the network performs on a random image from the last batch of the test set. An interesting experiment is to first try to classify the image ourselves and then look at the predicted class and the ground truth to see if the network performed better than a human...

In [0]:
test_inputs, test_labels = data
test_inputs = test_inputs.view(bs, 1, 28, 28).float()
test_inputs = test_inputs.to(device)
test_labels = test_labels.to(device)
test_outputs = net(test_inputs)
_, predicted = torch.max(test_outputs.data,1)
image_number=random.randint(0,bs-1)
a=test_inputs[image_number].cpu().numpy()
plt.imshow(a[0, :, :])
plt.show()

Let's see the network's prediction and the ground truth!

In [0]:
print('PREDICTED: It is a/an: %s!' % class_name[predicted[image_number]])
print('GROUND TRUTH: %s' % class_name[test_labels[image_number]])