<a href="https://colab.research.google.com/github/telecombcn-dl/2018-dlai-team10/blob/master/Multilayer_Perceptron.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Multilayer Perceptron
The problem we are trying to solve here is to classify grayscale images of hand-drawn objects (28 by 28 pixels) into 10 categories (apple, banana, fork...). The dataset we will use is extracted from the Kaggle competition **Quick Draw! Doodle Recognition Challenge**.

In this notebook, we approach this task by implementing a Multilayer Perceptron. For this project we have also implemented two other approaches (a Convolutional Neural Network and a Long Short-Term Memory network), each with its own self-contained notebook.

*For more details about our project please visit: https://telecombcn-dl.github.io/2018-dlai-team10/*

# 1. Notebook Setting

In this section, PyTorch and the relevant Python libraries (NumPy, Matplotlib...) are imported. Additionally, the notebook environment is set up to train on the GPU, when one is available, for faster results.

In [0]:
from os.path import exists
from wheel.pep425tags import get_abbr_impl, get_impl_ver, get_abi_tag
platform = '{}{}-{}'.format(get_abbr_impl(), get_impl_ver(), get_abi_tag())
cuda_output = !ldconfig -p|grep cudart.so|sed -e 's/.*\.\([0-9]*\)\.\([0-9]*\)$/cu\1\2/'
accelerator = cuda_output[0] if exists('/dev/nvidia0') else 'cpu'

!pip install -q http://download.pytorch.org/whl/{accelerator}/torch-0.4.1-{platform}-linux_x86_64.whl torchvision


import os
import sys
import time
import torch
import requests
import numpy as np
from tqdm import tqdm
import torch.nn as nn
import torch.optim as optim
from google.colab import files
import matplotlib.pyplot as plt
import torch.nn.functional as F
import torchvision.datasets as datasets


#Check if GPU is available
CUDA = torch.cuda.is_available()

# Constants
IMAGE_WIDTH = 28
IMAGE_HEIGHT = 28
N_CLASSES = 10
BATCH_SIZE = 30
LEARNING_RATE = 0.00001
WEIGHT_DECAY = 0.001
EPOCHS = 50
TRAINING_EX = 6e4
PLOT_EVERY = 200

TRAIN_FOLDER = r"data/train"
VALID_FOLDER = r"data/validation"
TEST_FOLDER = r"data/test"
RESULT_FOLDER = r"results"
MODEL_FOLDER = r"saved_model"

# 2. Dataset Preparation
In this section we download part of the original dataset, reduce the number of samples, split them into training, validation and test sets, reshape them into images and organize them into a folder structure.



# 2.1 Dataset Download
The dataset is downloaded from the Google APIs and comes as a set of NumPy arrays. The Quick, Draw! dataset actually contains more than 300 classes; however, to keep the project simple we only use 10 of them. We selected these classes manually in order to have some interesting inter-class variability (wheel and pizza are very similar, while apple is very different...).

You can browse the dataset at this URL: https://console.cloud.google.com/storage/browser/quickdraw_dataset/full/numpy_bitmap

In [0]:
# The URLs are listed in the same order as class_name below, because
# download_url pairs urls[i] with class_name[i] when naming the saved file.
urls = [
      'https://storage.googleapis.com/quickdraw_dataset/full/numpy_bitmap/apple.npy',
      'https://storage.googleapis.com/quickdraw_dataset/full/numpy_bitmap/banana.npy',
      'https://storage.googleapis.com/quickdraw_dataset/full/numpy_bitmap/book.npy',
      'https://storage.googleapis.com/quickdraw_dataset/full/numpy_bitmap/fork.npy',
      'https://storage.googleapis.com/quickdraw_dataset/full/numpy_bitmap/key.npy',
      'https://storage.googleapis.com/quickdraw_dataset/full/numpy_bitmap/ladder.npy',
      'https://storage.googleapis.com/quickdraw_dataset/full/numpy_bitmap/pizza.npy',
      'https://storage.googleapis.com/quickdraw_dataset/full/numpy_bitmap/stop%20sign.npy',
      'https://storage.googleapis.com/quickdraw_dataset/full/numpy_bitmap/tennis%20racquet.npy',
      'https://storage.googleapis.com/quickdraw_dataset/full/numpy_bitmap/wheel.npy',
  ]

class_name = ['apple', 'banana', 'book', 'fork', 'key', 'ladder', 'pizza', 'stop_sign', 'tennis_racquet', 'wheel']

def createDir(path):
  if not os.path.exists(path):
      os.makedirs(path)

def gen_bar_updater(pbar):
  def bar_update(count, block_size, total_size):
      if pbar.total is None and total_size:
          pbar.total = total_size
      progress_bytes = count * block_size
      pbar.update(progress_bytes - pbar.n)
  return bar_update   


def download_url(url, root, filename):
    from six.moves import urllib
    root = os.path.expanduser(root)
    fpath = os.path.join(root, filename + ".npy")

    createDir(root)
    
    #Create model folder 
    createDir(MODEL_FOLDER)

    # Download files
    if not os.path.isfile(fpath):
        try:
            print('Downloading ' + url + ' to ' + fpath)
            urllib.request.urlretrieve(
                url, fpath,
                reporthook = gen_bar_updater(tqdm(unit='B', unit_scale=True))
            )
        except OSError:
            if url[:5] == 'https':
                url = url.replace('https:', 'http:')
                print('Failed download. Trying https -> http instead.'
                      ' Downloading ' + url + ' to ' + fpath)
                urllib.request.urlretrieve(
                    url, fpath,
                    reporthook = gen_bar_updater(tqdm(unit='B', unit_scale=True))
                )

                
for i in range(0, len(urls)):
  download_url(urls[i], "data", class_name[i])

print("The dataset was successfully downloaded")

# 2.2 Dataset Reduction, Reshaping and Reorganization
Each sample in the downloaded arrays is a flat vector of 784 pixel values, which we reshape and store as a 28x28 image. The MLP flattens it again in its forward pass, since a fully connected network does not exploit the local connectivity of the data. Furthermore, we have decided to work with a reduced dataset, so the number of samples per class is capped at max_length. We also split the data into training, validation and test sets according to the fractions defined in percen, and place each sample in its corresponding folder.

In [0]:
class_name = ['apple', 'banana', 'book', 'fork', 'key', 'ladder', 'pizza', 'stop_sign', 'tennis_racquet', 'wheel']
step = ['train', 'validation', 'test']
dire = r'data/'

createDir(RESULT_FOLDER)

max_length = 10000         # Maximum number of files (drawings) per class
percen = [0.6, 0.3, 0.1]   # Percentage of training, validation and testing

begin = [0, int(max_length * percen[0]), int(max_length * (percen[0] + percen[1]))]
end = [int(max_length * (percen[0])), int(max_length * (percen[0] + percen[1])), max_length-10]

for c in range(0, len(class_name)):
  print('Class ' + str(c+1) + ' out of ' + str(len(class_name)))
  filename = dire + str(class_name[c]) + '.npy'
  data = np.load(filename)

  for s in range(0, len(step)):
    dire_step = str(dire) + str(step[s])
    if not os.path.exists(dire_step):
      os.makedirs(dire_step)

    for i in range(begin[s], end[s]):
      dire_class = str(dire_step) + '/' + str(class_name[c])
      if not os.path.exists(dire_class):
        os.makedirs(dire_class)

      # Reshape the raw data into 28x28 images
      data_sample = data[i,:].reshape((28, 28))
      sample_name = class_name[c] + '_' + str(step[s]) + '_' + str(i)
      np.save(os.path.join(dire_class, sample_name), data_sample)
        
        
print("The reduction & reshape is complete")
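
The way the split boundaries follow from the fractions can be sketched in isolation. This is a minimal illustration, not the notebook's exact logic: `split_bounds` is a hypothetical helper, and unlike the cell above (which stops the test split at max_length - 10) it covers every sample.

```python
def split_bounds(n_samples, fractions):
    """Return one (begin, end) index pair per split, covering range(n_samples)."""
    bounds = []
    begin = 0
    cumulative = 0.0
    for fraction in fractions:
        cumulative += fraction
        # round() guards against float error such as 0.6 + 0.3 = 0.8999999999999999
        end = min(n_samples, round(n_samples * cumulative))
        bounds.append((begin, end))
        begin = end
    return bounds


# Same numbers as in the notebook: 10000 samples, 60% / 30% / 10%
bounds = split_bounds(10000, [0.6, 0.3, 0.1])
print(bounds)
```

Each pair can then be used as `range(begin, end)` when copying the samples of one class into the train, validation and test folders.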

# 3. Network Definition
In this section we will define the mini-batches, set the architecture of the network and its forward pass, and define the loss function and the optimizer.

# 3.1 Mini-Batch Definition
We define mini-batches of size BATCH_SIZE. These subsets of the data are what is forward propagated through the network. We use mini-batches instead of the whole training set because computing every update on the complete training set would be very expensive.

In [0]:
def load_sample(x):
    # Each sample is a single 28x28 array stored in its own .npy file
    return np.load(x)


train_dataset = datasets.DatasetFolder(TRAIN_FOLDER, extensions = ['.npy'], loader = load_sample)
train_loader = torch.utils.data.DataLoader(train_dataset, batch_size = BATCH_SIZE, shuffle = True, num_workers = 0)

test_dataset = datasets.DatasetFolder(TEST_FOLDER, extensions = ['.npy'], loader = load_sample)
test_loader = torch.utils.data.DataLoader(test_dataset, batch_size = BATCH_SIZE, shuffle = True, num_workers = 0)

val_dataset = datasets.DatasetFolder(VALID_FOLDER, extensions = ['.npy'], loader = load_sample)
val_loader = torch.utils.data.DataLoader(val_dataset, batch_size = BATCH_SIZE, shuffle = True, num_workers = 0)
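
To see what a `DataLoader` hands to the training loop, here is a small sketch on synthetic data. The random tensors and the `TensorDataset` are stand-ins chosen for illustration (the notebook itself reads `.npy` files through `DatasetFolder`). It also shows that the last mini-batch can be smaller than the requested batch size when the dataset size is not a multiple of it.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-in for the doodles: 100 random 28x28 "images" with labels 0..9
images = torch.rand(100, 28, 28)
labels = torch.randint(0, 10, (100,))

loader = DataLoader(TensorDataset(images, labels), batch_size=30, shuffle=True)

# Collect the size of every mini-batch produced in one pass over the data
batch_sizes = [batch_images.shape[0] for batch_images, _ in loader]
print(len(loader), batch_sizes)
```

Because of that smaller final batch, reshaping with `view(-1, ...)` is safer than hard-coding the batch size in the reshape.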

# 3.2 MLP Definition and Forward Pass
Next, the multilayer perceptron architecture is defined and the forward pass is implemented. We tried other architectures, but this one performed best.

In [0]:
class MLP(nn.Module):
    def __init__(self):
        super(MLP, self).__init__()
        self.fc1 = nn.Linear(IMAGE_WIDTH * IMAGE_HEIGHT, 500)
        self.fc2 = nn.Linear(500, 500)
        self.fc3 = nn.Linear(500, 500)
        self.fc4 = nn.Linear(500, 500)
        self.fc5 = nn.Linear(500, 256)
        self.fc6 = nn.Linear(256, N_CLASSES)
    def forward(self, x):
        x = x.view(-1, IMAGE_WIDTH * IMAGE_HEIGHT)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = F.relu(self.fc3(x))
        x = F.relu(self.fc4(x))
        x = F.relu(self.fc5(x))
        # No activation on the output layer: CrossEntropyLoss expects raw logits
        x = self.fc6(x)
        return x

net = MLP()
print("The multilayer perceptron network model:")
print(net)
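
As a quick sanity check of the architecture, the sketch below rebuilds the same layer sizes with `nn.Sequential` (a compact stand-in for the `MLP` class, used only to keep the example self-contained), pushes a dummy batch through it and counts the trainable parameters.

```python
import torch
import torch.nn as nn

# Same layer widths as the MLP above: 784 -> 500 -> 500 -> 500 -> 500 -> 256 -> 10
layer_sizes = [28 * 28, 500, 500, 500, 500, 256, 10]

layers = []
for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
    layers.append(nn.Linear(n_in, n_out))
    layers.append(nn.ReLU())
layers.pop()  # the output layer has no ReLU; it returns raw logits
model = nn.Sequential(*layers)

# Dummy batch of 4 single-channel 28x28 images, flattened to 784 features each
dummy = torch.rand(4, 1, 28, 28)
out = model(dummy.view(4, -1))

n_params = sum(p.numel() for p in model.parameters())
print(tuple(out.shape), n_params)
```

Most of the roughly 1.27 million parameters sit in the first four fully connected layers, which is typical for an MLP applied to flattened images.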

# 3.3 Loss Function and Optimizer Definition
As we are working on a classification task, we have chosen the Cross Entropy Loss. For the optimizer we use Adam, because it has been observed to give better results than plain Gradient Descent.

In [0]:
criterion = nn.CrossEntropyLoss()

# weight_decay adds L2 regularization to the update
optimizer = optim.Adam(net.parameters(), lr = LEARNING_RATE, weight_decay = WEIGHT_DECAY)

# If CUDA is available, the network will run on the GPU
if CUDA:
    net.cuda()
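
The network above outputs raw scores (logits) because `nn.CrossEntropyLoss` applies the log-softmax itself. The sketch below, on made-up logits and targets, checks that it matches the explicit combination of `F.log_softmax` and `F.nll_loss`.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 10)              # made-up scores: 4 samples, 10 classes
targets = torch.tensor([0, 3, 9, 3])     # made-up class indices

# Built-in loss, fed with raw logits
loss_ce = nn.CrossEntropyLoss()(logits, targets)

# Same quantity computed in two explicit steps
loss_manual = F.nll_loss(F.log_softmax(logits, dim=1), targets)

print(loss_ce.item(), loss_manual.item())
```

This is why adding a softmax layer at the end of the model would be redundant when training with this criterion.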

# 4. Network Training
In this section we will train our model and validate it with the validation data. At the end of the training, we will plot the losses and the accuracies obtained over the epochs, both for the training and the validation data.

# 4.1 Training and Validation

In [0]:
training_loss = []
training_accuracy = []
val_loss = []
val_accuracy = []

def train(epoch, net, train_loader, opt):
    # set model to train mode
    net.train()
    correct = 0.0
    running_training_loss = 0.0
    total = 0.0
    for j, item in enumerate(train_loader, 0):
      inputs, labels = item

      # -1 lets the batch dimension adapt if the last batch is smaller
      inputs = inputs.view(-1, 1, IMAGE_WIDTH, IMAGE_HEIGHT).float()
      if CUDA:
          inputs = inputs.cuda()
          labels = labels.cuda()

      # Reset gradients
      opt.zero_grad()

      # Forward pass
      outputs = net(inputs)

      pred = outputs.data.max(1)[1]   # index of the largest logit = predicted class
      correct += (pred == labels).sum().item()
      total += labels.size()[0]
      loss = criterion(outputs, labels)
      loss.backward()                 # calculate the gradients (backpropagation)
      opt.step()                      # update the weights (use the optimizer passed in)
      running_training_loss += loss.item()
      # Log the running averages every PLOT_EVERY mini-batches
      if j % PLOT_EVERY == PLOT_EVERY - 1:
        accuracy = correct / total
        txt = '[%d, %5d] loss: %.3f - training accuracy: %.3f' % (epoch, j + 1, running_training_loss/PLOT_EVERY, accuracy)
        training_loss.append(running_training_loss/PLOT_EVERY)
        training_accuracy.append(accuracy)
        running_training_loss = 0.0
        correct = 0.0
        total = 0.0
        print(txt)
       
      
def validate(net, val_loader, epoch):
    # set model to evaluation mode
    net.eval()
    correct = 0.0
    running_val_loss = 0.0
    total = 0.0
    # Gradients are not needed during validation
    with torch.no_grad():
        for inputs, labels in val_loader:

            inputs = inputs.view(-1, 1, IMAGE_WIDTH, IMAGE_HEIGHT).float()
            if CUDA:
                inputs = inputs.cuda()
                labels = labels.cuda()

            outputs = net(inputs)
            # Local names must not shadow the global val_loss / val_accuracy lists
            batch_loss = criterion(outputs, labels)
            pred = outputs.data.max(1)[1]
            correct += (pred == labels).sum().item()
            total += labels.size()[0]
            running_val_loss += batch_loss.item()

    running_val_loss /= len(val_loader)
    epoch_accuracy = correct / total
    txt = '[ %d ] loss: %.3f - validation accuracy: %.3f' % (epoch, running_val_loss, epoch_accuracy)
    val_loss.append(running_val_loss)
    val_accuracy.append(epoch_accuracy)
    print(txt)

    return running_val_loss, epoch_accuracy


for ep in range(EPOCHS):  # epochs loop

    # train
    train(ep, net, train_loader, optimizer)

    # validate (the results are also appended to val_loss and val_accuracy)
    ep_val_loss, ep_val_accuracy = validate(net, val_loader, ep)

    # save model weights
    torch.save(net.state_dict(), MODEL_FOLDER + "/model" + str(ep))

# 4.2 Results Visualization

In [0]:
np_training_loss = np.asarray(training_loss)
np_validation_loss = np.asarray(val_loss)

x_axis = np.asarray(range(0, len(np_training_loss)))
x_axis_val = np.arange(0, len(np_validation_loss))
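# Training points are logged every PLOT_EVERY mini-batches, so their
# x positions are converted to epochs to share an axis with validation.
plt.title("Loss")
plt.plot(x_axis * PLOT_EVERY * BATCH_SIZE / TRAINING_EX, np_training_loss, label="training")
plt.plot(x_axis_val, np_validation_loss, label="validation")
plt.xlabel("Epoch")
plt.legend()
plt.show()
plt.title("Accuracy")
plt.plot(x_axis * PLOT_EVERY * BATCH_SIZE / TRAINING_EX, training_accuracy, label="training")
plt.plot(x_axis_val, val_accuracy, label="validation")
plt.xlabel("Epoch")
plt.legend()
plt.show()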

# 5. Network Testing
In this section, we will compute the test accuracy and the test loss, plot the accuracy of each class to see which classes performed better, and do a little performance demo.

# 5.1 Test Accuracy and Loss Computation
Let's evaluate the model on the test data. To do so, we will pass to the network mini-batches of test data and compare their results with the ground truth to compute its loss and accuracy.

Additionally, to see how well the network performs on different categories, we have created a plot that shows the accuracy for each class. It can be noted that classes that were very similar (wheel and pizza for example) have lower accuracy than the others, while very different and clear objects such as apple, have a very high accuracy.

In [0]:
running_test_loss = 0.0
test_total = 0.0
test_correct = 0.0

net.eval()  # evaluation mode
with torch.no_grad():  # gradients are not needed for testing
  for i, test_data in enumerate(test_loader, 0):
    test_inputs, test_labels = test_data
    test_inputs = test_inputs.view(-1, 1, IMAGE_WIDTH, IMAGE_HEIGHT).float()
    # Only move the data to the GPU when one is available
    if CUDA:
      test_inputs = test_inputs.cuda()
      test_labels = test_labels.cuda()
    test_outputs = net(test_inputs)
    test_loss = criterion(test_outputs, test_labels)
    running_test_loss += test_loss.item()

    _, predicted = torch.max(test_outputs.data, 1)
    test_total = test_total + test_labels.size(0)
    test_correct = test_correct + (predicted == test_labels).sum().item()

test_accuracy = test_correct / test_total

print('Test Loss: %.3f - Test Accuracy: %.3f' % (running_test_loss/len(test_loader), test_accuracy))
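
Per-class accuracy tells us which classes are hard, but not which classes they are confused with (wheel and pizza, for example). A confusion matrix answers that. The following is a minimal NumPy sketch on made-up labels; `confusion_matrix` is a hypothetical helper and not part of the original notebook.

```python
import numpy as np


def confusion_matrix(true_labels, predicted_labels, n_classes):
    """Entry [t, p] counts the samples of true class t that were predicted as class p."""
    matrix = np.zeros((n_classes, n_classes), dtype=np.int64)
    # np.add.at accumulates correctly even when the same (t, p) pair repeats
    np.add.at(matrix, (true_labels, predicted_labels), 1)
    return matrix


# Made-up example with 3 classes and 6 samples
true_labels = np.array([0, 0, 1, 2, 2, 2])
predicted_labels = np.array([0, 1, 1, 2, 0, 2])

cm = confusion_matrix(true_labels, predicted_labels, 3)
# The diagonal holds the correct predictions; each row sums to the class size
per_class_accuracy = cm.diagonal() / cm.sum(axis=1)
print(cm)
print(per_class_accuracy)
```

With the notebook's data, the two label arrays would be gathered by concatenating `test_labels` and `predicted` over all test mini-batches.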

# 5.2 Performance Demo
Finally, in this little demo we can see how the network performs on a random image of the test set. An interesting experiment is to first try to classify the image ourselves and then look at the predicted class and the ground-truth label to see whether the network performed better than a human...

In [0]:
class_correct = [0.0] * N_CLASSES
class_total = [0.0] * N_CLASSES
net.eval()
with torch.no_grad():
    for data in test_loader:
        test_inputs, test_labels = data
        test_inputs = test_inputs.view(-1, 1, IMAGE_WIDTH, IMAGE_HEIGHT).float()
        # Only move the data to the GPU when one is available
        if CUDA:
            test_inputs = test_inputs.cuda()
            test_labels = test_labels.cuda()
        test_outputs = net(test_inputs)
        _, predicted = torch.max(test_outputs.data, 1)
        c = (predicted == test_labels)
        for i in range(test_labels.size(0)):
            label = test_labels[i].item()
            class_correct[label] += c[i].item()
            class_total[label] += 1

# Accuracy of a class = correct predictions / number of samples of that class
class_accuracy = [correct / max(total, 1) for correct, total in zip(class_correct, class_total)]

x = np.arange(len(class_name))
plt.barh(x, class_accuracy, align='center', alpha=0.5)
plt.yticks(x, class_name)
plt.xlabel('Accuracy')
plt.title('Accuracy by Class')

plt.show()
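
The single-image demo described in this section could look roughly like the sketch below. To keep it self-contained it uses an untrained one-layer stand-in network, a random image and an arbitrary ground-truth label, all of which are assumptions for illustration; in the notebook one would use the trained `net` and a sample drawn from `test_dataset` instead.

```python
import torch
import torch.nn as nn

class_name = ['apple', 'banana', 'book', 'fork', 'key', 'ladder',
              'pizza', 'stop_sign', 'tennis_racquet', 'wheel']

torch.manual_seed(0)

# Hypothetical stand-ins: an untrained linear classifier and a random 28x28 image
demo_net = nn.Sequential(nn.Linear(28 * 28, len(class_name)))
image = torch.rand(28, 28)
true_label = 4  # arbitrary ground truth ('key'), only for the printout

demo_net.eval()
with torch.no_grad():
    logits = demo_net(image.view(1, -1))          # batch of one flattened image
    probabilities = torch.softmax(logits, dim=1)  # turn logits into class probabilities
    predicted = int(probabilities.argmax(dim=1).item())

print('Predicted:', class_name[predicted], '- ground truth:', class_name[true_label])
```

Showing the image with `plt.imshow(image, cmap='gray')` before printing the prediction lets the reader guess the class first.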