# Homework 5: Neural Networks in PyTorch

-- Prof. Dorien Herremans

Please run the whole notebook with your code, then submit the resulting `.ipynb` file on eDimension so that it includes your answers and outputs.

## Question 1 -- XOR neural network [3pts]

a) Train an (at least) 2-layer neural network that can solve the XOR problem. 

b) Check the predictions resulting from your model in the second code box below.


In [1]:
# load your data
import torch
import torch.optim as optim
import torch.nn as nn
import torch.nn.functional as F
import numpy as np

train_x = torch.Tensor([[0,0], [1,1], [0,1], [1,0]])
train_y = torch.LongTensor([[0], [0], [1], [1]])

class FeedForwardNN(nn.Module):
  # input_size: Dimensionality of input feature vector.
  # num_classes: The number of classes in the classification problem.
  # num_hidden: The number of hidden (intermediate) layers to use.
  # hidden_dim: The size of each of the hidden layers.
  # dropout: The proportion of units to drop out after each layer.
  def __init__(self, input_size, num_classes, num_hidden, hidden_dim, dropout):
    # Always call the superclass (nn.Module) constructor first!
    super(FeedForwardNN, self).__init__()
    
    # Set up the hidden layers.
    assert num_hidden > 0
    # A special ModuleList to store our hidden layers.
    self.hidden_layers = nn.ModuleList([])
    # First hidden layer maps from input_size -> hidden_dim.
    self.hidden_layers.append(nn.Linear(input_size, hidden_dim))
    # Subsequent hidden layers map from hidden_dim -> hidden_dim.
    # Note that they can map to any dimensionality --- as long as the final
    # output is a distribution over your classes!
    for i in range(num_hidden - 1):
      self.hidden_layers.append(nn.Linear(hidden_dim, hidden_dim))
    
    # Set up the dropout layer.
    self.dropout = nn.Dropout(dropout)
    
    # Set up the final transform to a distribution over classes.
    self.output_projection = nn.Linear(hidden_dim, num_classes)
    
    # Set up the nonlinearity to use between layers.
    self.nonlinearity = nn.ReLU()
    
  # Forward's sole argument is the input.
  # input is of shape (batch_size, input_size)
  def forward(self, x):
    # Apply the hidden layers, nonlinearity, and dropout.
    for hidden_layer in self.hidden_layers:
      x = hidden_layer(x)
      x = self.dropout(x)
      x = self.nonlinearity(x)
      
    # Output layer: project x to a distribution over classes.
    out = self.output_projection(x)    

    # Apply log-softmax to the out tensor to get a log-probability
    # distribution over classes for each example (pairs with nn.NLLLoss).
    out_distribution = F.log_softmax(out, dim=-1)

    return out_distribution

# name your model xor
def xor_model():
    input_size = 2
    num_classes = 2
    num_hidden = 2
    hidden_dim = 10
    dropout = 0.0
    return FeedForwardNN(input_size,
                         num_classes,
                         num_hidden,
                         hidden_dim,
                         dropout)
    
xor = xor_model()    

# define your model loss function, optimizer, etc. 
criterion = nn.NLLLoss()
lr = 0.005
momentum = 0.9
optimizer = optim.SGD(xor.parameters(),
                      lr=lr,
                      momentum=momentum)
# train the model
epoch = 200
steps = train_x.size(0)

def train(model, optimizer, criterion, x = train_x, y = train_y):
    model.train()
    for i in range(epoch):
        for j in range(steps):
            optimizer.zero_grad()             
            inp = x[j].unsqueeze(0)
            label = y[j].type(torch.LongTensor)      
            predicted = model(inp)   
            loss = criterion(predicted, label)     
            loss.backward()
            optimizer.step()

        if i % 20 == 0:
            print("Epoch num: {}, Loss: {}".format(i, loss))

train(xor, optimizer, criterion)

Epoch num: 0, Loss: 0.8492563366889954
Epoch num: 20, Loss: 0.6953948736190796
Epoch num: 40, Loss: 0.6758471727371216
Epoch num: 60, Loss: 0.6439141631126404
Epoch num: 80, Loss: 0.5481469035148621
Epoch num: 100, Loss: 0.3501562178134918
Epoch num: 120, Loss: 0.1337297111749649
Epoch num: 140, Loss: 0.05148107931017876
Epoch num: 160, Loss: 0.02359386533498764
Epoch num: 180, Loss: 0.014554874040186405


In [2]:
# test your model using the following functions (make sure the output is printed and saved when you submit this notebook):
# depending on how you defined your network you may need to slightly tweak the prediction function below

test = [[0,0],[0,1],[1,1],[1,0]]

for trial in test: 
  Xtest = torch.Tensor(trial)
  y_hat = xor(Xtest)  
  prediction = torch.argmax(y_hat, dim=-1)
  print("{0} xor {1} = {2}".format(int(Xtest[0]), int(Xtest[1]), prediction))  

0 xor 0 = 0
0 xor 1 = 1
1 xor 1 = 0
1 xor 0 = 1
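
The printed predictions match the XOR truth table. To illustrate why a single hidden layer with a ReLU nonlinearity is enough, here is a minimal hand-weighted sketch in plain Python. The weights are chosen by hand for illustration and are not the ones the trained `xor` model learned; `relu` and `xor_net` are hypothetical helper names.

```python
def relu(z):
    # Rectified linear unit, the same nonlinearity the model above uses.
    return max(0.0, z)

def xor_net(a, b):
    # Hidden layer: two ReLU units that both read a + b, with biases 0 and -1.
    h1 = relu(a + b)
    h2 = relu(a + b - 1)
    # Output layer: a linear combination of the two hidden units.
    return h1 - 2 * h2

for a, b in [(0, 0), (0, 1), (1, 1), (1, 0)]:
    print("{0} xor {1} = {2}".format(a, b, int(xor_net(a, b))))
```

Without the ReLU the two layers would collapse into one linear map, which cannot separate the XOR classes.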


## Question 2  [2pts]

Imagine a neural network model for a multilabel classification task. 

a) Which loss function should you use?

b) The resulting trained model has a high variance error. Give 4 possible solutions to improve the model. 


```
[your answer here, no coding required]

* answer A
  - For multilabel classification, you use binary cross entropy loss for every class. For each class, we would have an output node to check if that class label should be assigned to that instance.
* answer B
  - Add data to the training set
  - Add regularisation
  - Add early stopping
  - Feature selection to decrease number of features

```
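
As a worked illustration of answer A, the sketch below computes the multilabel binary cross-entropy by hand for one example with three labels. The logits and targets are made-up values, and `multilabel_bce` is a hypothetical helper; in PyTorch, `nn.BCEWithLogitsLoss` with its default mean reduction computes the same quantity from raw logits.

```python
import math

def sigmoid(z):
    # Map a raw logit to a probability in (0, 1).
    return 1.0 / (1.0 + math.exp(-z))

def multilabel_bce(logits, targets):
    # One independent binary cross-entropy term per label, then the mean.
    total = 0.0
    for z, y in zip(logits, targets):
        p = sigmoid(z)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(logits)

logits = [2.0, -1.0, 0.5]   # made-up outputs, one node per label
targets = [1, 0, 1]         # an instance can carry several labels at once

loss = multilabel_bce(logits, targets)
# Each label is thresholded on its own, unlike a softmax over classes.
predicted_labels = [int(sigmoid(z) > 0.5) for z in logits]
print(loss, predicted_labels)
```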


## Question 3 - Improve hit classification [5pts]

Remember the hit prediction dataset from last week? 

a) Improve the model using a multilayer perceptron. 

b) Make sure to run your models on the GPU. 

c) Tweak hyperparameters such as the number of nodes or layers, among others. Show two possible configurations, state which works better, and very briefly explain why this may be the case. 
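
For part b), a minimal sketch of device placement, assuming a PyTorch install that may or may not see a CUDA GPU. The small `nn.Sequential` network, its layer sizes, and the random batch are placeholders rather than the homework model or data; the point is that the model and every batch are moved to the same `device`.

```python
import torch
import torch.nn as nn

# Use the GPU when PyTorch can see one, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Placeholder network (8 inputs, 5 hidden units, 1 sigmoid output).
model = nn.Sequential(
    nn.Linear(8, 5),
    nn.ReLU(),
    nn.Linear(5, 1),
    nn.Sigmoid(),
).to(device)

criterion = nn.BCELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.002, momentum=0.9)

# Placeholder batch of 4 examples with binary targets.
inputs = torch.randn(4, 8)
targets = torch.randint(0, 2, (4,)).float()

# Batches must live on the same device as the model parameters.
inputs, targets = inputs.to(device), targets.to(device)

optimizer.zero_grad()
predicted = model(inputs).squeeze(1)
loss = criterion(predicted, targets)
loss.backward()
optimizer.step()

print(device, loss.item())
```

In a training loop, the `.to(device)` call on each batch goes inside the loop over the `DataLoader`, and test tensors need the same treatment before evaluation.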




In [3]:
# code your model 1
import pandas as pd
import torch.nn as nn
import torch.nn.functional as F
import numpy as np 
from torch.utils.data import Dataset, DataLoader

# define MLP model
class MultiLayerPerceptron(nn.Module):
  # input_size: Dimensionality of input feature vector.
  # num_classes: The number of classes in the classification problem.
  # num_hidden: The number of hidden (intermediate) layers to use.
  # hidden_dim: The size of each of the hidden layers.
  # dropout: The proportion of units to drop out after each layer.
  def __init__(self, input_size, num_classes, num_hidden, hidden_dim, dropout):
    # Always call the superclass (nn.Module) constructor first!
    super(MultiLayerPerceptron, self).__init__()
    
    # Set up the hidden layers.
    assert num_hidden > 0
    # A special ModuleList to store our hidden layers.
    self.hidden_layers = nn.ModuleList([])
    # First hidden layer maps from input_size -> hidden_dim.
    self.hidden_layers.append(nn.Linear(input_size, hidden_dim))
    # Subsequent hidden layers map from hidden_dim -> hidden_dim.
    # Note that they can map to any dimensionality --- as long as the final
    # output is a distribution over your classes!
    for i in range(num_hidden - 1):
      self.hidden_layers.append(nn.Linear(hidden_dim, hidden_dim))
    
    # Set up the dropout layer.
    self.dropout = nn.Dropout(dropout)
    
    # Set up the final transform to a distribution over classes.
    self.output_projection = nn.Linear(hidden_dim, num_classes)
    
    # Set up the nonlinearity to use between layers.
    self.nonlinearity = nn.ReLU()
    
  # Forward's sole argument is the input.
  # input is of shape (batch_size, input_size)
  def forward(self, x):
    # Apply the hidden layers, nonlinearity, and dropout.
    for hidden_layer in self.hidden_layers:
      x = hidden_layer(x)
      x = self.dropout(x)
      x = self.nonlinearity(x)
      
    # Output layer: project x to a distribution over classes.
    out = self.output_projection(x)    

    # Apply a sigmoid to map the single output to the probability of the
    # positive class for each example (pairs with nn.BCELoss).
    out = torch.sigmoid(out)

    return out

class HitPredictionDataset(Dataset):
    def __init__(self, inputs, targets):
        self.inputs = inputs 
        self.targets = targets

    def __len__(self):
        return len(self.inputs)

    def __getitem__(self, idx):
        return (self.inputs[idx].astype(np.float32), self.targets[idx].astype(np.float32))

# load data
csv = pd.read_csv("https://dorax.s3.ap-south-1.amazonaws.com/herremans_hit_1030training.csv")
inputs = csv.drop('Topclass1030', axis=1).to_numpy()
targets = csv['Topclass1030'].to_numpy()

batch_size = 4
train_dataset = HitPredictionDataset(inputs, targets)
train_dataloader = DataLoader(train_dataset,
                              batch_size=batch_size,
                              shuffle=True)

# train model 1
input_size = csv.shape[1] - 1
num_classes = 1
num_hidden = 2
hidden_dim = 5
dropout = 0.0

model1 = MultiLayerPerceptron(input_size,
                              num_classes,
                              num_hidden,
                              hidden_dim,
                              dropout)

criterion = nn.BCELoss() 
lr = 0.002
momentum = 0.9
optimizer = optim.SGD(model1.parameters(),
                      lr=lr,
                      momentum=momentum)
num_epochs = 500

def train(model, num_epochs, optimizer, criterion):
    model.train()
    for epoch in range(num_epochs): 
        total_batch_loss = 0
        for (inputs, targets) in train_dataloader:
            predicted = model(inputs).squeeze(1)   
            batch_loss = criterion(predicted, targets)
            # .item() accumulates a plain float instead of keeping autograd graphs alive.
            total_batch_loss += batch_loss.item()
            optimizer.zero_grad()
            batch_loss.backward()
            optimizer.step()

        if epoch % 50 == 0:
            print("Epoch num: {}, Epoch loss: {}".format(epoch, total_batch_loss))

train(model1, num_epochs, optimizer, criterion)

Epoch num: 0, Epoch loss: 53.76749038696289
Epoch num: 50, Epoch loss: 29.815656661987305
Epoch num: 100, Epoch loss: 14.081474304199219
Epoch num: 150, Epoch loss: 6.413394927978516
Epoch num: 200, Epoch loss: 4.9389519691467285
Epoch num: 250, Epoch loss: 4.39068603515625
Epoch num: 300, Epoch loss: 4.175754547119141
Epoch num: 350, Epoch loss: 4.066369533538818
Epoch num: 400, Epoch loss: 3.9939205646514893
Epoch num: 450, Epoch loss: 3.9588334560394287


In [4]:
# evaluate model 1 (called model1 here)
import pandas as pd 

def run_evaluation(my_model):

#   test = pd.read_csv('/content/herremans_hit_1030test.csv')
  test = pd.read_csv('https://dorax.s3.ap-south-1.amazonaws.com/herremans_hit_1030test.csv')
  labels = test.iloc[:,-1]
  test = test.drop('Topclass1030', axis=1)
  testdata = torch.Tensor(test.values)
  testlabels = torch.Tensor(labels.values).view(-1,1)

  TP = 0
  TN = 0
  FN = 0
  FP = 0

  for i in range(0, testdata.size()[0]): 
    # print(testdata[i].size())
    Xtest = torch.Tensor(testdata[i])
    y_hat = my_model(Xtest)
    
    if y_hat > 0.5:
      prediction = 1
    else: 
      prediction = 0

    if (prediction == testlabels[i]):
      if (prediction == 1):
        TP += 1
      else: 
        TN += 1

    else:
      if (prediction == 1):
        FP += 1
      else: 
        FN += 1

  print("True Positives: {0}, True Negatives: {1}".format(TP, TN))
  print("False Positives: {0}, False Negatives: {1}".format(FP, FN))
  rate = TP/(FN+TP)
  print("Class specific accuracy of correctly predicting a hit song is {0}".format(rate))

run_evaluation(model1)

True Positives: 37, True Negatives: 17
False Positives: 12, False Negatives: 13
Class specific accuracy of correctly predicting a hit song is 0.74
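
The figure printed as class-specific accuracy is TP / (TP + FN), which is usually called recall. As a worked check using the counts printed above, the other common metrics follow from the same four numbers; the variable names here are illustrative.

```python
# Confusion-matrix counts copied from the evaluation output above.
TP, TN, FP, FN = 37, 17, 12, 13

recall = TP / (TP + FN)                      # share of real hits that were found
precision = TP / (TP + FP)                   # share of predicted hits that were real
accuracy = (TP + TN) / (TP + TN + FP + FN)   # share of all songs classified correctly
f1 = 2 * precision * recall / (precision + recall)

print("recall={:.3f} precision={:.3f} accuracy={:.3f} f1={:.3f}".format(
    recall, precision, accuracy, f1))
```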


In [5]:
# code your model 2
input_size2 = csv.shape[1] - 1
num_classes2 = 1
num_hidden2 = 2
hidden_dim2 = 20
dropout2 = 0.0

model2 = MultiLayerPerceptron(input_size2,
                              num_classes2,
                              num_hidden2,
                              hidden_dim2,
                              dropout2)

optimizer2 = optim.SGD(model2.parameters(),
                      lr=lr,
                      momentum=momentum)

train(model2, num_epochs, optimizer2, criterion)

Epoch num: 0, Epoch loss: 54.760826110839844
Epoch num: 50, Epoch loss: 2.1095807552337646
Epoch num: 100, Epoch loss: 0.2000361829996109
Epoch num: 150, Epoch loss: 0.09303463250398636
Epoch num: 200, Epoch loss: 0.057620372623205185
Epoch num: 250, Epoch loss: 0.04126419499516487
Epoch num: 300, Epoch loss: 0.03147227689623833
Epoch num: 350, Epoch loss: 0.025633778423070908
Epoch num: 400, Epoch loss: 0.02108062244951725
Epoch num: 450, Epoch loss: 0.017952799797058105


In [6]:
# evaluate model 2 (called model2 here)
run_evaluation(model2)

True Positives: 37, True Negatives: 17
False Positives: 12, False Negatives: 13
Class specific accuracy of correctly predicting a hit song is 0.74


Which works better and why do you think this may be (very briefly)? 


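Model 2 fits the training data better than model 1. The main difference is the width of the hidden layers: model2 uses 20 units per hidden layer (`hidden_dim2 = 20`), while model1 uses only 5 (`hidden_dim = 5`). Because model2 is the larger model, it reduces the avoidable bias caused by underfitting, which shows up as a much lower training loss (about 0.018 versus 3.96 at epoch 450). The test-set evaluation printed above is identical for both models (class-specific accuracy of 0.74), so the improvement demonstrated here is in training fit.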
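
To make the size difference concrete, the sketch below counts the trainable parameters of an MLP with two hidden layers and one output unit, as configured above. The input size of 10 is a placeholder assumption, since the real value is `csv.shape[1] - 1` and depends on the dataset; `count_mlp_params` is a hypothetical helper.

```python
def count_mlp_params(input_size, hidden_dim, num_hidden=2, num_classes=1):
    # Each nn.Linear(in, out) layer holds in * out weights plus out biases.
    params = input_size * hidden_dim + hidden_dim                         # first hidden layer
    params += (num_hidden - 1) * (hidden_dim * hidden_dim + hidden_dim)  # remaining hidden layers
    params += hidden_dim * num_classes + num_classes                     # output projection
    return params

input_size = 10  # placeholder, not the real feature count
small = count_mlp_params(input_size, hidden_dim=5)
large = count_mlp_params(input_size, hidden_dim=20)
print(small, large)
```

Under this assumption the wider configuration has several times more parameters, which is consistent with it fitting the training set more closely.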