# PyTorch - homework 2: neural networks

-- Prof. Dorien Herremans

Please run the whole notebook with your code and submit the `.ipynb` file on eDimension that includes your answers [so after you run it]. 

In [1]:
from termcolor import colored

student_number="1003296"
student_name="Phang Teng Fone"

print(colored("Homework by "  + student_name + ', number: ' + student_number,'red'))

[31mHomework by Phang Teng Fone, number: 1003296[0m


 ## Question 1 -- XOR neural network [3pts]

a) Train an (at least) 2-layer neural network that can solve the XOR problem. Hint: be sure to check both this week and last week's lab. 

b) Check the predictions resulting from your model in the second code box below.


In [2]:
# load your data
import torch
import numpy as np
from torch.autograd import Variable
import torch.nn as nn
import torch.nn.functional as F


# training set of input X and labels Y
X = torch.Tensor([[0,0],[0,1], [1,0], [1,1]])
Y = torch.Tensor([0,1,1,0]).view(-1,1) #view is similar to numpy.reshape() here it makes it into a column

class FeedForwardNN(nn.Module):
  # input_size: Dimensionality of input feature vector.
  # num_classes: The number of classes in the classification problem.
  # num_hidden: The number of hidden (intermediate) layers to use.
  # hidden_dim: The size of each of the hidden layers.
  # dropout: The proportion of units to drop out after each layer.
  def __init__(self, input_size, num_classes, num_hidden, hidden_dim, dropout):
    # Always call the superclass (nn.Module) constructor first!
    super(FeedForwardNN, self).__init__()
    
    # Set up the hidden layers.
    assert num_hidden > 0
    # A special ModuleList to store our hidden layers.
    self.hidden_layers = nn.ModuleList([])
    # First hidden layer maps from input_size -> num_hidden.
    self.hidden_layers.append(nn.Linear(input_size, hidden_dim))
    # Subsequent hidden layers map from num_hidden -> num_hidden.
    # Note that they can map to any dimensionality --- as long as the final
    # output is a distribution over your classes!
    for i in range(num_hidden - 1):
      self.hidden_layers.append(nn.Linear(hidden_dim, hidden_dim))
    
    # Set up the dropout layer.
    self.dropout = nn.Dropout(dropout)
    
    # Set up the final transform to a distribution over classes.
    self.output_projection = nn.Linear(hidden_dim, num_classes)
    
    # Set up the nonlinearity to use between layers.
    self.nonlinearity = nn.ReLU()
    
  # Forward's sole argument is the input.
  # input is of shape (batch_size, input_size)
  def forward(self, x):
    # Apply the hidden layers, nonlinearity, and dropout.
    for hidden_layer in self.hidden_layers:
      x = hidden_layer(x)
      x = self.dropout(x)
      x = self.nonlinearity(x)
      
    # Output layer: project x to a distribution over classes.
    out = self.output_projection(x)
    
    # Softmax the out tensor to get a log-probability distribution
    # over classes for each example.
    out_distribution = F.log_softmax(out, dim=-1)
    return out_distribution

# name your model xor
def xor():
    model = FeedForwardNN(2,2,2,5,0)
    return model

model = xor()

# define your model loss function, optimizer, etc.
loss_function = nn.NLLLoss()
momentum= 0.9
lr_rate = 0.001  # alpha
# SGD: stochastic gradient descent is used to train/fit the model
optimizer = torch.optim.SGD(model.parameters(), lr=lr_rate, momentum=momentum)


# train the model
#training loop:
epochs = 2001 #how many times we go through the training set
steps = X.size(0) #steps = 4; we have 4 training examples (I know, tiny training set :)

for i in range(epochs):
    for j in range(steps):        
        optimizer.zero_grad() # empty (zero) the gradient buffers
        y_hat = model(X[j].unsqueeze(0)) #get the output from the model

        loss = loss_function(y_hat, Y[j].type(torch.LongTensor)) #calculate the loss
        loss.backward() #backprop
        optimizer.step() #does the update

    if i % 500 == 0:
        print ("Epoch: {0}, Loss: {1}, ".format(i, loss.data.numpy()))

Epoch: 0, Loss: 0.5376112461090088, 
Epoch: 500, Loss: 0.36856818199157715, 
Epoch: 1000, Loss: 0.06509296596050262, 
Epoch: 1500, Loss: 0.031519174575805664, 
Epoch: 2000, Loss: 0.020460965111851692, 


In [3]:
# test your model using the following functions (make sure the output is printed and saved when you submit this notebook):
# depending on how you defined your network you may need to slightly tweek the below prediction function

test = [[0,0],[0,1],[1,1],[1,0]]

for trial in test: 
  Xtest = torch.Tensor(trial)
  y_hat = model(Xtest)

  prediction = torch.argmax(y_hat)
  
  print("{0} xor {1} = {2}".format(int(Xtest[0]), int(Xtest[1]), prediction))

0 xor 0 = 0
0 xor 1 = 1
1 xor 1 = 0
1 xor 0 = 1


## Question 2  [2pts]

Imagine a neural network model for a multilabel classification task. 

a) Which loss function should you use?

b) The resulting trained modal has a high variance error. Give 4 possible solutions to improve the model. 


```
* answer A
Binary cross-entropy loss function. Element base decision belonging to a certain class should not influence the decision for another class.
* answer B
  - Regularization
  - Dropout
  - Early Stopping
  - Increase size of training set / Reduce # of model parameters

```


## Question 3 - Improve hit classification [5pts]

Remember the hit predicton dataset from last week? 

a) Improve the model using a multiplayer perceptron. 

b) Make sure to run your models on the GPU. 

c) Tweek the hyperparameters such as number of nodes or layers, or other. Show two possible configurations and explain which works better and very briefly explain why this may be the case. 




In [4]:
# Download dataset
!wget https://dorax.s3.ap-south-1.amazonaws.com/herremans_hit_1030training.csv
!wget https://dorax.s3.ap-south-1.amazonaws.com/herremans_hit_1030test.csv

--2021-07-01 16:29:33--  https://dorax.s3.ap-south-1.amazonaws.com/herremans_hit_1030training.csv
Resolving dorax.s3.ap-south-1.amazonaws.com (dorax.s3.ap-south-1.amazonaws.com)... 52.219.62.3
Connecting to dorax.s3.ap-south-1.amazonaws.com (dorax.s3.ap-south-1.amazonaws.com)|52.219.62.3|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 147372 (144K) [text/csv]
Saving to: ‘herremans_hit_1030training.csv’


2021-07-01 16:29:34 (238 KB/s) - ‘herremans_hit_1030training.csv’ saved [147372/147372]

--2021-07-01 16:29:34--  https://dorax.s3.ap-south-1.amazonaws.com/herremans_hit_1030test.csv
Resolving dorax.s3.ap-south-1.amazonaws.com (dorax.s3.ap-south-1.amazonaws.com)... 52.219.62.46
Connecting to dorax.s3.ap-south-1.amazonaws.com (dorax.s3.ap-south-1.amazonaws.com)|52.219.62.46|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 36712 (36K) [text/csv]
Saving to: ‘herremans_hit_1030test.csv’


2021-07-01 16:29:35 (178 KB/s) - ‘herremans_hit_103

In [5]:
# code your model 1
import torch
import numpy as np
from torch.autograd import Variable
import torch.nn as nn

import pandas as pd
import numpy as np
import torch
import torch.nn.functional as F

# load data
train_pd = pd.read_csv('./herremans_hit_1030training.csv')
labels = train_pd.iloc[:,-1]
features = train_pd.loc[:, 'timesignature':'T12kurtosis']

labels = torch.Tensor(labels.values).reshape(321,1)
features = torch.Tensor(features.values)

# Device configuration
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# define model 1
class Model1(torch.nn.Module):
    def __init__(self, input_dim, output_dim, hidden_size):
        super(Model1, self).__init__()
        self.linear1 = nn.Linear(input_dim,hidden_size)
        self.linear2 = nn.Linear(hidden_size,hidden_size)
        self.linear3 = nn.Linear(hidden_size,output_dim)

    def forward(self, x):
        x = self.linear1(x)
        x = nn.functional.relu(x)
        x = self.linear2(x)
        x = nn.functional.relu(x)
        x = self.linear3(x)
        x = nn.functional.sigmoid(x)
        return x


# train model
input_dim = features.size(1)
output_dim = 1
hidden_size = 20
num_of_data = features.size(0)
epochs = 100
lr = 0.01
loss_func = torch.nn.BCELoss()

model_1 = Model1(input_dim,output_dim,hidden_size).to(device)
optimizer = torch.optim.SGD(model_1.parameters(), lr = lr)


model_1.train()

for i in range(epochs):
    for j in range(num_of_data):
        # randomly sample from the training set:
        data_point = np.random.randint(num_of_data)
        # store the retrieved datapoint into 2 separate variables of the right shape
        x_var = torch.Tensor(features[data_point]).unsqueeze(0).cuda()
        y_var = torch.Tensor(labels[data_point]).unsqueeze(0).cuda()

        # print(x_var.size())
        
        optimizer.zero_grad() # empty (zero) the gradient buffers
        y_hat = model_1(x_var) #get the output from the model

        loss = loss_func(y_hat, y_var) #calculate the loss
        loss.backward() #backprop
        optimizer.step() #does the update

    if i % 10 == 0:
        print(f"Epoch - {i} , Loss - {loss.item()}")




Epoch - 0 , Loss - 0.48464882373809814
Epoch - 10 , Loss - 0.6198206543922424
Epoch - 20 , Loss - 0.00617109565064311
Epoch - 30 , Loss - 7.152560215217818e-07
Epoch - 40 , Loss - 2.3841860752327193e-07
Epoch - 50 , Loss - 0.002638539532199502
Epoch - 60 , Loss - 1.1920930376163597e-07
Epoch - 70 , Loss - 0.002432978944852948
Epoch - 80 , Loss - 0.0
Epoch - 90 , Loss - 0.0


In [6]:
# evaluate model 1 (called model1 here)
def run_evaluation(my_model):

  test = pd.read_csv('/content/herremans_hit_1030test.csv')
  labels = test.iloc[:,-1]
  test = test.drop('Topclass1030', axis=1)
  testdata = torch.Tensor(test.values)
  testlabels = torch.Tensor(labels.values).view(-1,1)

  TP = 0
  TN = 0
  FN = 0
  FP = 0

  for i in range(0, testdata.size()[0]): 
    # print(testdata[i].size())
    Xtest = torch.Tensor(testdata[i]).cuda()
    y_hat = my_model(Xtest)
    
    if y_hat > 0.5:
      prediction = 1
    else: 
      prediction = 0

    if (prediction == testlabels[i]):
      if (prediction == 1):
        TP += 1
      else: 
        TN += 1

    else:
      if (prediction == 1):
        FP += 1
      else: 
        FN += 1

  print("True Positives: {0}, True Negatives: {1}".format(TP, TN))
  print("False Positives: {0}, False Negatives: {1}".format(FP, FN))
  rate = TP/(FN+TP)
  print("Class specific accuracy of correctly predicting a hit song is {0}".format(rate))

run_evaluation(model_1)

True Positives: 38, True Negatives: 18
False Positives: 11, False Negatives: 12
Class specific accuracy of correctly predicting a hit song is 0.76




In [9]:
# code your model 2
# define model 2 
class Model2(torch.nn.Module):
    def __init__(self, input_dim, output_dim, hidden_size):
        super(Model2, self).__init__()
        self.linear1 = nn.Linear(input_dim,hidden_size)
        self.linear2 = nn.Linear(hidden_size,output_dim)

    def forward(self, x):
        x = self.linear1(x)
        x = nn.functional.dropout(x,p=0.5)
        x = nn.functional.relu(x)
        x = self.linear2(x)
        x = nn.functional.sigmoid(x)
        return x

# train model
input_dim = features.size(1)
output_dim = 1
hidden_size = 100
num_of_data = features.size(0)
epochs = 100
lr = 0.01
loss_func = torch.nn.BCELoss()

model_2 = Model2(input_dim,output_dim,hidden_size).to(device)
optimizer = torch.optim.SGD(model_1.parameters(), lr = lr)


model_2.train()

for i in range(epochs):
    for j in range(num_of_data):
        # randomly sample from the training set:
        data_point = np.random.randint(num_of_data)
        # store the retrieved datapoint into 2 separate variables of the right shape
        x_var = torch.Tensor(features[data_point]).unsqueeze(0).cuda()
        y_var = torch.Tensor(labels[data_point]).unsqueeze(0).cuda()

        # print(x_var.size())
        
        optimizer.zero_grad() # empty (zero) the gradient buffers
        y_hat = model_1(x_var) #get the output from the model

        loss = loss_func(y_hat, y_var) #calculate the loss
        loss.backward() #backprop
        optimizer.step() #does the update

    if i % 10 == 0:
        print(f"Epoch - {i} , Loss - {loss.item()}")



Epoch - 0 , Loss - 1.7285496141994372e-05
Epoch - 10 , Loss - 0.0009030603687278926
Epoch - 20 , Loss - 4.3333515350241214e-05
Epoch - 30 , Loss - 0.00013870962720829993
Epoch - 40 , Loss - 0.0002667663502506912
Epoch - 50 , Loss - 3.4570753086882178e-06
Epoch - 60 , Loss - 0.0002977695257868618
Epoch - 70 , Loss - 0.0006927266367711127
Epoch - 80 , Loss - 8.702316335984506e-06
Epoch - 90 , Loss - 0.0002803599345497787


In [10]:
# evaluate model 2 (called model2 here)

run_evaluation(model_2)

True Positives: 42, True Negatives: 6
False Positives: 23, False Negatives: 8
Class specific accuracy of correctly predicting a hit song is 0.84




Which works better and why do you think this may be (very briefly)? 


For both Model 1 and 2, it uses the same learning rate, epochs and loss functions. The only differences are

Model 1:
- Uses 3 hidden layers of size 20

Model 2:
- Uses 2 hidden layer of size 100
- An additional drop out layer to prevent overfitting

Model 2 works better as adding drop out layer prevents overfitting, additional features of hidden layers reduce training loss and reducing hidden layers might improve variance.

Additionally, submit your results [here](https://forms.gle/NtJJEE7Wm5ZRM3Je7) for 'Class specific accuracy of correctly predicting a hit song' and see if you got the best performance of the class! Good luck!