__Name/Section:__

A: Aaron Palpallatoc / S11

# Neural Networks Exercise

In this notebook, we will extend our `SGDClassifier` into a neural network with two hidden layers and train it on a multiclass dataset.

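**Note: training in PyTorch looks quite different from training in scikit-learn.** It may help to think of it in two phases: first define the structure of the network (its layers), then repeatedly "feed" batches of data through it to compute the loss and update the weights. PyTorch builds the underlying computation graph on the fly during each forward pass.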


## Instructions
* Read each cell and implement the TODOs sequentially. The markdown/text cells also contain instructions which you need to follow to get the whole notebook working.
* Do not change the variable names unless the instructor allows you to.
* Answer all the markdown/text cells marked with 'A: '. Each answer must fit on a single line.
* You are expected to look up how some functions work, either on the Internet or in the docs.
* There are commented markdown cells that have crumbs. Do not delete them or separate them from the cell originally directly below it. 
* You may add new cells for "scrap work" as long as the crumbs are not separated from the cell below it.
* The notebooks will undergo a 'Restart and Run All' command, so make sure that your code is working properly.
* You are expected to understand the data set loading and processing separately from this class.
* You may not reproduce this notebook or share it with anyone.

## Import
Import **matplotlib**, **csv**, **numpy**, and **torch**.

In [None]:
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
import torch.optim as optim
import torch.nn as nn
import numpy as np
import torch
import csv
import math

%matplotlib inline

# set default size of plots
plt.rcParams['figure.figsize'] = (8.0, 8.0)
plt.rcParams['image.interpolation'] = 'nearest'

torch.manual_seed(0)


%load_ext autoreload
%autoreload 2

# Synthetic Dataset

Let's use the `make_blobs()` function to create a dataset with 1650 instances drawn from three clusters centered at (-6, 0), (0, 0), and (-3, -5).

In [None]:
from sklearn.datasets import make_blobs

centers = [[-6, 0], [0, 0], [-3, -5]]

X, y = make_blobs(n_samples=1650, 
                  centers=centers, 
                  random_state=5)

Let's count the number of instances per class. Below are the classes and their corresponding colors in the plots that follow:
- class `0`, violet
- class `1`, turquoise
- class `2`, yellow

In [None]:
X_0 = X[y == 0]
X_1 = X[y == 1]
X_2 = X[y == 2]

print('Number of class 0:', len(X_0))
print('Number of class 1:', len(X_1))
print('Number of class 2:', len(X_2))

There are 550 instances for each class.

Let's divide the dataset into a train set and a test set. The test set will contain 50 instances from each class.

In [None]:
np.random.seed(10)

# Select 50 `class 0` instances
selected_0 = np.random.choice(np.arange(len(X_0)),
                              size=50,
                              replace=False)

# Select 50 `class 1` instances
selected_1 = np.random.choice(np.arange(len(X_1)),
                              size=50,
                              replace=False)

# Select 50 `class 2` instances
selected_2 = np.random.choice(np.arange(len(X_2)),
                              size=50,
                              replace=False)

# Form the test set
X_test = np.concatenate((X_0[selected_0],
                         X_1[selected_1],
                         X_2[selected_2]))
y_test = np.concatenate((np.array([0 for _ in range(50)]),
                         np.array([1 for _ in range(50)]),
                         np.array([2 for _ in range(50)])))

print(X_test.shape)
print(y_test.shape)

The remaining 1500 instances will be a part of the train set, where each class has 500 instances.

In [None]:
X_train = np.concatenate((np.delete(X_0, selected_0, 0),
                          np.delete(X_1, selected_1, 0),
                          np.delete(X_2, selected_2, 0)))
y_train = np.concatenate((np.array([0 for _ in range(500)]),
                          np.array([1 for _ in range(500)]),
                          np.array([2 for _ in range(500)])))

print(X_train.shape)
print(y_train.shape)

Visualize the train data.

In [None]:
plt.scatter(X_train[:, 0], X_train[:, 1], c=y_train)
plt.title('Train data')

Visualize the test data.

In [None]:
plt.scatter(X_test[:, 0], X_test[:, 1], c=y_test)
plt.title('Test data')

**Sanity check:** The test plot should look similar to the training plot, but with fewer points. Each color should occupy the same region in both plots.

PyTorch models operate on `torch.Tensor` objects, so we need to convert the `np.ndarray` arrays to `torch.Tensor`.

Convert the variable `X_train` to the datatype `torch.Tensor` and assign the return value to variable `X_train`.
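A minimal sketch of the conversion on small toy arrays (`features_np` and `labels_np` are hypothetical stand-ins, not the notebook's variables). `torch.Tensor(...)` copies the data into a 32-bit float tensor, while `torch.from_numpy(...)` shares memory with the array and keeps its dtype; class labels are later cast to `torch.long` for the loss.

```python
import numpy as np
import torch

# Hypothetical toy arrays standing in for X_train / y_train
features_np = np.array([[-6.1, 0.3], [0.2, -0.4], [-3.0, -5.2]])  # float64
labels_np = np.array([0, 1, 2])                                     # integer labels

# torch.Tensor copies the data and always produces float32
features = torch.Tensor(features_np)
labels = torch.Tensor(labels_np)

# torch.from_numpy shares memory with the NumPy array and keeps its dtype
features_shared = torch.from_numpy(features_np)

# Class labels must be integers (torch.long) for nn.CrossEntropyLoss
labels_long = labels.to(torch.long)

print(features.dtype, features.shape)   # torch.float32 torch.Size([3, 2])
print(features_shared.dtype)            # torch.float64
print(labels_long)                      # tensor([0, 1, 2])
```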

In [None]:
# Write your code here


Convert the variable `y_train` to the datatype `torch.Tensor` and assign the return value to variable `y_train`.

In [None]:
# Write your code here


Convert the variable `X_test` to the datatype `torch.Tensor` and assign the return value to variable `X_test`.

In [None]:
# Write your code here


Convert the variable `y_test` to the datatype `torch.Tensor` and assign the return value to variable `y_test`.

In [None]:
# Write your code here


This is the data which we will feed into our neural network model.

# Neural Network

We will follow the pseudocode below:


1. Set up the size of our network.
2. Initialize the weight variables.

start loop

3. Do forward propagation.
4. Get the predictions.
5. Calculate the loss.
6. Do backward propagation to update/optimize the weight variables.

end loop
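The six steps above can be sketched end to end in plain PyTorch. This is a minimal illustration, assuming a hypothetical two-cluster toy dataset and a small `nn.Sequential` model in place of the notebook's `NeuralNetwork` class; it uses full-batch updates and `lr=0.01` for brevity rather than the settings used later in this notebook.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

# Hypothetical toy data: two well-separated clusters of 2-D points
X = torch.cat([torch.randn(50, 2) + torch.tensor([-3.0, 0.0]),
               torch.randn(50, 2) + torch.tensor([3.0, 0.0])])
y = torch.cat([torch.zeros(50, dtype=torch.long),
               torch.ones(50, dtype=torch.long)])

# Steps 1-2: define the network size; nn.Linear initializes its own weights
model = nn.Sequential(nn.Linear(2, 5), nn.Sigmoid(), nn.Linear(5, 2))
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.01)

losses = []
for epoch in range(300):
    scores = model(X)                    # Step 3: forward propagation (raw scores)
    predictions = scores.argmax(dim=1)   # Step 4: predicted class per instance
    loss = criterion(scores, y)          # Step 5: loss computed on raw scores
    optimizer.zero_grad()                # Step 6: clear old gradients,
    loss.backward()                      #   backpropagate,
    optimizer.step()                     #   and update the weights
    losses.append(loss.item())

accuracy = (predictions == y).float().mean().item()
print('first loss: {:.4f}, last loss: {:.4f}'.format(losses[0], losses[-1]))
print('train accuracy:', accuracy)
```

Calling `optimizer.zero_grad()` right before `backward()` is one common ordering; what matters is that old gradients are cleared before each update.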

Open the `neural_network.py` file. Some of the functions in the `NeuralNetwork` class are not yet implemented; we will implement the missing parts of this class.

Import the `NeuralNetwork` class.

In [None]:
from neural_network import NeuralNetwork

## Step 1. Setting up the size of the network

Instantiate a `NeuralNetwork` object. Set the following parameters:
- `list_hidden` = (5, 10)
- `activation` = `sigmoid`

Here, we are creating a Neural Network with two hidden layers, where there are 5 units in the first layer and 10 units in the second layer.

Set the other parameters according to the synthetic dataset that we created earlier.

In [None]:
# Write your code here


Open the `neural_network.py` file and complete the `create_network()` function in the `NeuralNetwork` class. This function creates the layers of the neural network.

Implement the `create_network()` function in the `NeuralNetwork` class. Inline comments should help you complete the function.

Create the layers of the neural network by calling the function `create_network()`.
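One way to picture the layer structure is an alternating stack of `nn.Linear` and activation modules. The sketch below assumes that layout (it is consistent with the last `nn.Linear` sitting at index 4 when there are two hidden layers); `build_layers` is a hypothetical helper with made-up sizes, not the notebook's `create_network()`. It also shows how to count the trainable parameters.

```python
import torch.nn as nn

def build_layers(num_features, list_hidden, num_classes):
    """Hypothetical helper: a Linear -> Sigmoid block for each hidden size,
    followed by a final Linear layer that outputs one raw score per class."""
    layers = []
    in_size = num_features
    for hidden_size in list_hidden:
        layers.append(nn.Linear(in_size, hidden_size))
        layers.append(nn.Sigmoid())
        in_size = hidden_size
    layers.append(nn.Linear(in_size, num_classes))
    return nn.Sequential(*layers)

# Made-up sizes: 4 input features, hidden layers (3, 6), 2 classes
model = build_layers(4, (3, 6), 2)
print(model)

# Each nn.Linear(in, out) holds in * out weights plus out biases
num_params = sum(p.numel() for p in model.parameters())
print('Total parameters:', num_params)  # (4*3 + 3) + (3*6 + 6) + (6*2 + 2) = 53
```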

In [None]:
# Write your code here


Display the structure of the neural network.

In [None]:
print(network)

**Question #1:** Give the value of the `in_features` of the first `nn.Linear` module (Index 0).

<!--crumb;qna;Q1-->

A: 

**Question #2:** Give the value of the `out_features` of the last `nn.Linear` module (Index 4).

<!--crumb;qna;Q2-->

A: 

**Question #3:** Give the total number of parameters of the model.

<!--crumb;qna;Q3-->

A: 

## Step 2. Initializing the model weights

Open the `neural_network.py` file and complete the `init_weights()` function in the `NeuralNetwork` class. This function initializes the weights of the network: the weights of each `nn.Linear` layer should be drawn from a normal distribution with mean `0` and standard deviation `0.1`, and the bias terms of each `nn.Linear` layer should be set to the constant value `0`.

Implement the `init_weights()` function in the `NeuralNetwork` class. Inline comments should help you complete the function.

Initialize the weights of the neural network by calling the function `init_weights()`.
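A minimal sketch of the initialization described above, using PyTorch's `nn.init.normal_` and `nn.init.constant_`. The helper name `init_linear` and the toy `nn.Sequential` model are hypothetical; the notebook's `init_weights()` may loop over its layers differently.

```python
import torch
import torch.nn as nn

def init_linear(module):
    """Hypothetical helper: weights ~ N(0, 0.1^2), biases = 0, for nn.Linear only."""
    if isinstance(module, nn.Linear):
        nn.init.normal_(module.weight, mean=0.0, std=0.1)
        nn.init.constant_(module.bias, 0.0)

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(2, 5), nn.Sigmoid(), nn.Linear(5, 3))

# Module.apply calls init_linear on every submodule (and on the model itself);
# modules that are not nn.Linear, such as nn.Sigmoid, are left untouched
model.apply(init_linear)

print(model[0].weight)
print(model[0].bias)
```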

In [None]:
# Write your code here


Display the weight of the 1st `nn.Linear` layer in the network.

In [None]:
print(network.layers[0].weight)

**Sanity check:** The output of the previous cell should look like:

```
Parameter containing:
tensor([[ x.xxxx, x.xxxx],
        ...
        [ x.xxxx, x.xxxx]], requires_grad=True)
```

where `x.xxxx` is some float.

**Question #4:** Give the first value in the first row of the weights of the 1st `nn.Linear` layer in the network.

<!--crumb;qna;Q4-->

A: 

Display the bias term of the 1st `nn.Linear` layer in the network.

In [None]:
print(network.layers[0].bias)

**Sanity check:** The output of the previous cell should be:

```
Parameter containing:
tensor([0., 0., 0., 0., 0.], requires_grad=True)
```

Display the weight of the 2nd `nn.Linear` layer in the network.

In [None]:
# Write your code here


**Question #5:** Give the last value in the last row of the weights of the 2nd `nn.Linear` layer in the network.

<!--crumb;qna;Q5-->

A: 

Display the bias term of the 2nd `nn.Linear` layer in the network.

In [None]:
# Write your code here


Display the weight of the 3rd `nn.Linear` layer in the network.

In [None]:
# Write your code here


**Question #6:** Give the first value in the first row of the weights of the 3rd `nn.Linear` layer in the network.

<!--crumb;qna;Q6-->

A: 

Display the bias term of the 3rd `nn.Linear` layer in the network.

In [None]:
# Write your code here


## Step 3. Forward propagation

Forward propagation computes the output of each layer in the neural network. 
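A minimal sketch of what a manual forward pass means, assuming sigmoid hidden activations and a softmax output (the toy layers and input below are hypothetical). An `nn.Linear` layer computes `x @ W.T + b`, so computing that expression directly from each layer's `weight` and `bias` should match calling the modules.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
linear1, linear2 = nn.Linear(2, 5), nn.Linear(5, 3)
x = torch.tensor([[-6.0, 0.5]])  # one hypothetical instance, shape (1, 2)

# Manual forward pass: each linear layer computes x @ W^T + b
hidden = torch.sigmoid(x @ linear1.weight.T + linear1.bias)
scores_manual = hidden @ linear2.weight.T + linear2.bias
probabilities_manual = torch.softmax(scores_manual, dim=1)

# The same computation using the modules themselves
scores_module = linear2(torch.sigmoid(linear1(x)))

print(torch.allclose(scores_manual, scores_module))  # True
print(probabilities_manual)
```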

Open the `neural_network.py` file and complete the `forward_manual()` function in the `NeuralNetwork` class. This function performs forward propagation manually: instead of calling the `nn.Linear` modules, you compute the output of each linear layer yourself from its weights and bias.

Implement the `forward_manual()` function in the `NeuralNetwork` class. Inline comments should help you complete the function.

Perform forward propagation by calling `forward_manual()` on the training instance at index `0`. Set `verbose=True`. Assign the return values to variables `scores` and `probabilities`.

The function should display the output of each layer.

In [None]:
# Write your code here


**Sanity check:** Let's call the pre-implemented `forward()` function of the model. This function also performs forward propagation, but uses the operations defined in the PyTorch modules. The output of our `forward_manual()` implementation should therefore match the output of `forward()` below.

In [None]:
# Write your code here


**Question #7:** What is the sum of the output of the last layer of the network? Why did we get that sum?

<!--crumb;qna;Q7-->

A: 

## Step 4. Getting the predictions

Since this is a multinomial classification problem, the predicted class corresponds to the class with the highest probability.
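A short sketch of picking the class with the highest probability using `torch.argmax` along the class dimension (the probabilities are made up). Because softmax preserves the ordering of the scores, taking the argmax of the raw scores would give the same predictions.

```python
import torch

# Hypothetical probabilities for 4 instances over 3 classes (each row sums to 1)
probabilities = torch.tensor([[0.70, 0.20, 0.10],
                              [0.10, 0.30, 0.60],
                              [0.25, 0.50, 0.25],
                              [0.34, 0.33, 0.33]])

# The predicted class is the column index of the largest value in each row
predictions = torch.argmax(probabilities, dim=1)
print(predictions)  # tensor([0, 2, 1, 0])
```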

Open the `neural_network.py` file and complete the `predict()` function in the `NeuralNetwork` class. This function returns the index of the class with the highest probability.

Implement the `predict()` function in the `NeuralNetwork` class. Inline comments should help you complete the function.

Get 10 random training instances in the dataset.

In [None]:
np.random.seed(10)
random_indices = np.random.randint(X_train.shape[0], 
                                   size=10)
print('Random indices: ', random_indices)

Compute the scores and probabilities of the random training instances by calling the function `forward()`. Assign the return values to variables `scores` and `probabilities`.

In [None]:
# Write your code here


Print the raw scores of random instances in the dataset.

In [None]:
# Write your code here


Print the probabilities of random instances in the dataset.

In [None]:
# Write your code here


Get the predicted class by calling the function `predict()`. Store the predicted labels in the variable `predictions`.

In [None]:
# Write your code here


In [None]:
print(predictions)

**Sanity Check:** All selected training examples are currently being classified under one class.

**Question #8:** By definition, the predicted class is the one with the highest probability. Why are all selected training examples being classified under the same class?

<!--crumb;qna;Q8-->

A: 

## Step 5. Calculating the loss

Since this is a multinomial classification problem, we need to use cross entropy loss.

In PyTorch, we can use `nn.CrossEntropyLoss()` to calculate the cross entropy loss between the raw score output of the model and the target class. You may read the documentation [here](https://pytorch.org/docs/stable/generated/torch.nn.CrossEntropyLoss.html#torch.nn.CrossEntropyLoss).
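`nn.CrossEntropyLoss` applies a log-softmax to its input internally, which is why it expects raw scores; passing probabilities would effectively apply softmax twice. The sketch below, with made-up scores and targets, checks that the module's result matches the mean negative log-probability of each target class.

```python
import torch
import torch.nn as nn

# Hypothetical raw scores (logits) for 3 instances over 3 classes
scores = torch.tensor([[2.0, 0.5, -1.0],
                       [0.1, 0.2, 3.0],
                       [1.0, 1.0, 1.0]])
targets = torch.tensor([0, 2, 1], dtype=torch.long)  # class indices, not one-hot

criterion = nn.CrossEntropyLoss()  # default reduction='mean'
loss = criterion(scores, targets)

# Equivalent manual computation: mean of -log(softmax(scores))[target]
log_probs = torch.log_softmax(scores, dim=1)
manual_loss = -log_probs[torch.arange(len(targets)), targets].mean()

print(loss.item(), manual_loss.item())
```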

Instantiate an `nn.CrossEntropyLoss()` object, keeping all default parameter values. Assign it to the variable `criterion`.

In [None]:
# Write your code here


Get the target classes of the random training examples from the previous step and convert them to `torch.long`. `nn.CrossEntropyLoss()` expects class-index targets to be integers (`long`), not floats. Assign the result to the variable `target_classes`.

In [None]:
# Write your code here


Compute the loss and assign the return value to variable `loss`.

Note that the loss function takes the raw scores output by the model, not the probabilities. Read the documentation to understand what to pass as the `input` and `target` parameters.

In [None]:
# Write your code here


Print the loss.

In [None]:
print('Loss: {:.4f}'.format(loss.item()))

**Question #9:** What is the loss in this scenario? Limit to 4 decimal places.

<!--crumb;qna;Q9-->

A: 

## Step 6. Backward propagation for optimizing weights

The next step is to perform backward propagation and update the weights of the model, which makes the model better at classifying the input data. In PyTorch, we call `backward()` on the loss tensor to compute the gradient of the loss with respect to every weight, and we use an optimizer to update the weights from those gradients.

Read this [documentation](https://pytorch.org/docs/stable/optim.html#taking-an-optimization-step) to learn the general pseudocode for updating weights in PyTorch.
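Following the pattern in the linked documentation, a single optimization step is `zero_grad()`, `backward()`, then `step()`. This sketch uses a hypothetical toy model and a random mini-batch to confirm that one Adam step changes the first layer's weights. On its first step, Adam moves each weight by at most about `lr`, so the change is small.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(2, 5), nn.Sigmoid(), nn.Linear(5, 3))
optimizer = optim.Adam(model.parameters(), lr=0.001)
criterion = nn.CrossEntropyLoss()

inputs = torch.randn(8, 2)              # hypothetical mini-batch
targets = torch.randint(0, 3, (8,))     # hypothetical class labels

weights_before = model[0].weight.detach().clone()

optimizer.zero_grad()                     # 1. clear gradients from any earlier step
loss = criterion(model(inputs), targets)  # 2. forward pass and loss
loss.backward()                           # 3. compute d(loss)/d(parameter)
optimizer.step()                          # 4. update parameters using the gradients

weights_after = model[0].weight.detach()
print(torch.equal(weights_before, weights_after))  # False
```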

We will use Adam as our optimizer.

Instantiate an `optim.Adam` object. Set the following parameters:
- `params` = Set this to the parameters of your network
- `lr` = `0.001`

In [None]:
# Write your code here


Empty the gradients of the network.
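Gradients must be cleared because PyTorch accumulates them: each call to `backward()` adds to a parameter's `.grad` instead of overwriting it. A tiny sketch with a single hypothetical parameter `w`:

```python
import torch

w = torch.tensor([2.0], requires_grad=True)

# First backward pass: d(3w)/dw = 3
(3 * w).sum().backward()
print(w.grad)  # tensor([3.])

# Second backward pass without clearing: the new gradient is ADDED to w.grad
(3 * w).sum().backward()
print(w.grad)  # tensor([6.])

# Reset the gradient; optimizer.zero_grad() does this for every parameter
# (setting .grad to None in recent PyTorch versions, or to zero in older ones)
w.grad = None
(3 * w).sum().backward()
print(w.grad)  # tensor([3.])
```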

In [None]:
# Write your code here


Compute the gradients through backward propagation.

In [None]:
# Write your code here


Update the weights.

In [None]:
# Write your code here


Display the weight of the 1st `nn.Linear` layer in the network after updating.

In [None]:
# Write your code here


**Sanity check:** The output of the previous cell should have the same format as before, but its values should differ from the initial weights displayed earlier:

```
Parameter containing:
tensor([[ x.xxxx, x.xxxx],
        ...
        [ x.xxxx, x.xxxx]], requires_grad=True)
```

where `x.xxxx` is some float.

**Question #10:** What is the leftmost value in the weight tensor of the 1st `nn.Linear` layer in the network after updating? Limit to 4 decimal places.

<!--crumb;qna;Q10-->

A: 

## Putting all steps together

We will train the model using mini-batch gradient descent. 
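The `DataLoader` in `data_loader.py` is not shown here, so the sketch below is a hypothetical stand-in (`make_batches` is not its real API) that illustrates what mini-batching typically involves: shuffling the training set each epoch and slicing it into batches of 32. With 1500 training instances this gives 47 batches, the last holding 28. Assuming `X_batch` in the training loop holds one entry per batch, `len(X_batch)` there is the number of batches.

```python
import numpy as np

def make_batches(X, y, batch_size, rng):
    """Hypothetical helper: shuffle once per epoch, then split into mini-batches.
    The last batch is smaller when len(X) is not a multiple of batch_size."""
    order = rng.permutation(len(X))
    X_shuffled, y_shuffled = X[order], y[order]
    starts = range(0, len(X), batch_size)
    X_batches = [X_shuffled[i:i + batch_size] for i in starts]
    y_batches = [y_shuffled[i:i + batch_size] for i in starts]
    return X_batches, y_batches

rng = np.random.default_rng(0)
X_demo = np.arange(1500 * 2, dtype=float).reshape(1500, 2)  # same size as the train set
y_demo = np.repeat([0, 1, 2], 500)

X_batches, y_batches = make_batches(X_demo, y_demo, 32, rng)
print(len(X_batches))                         # 47
print(len(X_batches[0]), len(X_batches[-1]))  # 32 28
```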

Use the `data_loader.py` file that we implemented in the previous notebook. Import the `DataLoader` class.

In [None]:
from data_loader import DataLoader

Instantiate a `DataLoader` object. Pass the `X` and `y` of the train set and `32` as the `batch_size`. Assign it to the variable `data_loader`.

In [None]:
# Write your code here


Train your network. Complete the code below.

In [None]:
e = 0
max_epochs = 400
is_converged = False
previous_loss = 0
losses = []

# For each epoch
while e < max_epochs and is_converged is not True:
    
    current_epoch_loss = 0
    
    # TODO: Get the batch for this epoch.
    X_batch, y_batch = None
    
    # For each batch
    for X, y in zip(X_batch, y_batch):
        X = torch.Tensor(X)
        y = torch.Tensor(y).to(torch.long)
        
        # TODO: Empty the gradients of the network.
        
        
        # TODO: Forward propagation
        scores, probabilities = None
        
        # TODO: Compute the loss
        loss = None
        
        # TODO: Backward propagation
        
        
        # TODO: Update parameters
        
        
        current_epoch_loss += loss.item()
    
    average_loss = current_epoch_loss / len(X_batch)
    losses.append(average_loss)
    
    # Display the average loss per epoch
    print('Epoch:', e + 1, '\tLoss: {:.6f}'.format(average_loss))
    
    if abs(previous_loss - loss) < 0.00005:
        is_converged = True
    else:
        previous_loss = loss
        e += 1

**Question #11:** How many epochs did the model train before convergence?

<!--crumb;qna;Q11-->

A: 

**Question #12:** What is the average loss at the last epoch? Limit to 6 decimal places.

<!--crumb;qna;Q12-->

A: 

## Try our trained network on the test data

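Switch the network to evaluation mode first. `eval()` puts layers such as dropout and batch normalization into inference mode; the weights themselves only change when the optimizer takes a step, and `torch.no_grad()` can be used to skip gradient tracking.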

In [None]:
network.eval()

Perform forward propagation on the test data. Assign the return values to variables `scores` and `probabilities`.
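A minimal sketch of inference, assuming a hypothetical toy model and random inputs: `eval()` switches layers like dropout to inference behavior, while wrapping the forward pass in `torch.no_grad()` stops autograd from recording operations, which saves memory and returns plain tensors.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(2, 5), nn.Sigmoid(), nn.Linear(5, 3))
X_demo = torch.randn(6, 2)  # hypothetical test inputs

model.eval()                 # inference behavior for dropout/batch-norm layers
with torch.no_grad():        # no autograd graph is built inside this block
    scores = model(X_demo)
    probabilities = torch.softmax(scores, dim=1)

print(scores.requires_grad)  # False
print(probabilities.shape)   # torch.Size([6, 3])
```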

In [None]:
# Write your code here


Now, let's get the prediction results on the test data to see if our model can handle unseen instances. Store the predicted labels in the variable `predictions`.

In [None]:
# Write your code here


In [None]:
print(predictions)

Compare the ground truth labels with the predicted labels. Store the total number of correct predictions in the variable `num_correct`.
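Counting correct predictions is an element-wise comparison followed by a sum, and accuracy is that count divided by the number of instances. A sketch with made-up labels:

```python
import torch

# Hypothetical ground-truth labels and predictions
y_true = torch.tensor([0, 0, 1, 1, 2, 2])
predictions = torch.tensor([0, 1, 1, 1, 2, 0])

# Element-wise comparison gives a boolean tensor; summing counts the True values
num_correct = (predictions == y_true).sum().item()
accuracy = num_correct / len(y_true)

print(num_correct)                # 4
print('{:.4f}'.format(accuracy))  # 0.6667
```

If the ground-truth tensor is a float tensor (as `torch.Tensor(...)` produces), `==` still compares by value after type promotion, so no cast is strictly required.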

In [None]:
# Write your code here


In [None]:
print(num_correct)

Compute the accuracy. Store it in the variable `accuracy`.

In [None]:
# Write your code here


In [None]:
print('{:.4f}'.format(accuracy))

**Question #13:** What is the accuracy of the network when evaluated on the test set? Express your answer in a floating point number from 0 to 1. Limit to 4 decimal places.

<!--crumb;qna;Q13-->

A: 

Let's visualize the loss for each training epoch.

In [None]:
x_values = [i for i in range(len(losses))]
y_values = losses

plt.plot(x_values, y_values)
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.title('Loss for each training epoch')

**Question #14:** Around what epoch did our training converge, i.e., when there are minimal changes in the value of the loss?

<!--crumb;qna;Q14-->

A: 

# Iris Dataset
Next, we will use the Iris dataset. Each instance represents an Iris flower using 4 features:
- `sepal_length` - length of the sepal in centimeters
- `sepal_width` - width of the sepal in centimeters
- `petal_length` - length of the petal in centimeters
- `petal_width` - width of the petal in centimeters

Iris flowers are divided into 3 classes:
- `Iris-setosa` - class `0`
- `Iris-versicolor` - class `1`
- `Iris-virginica` - class `2`

Let's load `Iris.csv`.

In [None]:
classes = {
    'Iris-setosa': 0,
    'Iris-versicolor': 1,
    'Iris-virginica': 2
}

features = []
labels = []
with open('Iris.csv', 'r') as csv_file:
    raw_data = csv.reader(csv_file)
    for row in raw_data:
        # Columns 0-3: sepal_length, sepal_width, petal_length, petal_width
        features.append([float(value) for value in row[:4]])
        labels.append(classes[row[4]])  # column for class

X_iris = np.array(features, dtype=float)  # shape (N, 4)
# Keep the labels as a 1-D vector of shape (N,) so that boolean masks
# such as `y_iris == 0` can index the rows of X_iris directly
y_iris = np.array(labels, dtype=int)

print('Training data shape:', X_iris.shape)
print('Ground truth values shape:', y_iris.shape)

Let's get the number of instances per class.

In [None]:
X_iris_0 = X_iris[y_iris == 0]
X_iris_1 = X_iris[y_iris == 1]
X_iris_2 = X_iris[y_iris == 2]

print('Number of class 0:', len(X_iris_0))
print('Number of class 1:', len(X_iris_1))
print('Number of class 2:', len(X_iris_2))

There are 50 instances for each class.

Let's divide the dataset into a train set and a test set. The test set will contain 10 instances from each class.

In [None]:
np.random.seed(1)

# Select 10 `class 0` instances
selected_0 = np.random.choice(np.arange(len(X_iris_0)),
                              size=10,
                              replace=False)

# Select 10 `class 1` instances
selected_1 = np.random.choice(np.arange(len(X_iris_1)),
                              size=10,
                              replace=False)

# Select 10 `class 2` instances
selected_2 = np.random.choice(np.arange(len(X_iris_2)),
                              size=10,
                              replace=False)

# Form the test set
X_test = np.concatenate((X_iris_0[selected_0],
                         X_iris_1[selected_1],
                         X_iris_2[selected_2]))
y_test = np.concatenate((np.array([0 for _ in range(10)]),
                         np.array([1 for _ in range(10)]),
                         np.array([2 for _ in range(10)])))

print(X_test.shape)
print(y_test.shape)

The remaining 120 instances will be a part of the train set, where each class has 40 instances.

In [None]:
X_train = np.concatenate((np.delete(X_iris_0, selected_0, 0),
                          np.delete(X_iris_1, selected_1, 0),
                          np.delete(X_iris_2, selected_2, 0)))
y_train = np.concatenate((np.array([0 for _ in range(40)]),
                          np.array([1 for _ in range(40)]),
                          np.array([2 for _ in range(40)])))

print(X_train.shape)
print(y_train.shape)

PyTorch models operate on `torch.Tensor` objects, so we need to convert the `np.ndarray` arrays to `torch.Tensor`.

Convert the variable `X_train` to the datatype `torch.Tensor` and assign the return value to variable `X_train`.

In [None]:
# Write your code here


Convert the variable `y_train` to the datatype `torch.Tensor` and assign the return value to variable `y_train`.

In [None]:
# Write your code here


Convert the variable `X_test` to the datatype `torch.Tensor` and assign the return value to variable `X_test`.

In [None]:
# Write your code here


Convert the variable `y_test` to the datatype `torch.Tensor` and assign the return value to variable `y_test`.

In [None]:
# Write your code here


## Setting up the training pipeline

Set up the following:
- Network
- Optimizer
- Loss function
- Data loader

Instantiate a `NeuralNetwork` object. Set the following parameters:
- `list_hidden` = (5, 10)
- `activation` = `sigmoid`

Here, we are creating a Neural Network with two hidden layers, where there are 5 units in the first layer and 10 units in the second layer.

Set the other parameters according to the Iris dataset.

In [None]:
# Write your code here


Create the network and initialize the weights.

In [None]:
# Write your code here


Display the structure of the neural network.

In [None]:
# Write your code here


**Question #15:** Give the value of the `in_features` of the first `nn.Linear` module (Index 0).

<!--crumb;qna;Q15-->

A: 

**Question #16:** Give the value of the `out_features` of the last `nn.Linear` module (Index 4).

<!--crumb;qna;Q16-->

A: 

We will use Adam as our optimizer.

Instantiate an `optim.Adam` object. Set the following parameters:
- `params` = Set this to the parameters of your network
- `lr` = `0.001`

In [None]:
# Write your code here


Instantiate an `nn.CrossEntropyLoss()` object, keeping all default parameter values. Assign it to the variable `criterion`.

In [None]:
# Write your code here


We will train the model using mini-batch gradient descent. 

Instantiate a `DataLoader` object. Pass the `X` and `y` of the train set and `32` as our `batch_size`. Assign it to the variable `data_loader`.

In [None]:
# Write your code here


## Training the network

Train your network. Complete the code below.

In [None]:
e = 0
max_epochs = 300
is_converged = False
previous_loss = 0
losses = []

# For each epoch
while e < max_epochs and is_converged is not True:
    
    current_epoch_loss = 0
    
    # TODO: Get the batch for this epoch.
    X_batch, y_batch = None
    
    # For each batch
    for X, y in zip(X_batch, y_batch):
        X = torch.Tensor(X)
        y = torch.Tensor(y).to(torch.long)
        
        # TODO: Empty the gradients of the network.
        
        
        # TODO: Forward propagation
        scores, probabilities = None
        
        # TODO: Compute the loss
        loss = None
        
        # TODO: Backward propagation
        
        
        # TODO: Update parameters
        
        
        current_epoch_loss += loss.item()
    
    average_loss = current_epoch_loss / len(X_batch)
    losses.append(average_loss)
    
    # Display the average loss per epoch
    print('Epoch:', e + 1, '\tLoss: {:.6f}'.format(average_loss))
    
    if abs(previous_loss - loss) < 0.00000005:
        is_converged = True
    else:
        previous_loss = loss
        e += 1

## Try our trained network on the test data

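Switch the network to evaluation mode first. `eval()` puts layers such as dropout and batch normalization into inference mode; the weights themselves only change when the optimizer takes a step, and `torch.no_grad()` can be used to skip gradient tracking.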

In [None]:
network.eval()

Perform forward propagation on the test data. Assign the return values to variables `scores` and `probabilities`.

In [None]:
# Write your code here


Now, let's get the prediction results on the test data to see if our model can handle unseen instances. Store the predicted labels in the variable `predictions`.

In [None]:
# Write your code here


In [None]:
print(predictions)

Compare the ground truth labels with the predicted labels. Store the total number of correct predictions in the variable `num_correct`.

In [None]:
# Write your code here


In [None]:
print(num_correct)

Compute the accuracy. Store it in the variable `accuracy`.

In [None]:
# Write your code here


In [None]:
print('{:.4f}'.format(accuracy))

**Question #17:** What is the accuracy of the network when evaluated on the test set? Express your answer in a floating point number from 0 to 1. Limit to 4 decimal places.

<!--crumb;qna;Q17-->

A: 

### <center>fin</center>