# 1. Neural Networks
This notebook mixes theoretical and coding questions on multilayer perceptrons and neural network training.

In [11]:
import math
import pickle
import gzip
import numpy as np
import pandas
import matplotlib.pylab as plt
import pytest
%matplotlib inline

Problem 1 - Single-Layer and Multilayer Perceptron Learning
---

**Part A** : Consider learning the following concepts with either a single-layer or multilayer perceptron where all hidden and output neurons utilize *indicator* activation functions. For each of the following concepts, state whether the concept can be learned by a single-layer perceptron. Briefly justify your response by providing weights and biases as applicable:

i. $~ \texttt{ NOT } x_1$

ii. $~~x_1 \texttt{ NOR } x_2$

iii. $~~x_1 \texttt{ XNOR } x_2$ (output 1 when $x_1 = x_2$ and 0 otherwise)
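One way to sanity-check an answer to Part A is a small brute-force search. The sketch below is not part of the assignment. It assumes the indicator fires when $w \cdot x + b > 0$ and searches a small, arbitrarily chosen integer grid. The helper names `indicator` and `find_perceptron` are hypothetical. Finding nothing on a grid is only a hint, not a proof. For XNOR the proof is short: the four cases require $b > 0$, $w_1 + b \le 0$, $w_2 + b \le 0$ and $w_1 + w_2 + b > 0$, which cannot all hold at once.

```python
import itertools
import numpy as np

def indicator(z):
    # Assumed convention: the neuron fires when its pre-activation is positive.
    return 1 if z > 0 else 0

def find_perceptron(truth_table, grid=(-2, -1, 0, 1, 2)):
    """Return (weights, bias) reproducing truth_table, or None if no grid point works."""
    n_inputs = len(next(iter(truth_table)))
    for params in itertools.product(grid, repeat=n_inputs + 1):
        *w, b = params
        if all(indicator(np.dot(w, x) + b) == y for x, y in truth_table.items()):
            return tuple(w), b
    return None

NOT = {(0,): 1, (1,): 0}
NOR = {(0, 0): 1, (0, 1): 0, (1, 0): 0, (1, 1): 0}
XNOR = {(0, 0): 1, (0, 1): 0, (1, 0): 0, (1, 1): 1}

print("NOT :", find_perceptron(NOT))    # some (weights, bias) pair
print("NOR :", find_perceptron(NOR))    # some (weights, bias) pair
print("XNOR:", find_perceptron(XNOR))   # None
```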

**Part B** : Determine an architecture and specific values of the weights and biases in a single-layer or multilayer perceptron with *indicator* activation functions that can learn $x_1 \texttt{ XNOR } x_2$. <br>
In this week's Peer Review, describe your architecture and state your weight matrices and bias vectors. 

Then demonstrate that your solution is correct by implementing forward propagation for your network in Python and showing that it produces the correct boolean output for each of the four possible combinations of $x_1$ and $x_2$. <br>
This solution is based on [Neural Representation of AND, OR, NOT, XOR and XNOR Logic Gates](https://medium.com/@stanleydukor/neural-representation-of-and-or-not-xor-and-xnor-logic-gates-perceptron-algorithm-b0275375fea1).

It uses the following truth table: <br>
![XNOR truth table](https://miro.medium.com/v2/resize:fit:598/0*oGu2x1DA9soE3IdO.gif) <br>
and the following neural network: <br>
![XNOR neural network](https://miro.medium.com/v2/resize:fit:640/format:webp/1*yZfw_9DRMephzZwejjhyTA.png)

In [12]:
# Implement forward propagation for the network and show the boolean output
# for each of the four possible combinations of x1 and x2.

def activation(x):
    # Indicator (step) activation: 1 if the pre-activation is positive, else 0.
    return 1 if x > 0 else 0

def neuron(X: np.ndarray, W: np.ndarray, b: float):
    # Single perceptron: weighted sum of the inputs plus bias, then the indicator.
    return activation(X @ W + b)

def and_neuron(X: np.ndarray):
    # Fires only when both inputs are 1 (x1 + x2 - 1 > 0).
    b = -1
    W = np.array([1, 1])
    return neuron(X, W, b)

def not_neuron(X):
    # Fires when the single input is 0 (-x + 1 > 0).
    W = np.array([-1])
    X = np.array([X])
    return neuron(X, W, 1)

def nor_neuron(X: np.ndarray):
    # Fires only when both inputs are 0 (-x1 - x2 + 1 > 0).
    W = np.array([-1, -1])
    b = 1
    return neuron(X, W, b)

def xor_neuron(X: np.ndarray):
    # Two-layer network: hidden AND and NOR neurons feed an output NOR neuron,
    # so the output is 1 exactly when the two inputs differ (XOR).
    x1 = and_neuron(X)
    x2 = nor_neuron(X)
    return nor_neuron(np.array([x1, x2]))

def convert_to_array(i):
    # Two-bit binary representation of i, e.g. 2 -> array([1, 0]).
    raw = [int(j) for j in f"{i:02b}"]
    return np.array(raw)


# The 4 possible combinations of 0 and 1 for (x1, x2).
combinations = [convert_to_array(i) for i in range(4)]
table_format = "| {:1} | {:1} | nor({:3},{:3}) => {:3}"
header = table_format.format("a", "b", "and", "nor", "xor")
print(header)
for i in combinations:
    print(table_format.format(i[0], i[1], and_neuron(i), nor_neuron(i), xor_neuron(i)))



| a | b | nor(and,nor) => xor
| 0 | 0 | nor(  0,  1) =>   0
| 0 | 1 | nor(  0,  0) =>   1
| 1 | 0 | nor(  0,  0) =>   1
| 1 | 1 | nor(  1,  0) =>   0
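The table above shows `nor(and, nor)`, which is XOR, whereas Part B asks for XNOR. One possible fix, sketched here rather than taken from the original solution, keeps the hidden AND and NOR neurons and swaps the output NOR neuron for an OR neuron. The matrix names `W1`, `b1`, `W2`, `b2` and the functions `step` and `xnor_forward` are illustrative assumptions. The indicator is assumed to fire on strictly positive pre-activations, as in the code above.

```python
import numpy as np

def step(z):
    # Indicator activation applied element-wise: 1 where z > 0, else 0.
    return (z > 0).astype(int)

# Hidden layer: column 0 is an AND neuron, column 1 is a NOR neuron.
W1 = np.array([[1, -1],
               [1, -1]])
b1 = np.array([-1, 1])

# Output layer: an OR neuron over the two hidden units.
W2 = np.array([1, 1])
b2 = 0

def xnor_forward(X):
    """Forward propagation for a batch of inputs X with shape (n, 2)."""
    hidden = step(X @ W1 + b1)
    return step(hidden @ W2 + b2)

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
print(xnor_forward(X))  # [1 0 0 1]
```

Because AND and NOR are never both active, the OR output is 1 exactly when $x_1 = x_2$.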


Problem 2 - Back propagation
---

In this problem you'll gain some intuition about why training deep neural networks can be very time consuming.  Consider training the chain-like neural network seen below: 

![chain-like nn](figs/chain_net.png)

Note that this network has three weights $W^1, W^2, W^3$ and three biases $b^1, b^2,$ and $b^3$ (for this problem you can think of each parameter as a single value or as a $1 \times 1$ matrix). Suppose that each hidden and output neuron is equipped with a sigmoid activation function and the loss function is given by 

$$
\ell(y, a^4) = \frac{1}{2}(y - a^4)^2  
$$

where $a^4$ is the value of the activation at the output neuron and $y \in \{0,1\}$ is the true label associated with the training example. 

**Part A**: Suppose each of the weights is initialized to $W^k = 1.0$ and each bias is initialized to $b^k = -0.5$.  Use forward propagation to find the activities and activations associated with each hidden and output neuron for the training example $(x, y) = (0.5,0)$. Show your work. Answer the Peer Review question about this section.
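A hand calculation for Part A can be checked numerically. The sketch below assumes the chain computes $z^{k+1} = W^k a^k + b^k$ and $a^{k+1} = \sigma(z^{k+1})$ with $a^1 = x$. The figure is not reproduced here, so this indexing is an assumption. The function name `forward_chain` is hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward_chain(x, weights, biases):
    """Return (activities, activations); activations[0] is the input x."""
    activities, activations = [], [x]
    for w, b in zip(weights, biases):
        z = w * activations[-1] + b      # activity of the next neuron
        activities.append(z)
        activations.append(sigmoid(z))   # activation of the next neuron
    return activities, activations

zs, acts = forward_chain(0.5, [1.0, 1.0, 1.0], [-0.5, -0.5, -0.5])
print(zs)    # [0.0, 0.0, 0.0]
print(acts)  # [0.5, 0.5, 0.5, 0.5]

y = 0
loss = 0.5 * (y - acts[-1]) ** 2
print(loss)  # 0.125
```

With these initial values every activity is 0, so every activation is $\sigma(0) = 0.5$.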

**Part B**: Use Back-Propagation to compute the weight and bias derivatives $\partial \ell / \partial W^k$ and $\partial \ell / \partial b^k$ for $k=1, 2, 3$.  Show all work. Answer the Peer Review question about this section. 
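The derivatives in Part B can be checked the same way. This is a minimal sketch under the same assumed indexing ($z^{k+1} = W^k a^k + b^k$, $a^1 = x$), using $\sigma'(z) = a(1 - a)$ and $\partial \ell / \partial a^4 = a^4 - y$. The function name `chain_gradients` is hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def chain_gradients(x, y, weights, biases):
    """Return (dloss/dW, dloss/db) for the scalar chain network."""
    # Forward pass, storing every activation (acts[0] is the input x).
    acts = [x]
    for w, b in zip(weights, biases):
        acts.append(sigmoid(w * acts[-1] + b))

    # Backward pass. delta is dloss/d(activity) of the neuron being visited.
    delta = (acts[-1] - y) * acts[-1] * (1 - acts[-1])
    grad_w = [0.0] * len(weights)
    grad_b = [0.0] * len(biases)
    for k in reversed(range(len(weights))):
        grad_w[k] = delta * acts[k]   # weight k multiplies activation acts[k]
        grad_b[k] = delta
        if k > 0:
            # Propagate through weight k and the sigmoid of the previous neuron.
            delta = weights[k] * delta * acts[k] * (1 - acts[k])
    return grad_w, grad_b

grad_w, grad_b = chain_gradients(0.5, 0, [1.0, 1.0, 1.0], [-0.5, -0.5, -0.5])
print(grad_w)  # [0.00390625, 0.015625, 0.0625]
print(grad_b)  # [0.0078125, 0.03125, 0.125]
```

Each step back through the chain multiplies the gradient by $W^k \sigma'(z) = 0.25$ here, so the earliest parameters receive the smallest updates.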

**Part C**: Implement the following activation functions.

Formulas for the activation functions:

* ReLU: $f(x) = \max(0, x)$
<br><br>

* Sigmoid: $f(x) = \frac{1}{1 + e^{-x}}$
<br><br>

* Softmax: $f(x_i) = \frac{e^{x_i}}{\sum_{j=1}^{n} e^{x_j}}$

In [13]:
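def relu(x):
    # ReLU for a scalar input: negative values are clamped to 0.
    return max(0, x)


def sigmoid(x):
    # Logistic sigmoid for a scalar input. The two branches are algebraically
    # equal; each only exponentiates a non-positive number, which avoids an
    # OverflowError for inputs of large magnitude.
    if x >= 0:
        return 1 / (1 + math.exp(-x))
    e = math.exp(x)
    return e / (1 + e)

def soft_max(x):
    # Softmax over a sequence. Subtracting max(x) leaves the result unchanged
    # but keeps math.exp from overflowing on large inputs.
    m = max(x)
    x_exp = [math.exp(xi - m) for xi in x]
    den = sum(x_exp)
    return [xei / den for xei in x_exp]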

In [14]:
# Activation function tests
# PLEASE NOTE: These sample tests are only indicative and are added to help you debug your code
# and there are additional hidden test cases on which your notebook will be evaluated upon submission

# Test Relu function
assert int(relu(-6.5)) == 0, "Check relu function"

# Test Sigmoid function
assert pytest.approx(sigmoid(0.3), 0.00001) == 0.574442516811659, "Check sigmoid function"

# Test Softmax function
assert pytest.approx(soft_max([5,7]), 0.00001) == [0.11920292, 0.88079708], "Check softmax function"

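**Part D**: Implement the following loss functions.

Formulas for the loss functions, where $y_i$ is the true value, $\hat{y}_i$ is the prediction, and $n$ is the number of examples:

* Mean squared error <br>
Formula: $\text{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$

* Mean absolute error <br>
Formula: $\text{MAE} = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i|$

* Hinge loss <br>
Formula: $L_i = \max(0, 1 - y_i \hat{y}_i)$ per example; the sample test below expects the mean of $L_i$ over all examples.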

In [32]:
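def mean_squared_error(yhat, y):
    # Mean of the squared differences between true values y and predictions yhat.
    n = len(y)
    dif_square = [(yi - yhi) ** 2 for yi, yhi in zip(y, yhat)]
    return (1 / n) * sum(dif_square)

def mean_absolute_error(yhat, y):
    # Mean of the absolute differences between true values y and predictions yhat.
    n = len(y)
    dif_abs = [abs(yi - yhi) for yi, yhi in zip(y, yhat)]
    return (1 / n) * sum(dif_abs)


def hinge(yhat, y):
    # Per-example hinge loss max(0, 1 - y * yhat), averaged over all examples.
    li = [max(0, 1 - yi * yhi) for yi, yhi in zip(y, yhat)]
    return sum(li) / len(li)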

In [33]:
# Error function tests
# PLEASE NOTE: These sample tests are only indicative and are added to help you debug your code
# and there are additional hidden test cases on which your notebook will be evaluated upon submission

y_true = np.array([2, 3, -0.45])
y_pred = np.array([1.5, 3, 0.2])

# Test mean squared error function
assert pytest.approx(mean_squared_error(y_pred,y_true), 0.00001) == 0.2241666666666667, "Check mean_squared_error function"

# Test mean absolute error function
assert pytest.approx(mean_absolute_error(y_pred,y_true), 0.00001) == 0.3833333333333333, "Check mean_absolute_error function"

# Test hinge loss function
assert pytest.approx(hinge(y_pred,y_true), 0.00001) == 0.36333333333333334, "Check hinge loss function"
