# Neural Network - Backpropagation

## Terminology

$ s_l $ : number of nodes in layer $ l $ (not counting the bias unit)  
$ a^{(l)} $ : activations of layer $ l $, with dimension $ [m \times (s_{l}+1)] $ including the bias unit  
$ \Theta^{(l)} $ : weights mapping layer $ l $ to layer $ l+1 $, with dimension $ [ (s_{l}+1) \times s_{l+1} ] $  
$ K $ : number of output units  
$ L $ : number of layers

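$ a^{(1)} = X = $ input layer - $ [m \times (n+1)] $, for $ m $ training examples and $ n $ input features plus the bias column  
$ \Theta^{(1)} $ - $ [(n+1) \times s_2] $

$ a^{(2)} = g(a^{(1)} \Theta^{(1)}) $ - $ [m \times s_2] $, where $ g $ is the sigmoid function; the bias unit $ a^{(2)}_0 = 1 $ is then prepended as a column, giving $ [m \times (s_2+1)] $  
and so on for the remaining layers.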
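
The layer-by-layer recipe above can be sketched in NumPy as follows. This is a minimal illustration, not the notebook's own code: the helper name `forward`, the toy layer sizes, and the random weights are assumptions, and the weights are taken to be in the $ [(s_l+1) \times s_{l+1}] $ layout defined above.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(X, thetas):
    """Forward propagation (hypothetical helper).

    X      : [m x (n+1)] input with the bias column already prepended
    thetas : list of weight matrices, thetas[l] has shape [(s_l+1) x s_{l+1}]
    Returns the list of activations; every layer except the output
    carries a leading bias column of ones.
    """
    activations = [X]
    for l, theta in enumerate(thetas):
        a = sigmoid(activations[-1].dot(theta))
        if l < len(thetas) - 1:
            # prepend the bias unit a_0 = 1 for hidden layers
            a = np.insert(a, 0, 1.0, axis=1)
        activations.append(a)
    return activations

# Toy sizes (assumed): m = 5 examples, n = 3 features, s_2 = 4, K = 2
rng = np.random.default_rng(0)
X = np.insert(rng.normal(size=(5, 3)), 0, 1.0, axis=1)
thetas = [rng.normal(size=(4, 4)), rng.normal(size=(5, 2))]
acts = forward(X, thetas)
print([a.shape for a in acts])  # [(5, 4), (5, 5), (5, 2)]
```

The printed shapes follow the dimension rules in the terminology: hidden activations gain one column for the bias unit, the output layer does not.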





## Cost Function
$$ J(\Theta) = - \frac{1}{m} \sum^{m}_{i=1} \sum^{K}_{k = 1} \left[ y_{ik} \log\left(h_{\Theta}(x_i)_k\right) + (1 - y_{ik}) \log\left(1 - h_{\Theta}(x_i)_k\right) \right] + \frac{\lambda}{2m} \sum^{L-1}_{l=1} \sum^{s_l}_{i=1} \sum^{s_{l+1}}_{j=1} \left(\Theta^{(l)}_{ij}\right)^2 $$

## Gradient Function
For $ L = 4 $, $ s_1 = 3 $, $ s_2 = 5 $, $ s_3 = 5 $, $ s_4 = K = 4 $:  

$ \delta^{(l)}_{j} :$ "error" in the activation of node $ j$ in layer $l$  

$ \delta^{(4)} = a^{(4)} - y $ - has dimensions $[m \times s_4]$  

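$ \delta^{(3)} = (\delta^{(4)} (\Theta^{(3)})^T) .* g'(z^{(3)}) $ - has dimensions $[m \times s_3]$. The product $ \delta^{(4)} (\Theta^{(3)})^T $ is $[m \times (s_3+1)]$; its first column belongs to the bias unit and is dropped.  
where, $ g'(z^{(3)}) = g(z^{(3)}) .* (1 - g(z^{(3)})) $  

Therefore, since $ a^{(3)} = g(z^{(3)}) $,  
$ \delta^{(3)} = (\delta^{(4)} (\Theta^{(3)})^T) .* a^{(3)} .* (1 - a^{(3)} )  $  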

and so on...
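
The recursion above can be sketched in NumPy as below. This is a hedged illustration rather than the notebook's implementation: the function name `backprop`, the random toy data, and the choice to leave the bias row unregularised (matching the cost function above) are assumptions. Weights use the $ [(s_l+1) \times s_{l+1}] $ layout, and the gradient for $ \Theta^{(l)} $ is taken as $ \frac{1}{m} (a^{(l)})^T \delta^{(l+1)} $ plus the regularisation term.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop(thetas, X, Y, lbd):
    """Gradients of the regularised cost (hypothetical helper).

    thetas : list of weight matrices, thetas[l] is [(s_l+1) x s_{l+1}]
    X      : [m x (n+1)] input including the bias column
    Y      : [m x K] one-hot labels
    """
    m = X.shape[0]
    # Forward pass, keeping every activation (bias column included,
    # except on the output layer).
    activations = [X]
    for l, theta in enumerate(thetas):
        a = sigmoid(activations[-1].dot(theta))
        if l < len(thetas) - 1:
            a = np.insert(a, 0, 1.0, axis=1)
        activations.append(a)

    # Backward pass: start from the output error a^(L) - y.
    delta = activations[-1] - Y
    grads = [None] * len(thetas)
    for l in range(len(thetas) - 1, -1, -1):
        a = activations[l]
        grad = a.T.dot(delta) / m
        grad[1:, :] += lbd / m * thetas[l][1:, :]  # bias row not regularised
        grads[l] = grad
        if l > 0:
            # Propagate the error one layer back, then drop the bias column.
            delta = (delta.dot(thetas[l].T) * a * (1 - a))[:, 1:]
    return grads

# Layer sizes from the example above: s_1 = 3, s_2 = 5, s_3 = 5, s_4 = K = 4
rng = np.random.default_rng(0)
m, sizes = 6, [3, 5, 5, 4]
X = np.insert(rng.normal(size=(m, sizes[0])), 0, 1.0, axis=1)
Y = np.eye(sizes[-1])[rng.integers(0, sizes[-1], size=m)]
thetas = [0.5 * rng.normal(size=(sizes[l] + 1, sizes[l + 1])) for l in range(3)]
grads = backprop(thetas, X, Y, lbd=1.0)
print([g.shape for g in grads])  # [(4, 5), (6, 5), (6, 4)]
```

Each gradient has the same shape as its weight matrix, which is a quick sanity check before comparing against a finite-difference estimate of the cost.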

In [15]:
import numpy as np
import scipy.optimize as op
import scipy.io as sio
import matplotlib.pyplot as plt

In [16]:
# raw string so the backslashes in the Windows path are not treated as escapes
data = sio.loadmat(r'Practice\Machine Learning\machine-learning-ex3\ex3\ex3data1.mat')
data['X'] = np.insert(arr=data['X'], obj=0, values=1.0, axis=1)
X = data['X']
y = data['y']
m, n = X.shape
K = 10
Y = np.zeros((m, K))
# Labels run from 1 to 10, with digit 0 stored as 10, so digit 0 maps to
# column index 9 (0-based). The one-hot rows of Y are therefore:
# [1, 0, 0, ... 0] for 1
# [0, 1, 0, ... 0] for 2
# [0, 0, 0, ... 1] for 0
for i in range(1, K+1):
    Y[np.where(y == i)[0], i-1] = 1
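
As an aside, the same one-hot matrix can be built without a loop by indexing an identity matrix. The sketch below uses a small made-up label vector in the same 1..10 convention, not the real data, and assumes every label is an integer between 1 and `K`.

```python
import numpy as np

K = 10
# Hypothetical labels, shaped [m x 1] like the 'y' array in the data file
y = np.array([[10], [1], [2], [10], [9]])

# Row i of the identity matrix is the one-hot vector for class i (0-based),
# so label k selects row k - 1.
Y = np.eye(K)[y.ravel() - 1]

# Loop version from the cell above, for comparison
Y_loop = np.zeros((y.shape[0], K))
for i in range(1, K + 1):
    Y_loop[np.where(y == i)[0], i - 1] = 1

print(np.array_equal(Y, Y_loop))  # True
```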


In [17]:
def sigmoid(Z):
    return 1 / (1 + np.exp(-Z))

In [18]:
# np.where( H == np.amax(H, axis=1).reshape((m, 1)) )[1]
# thetas = sequence of weight matrices, where thetas[0] = theta matrix for layer 1
# Y = m x K one-hot label matrix (K = 10 here)
def costFunc(thetas, X, Y, lbd, K):
    m, n = X.shape
    Y = Y.reshape((m, K))
    theta1 = thetas[0] #.reshape((n, 25))
    Z2 = X.dot(theta1) # m x 25
    A2 = sigmoid(Z2)
    theta2 = thetas[1] #.reshape((26, 10))
    A2 = np.insert(arr=A2, obj=0, values=1.0, axis=1) # add bias column: m x 26
    Z3 = A2.dot(theta2)
    A3 = sigmoid(Z3)    # m x 10
    cost = -1/m*np.sum(Y*np.log(A3)+(1-Y)*np.log(1-A3))
    cost += lbd /(2*m) * (np.sum(theta1[1:, :]**2) + np.sum(theta2[1:, :]**2) )
    return cost

In [19]:
# raw string so the backslashes in the Windows path are not treated as escapes
test_thetas = sio.loadmat(r'Practice\Machine Learning\machine-learning-ex4\ex4\ex4weights.mat')
tt1 = test_thetas['Theta1'].T
tt2 = test_thetas['Theta2'].T
# plain list: the two matrices have different shapes, so they cannot form a regular ndarray
thetas = [tt1, tt2]
lbd = 1
cost = costFunc(thetas, X, Y, lbd, K)
cost

0.38376985909092365