## Handwritten digit (HWD) recognizer using math and NumPy
Two-layer neural network (NN) architecture:

* 784 inputs, one per pixel, each holding that pixel's intensity
* 10 units in the hidden layer with the ReLU activation function
* 10 units in the output layer, one per digit class, with the softmax activation function

The sigmoid activation function is avoided in the hidden layer because of the vanishing gradient problem.
ReLU is used instead because it does not activate all the neurons at the same time: a neuron is deactivated only when the output of its linear transformation is < 0. It is also computationally cheaper than tanh or sigmoid.
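
The contrast between the two activations can be made concrete by comparing their derivatives. The snippet below is a minimal sketch, not part of the original notebook: the helper names and the depth of 5 layers are hypothetical, chosen only to show how a slope of at most 0.25 shrinks when multiplied across layers while the ReLU slope for active units stays at 1.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_derivative(z):
    # d/dz sigmoid(z) = sigmoid(z) * (1 - sigmoid(z)); largest at z = 0
    s = sigmoid(z)
    return s * (1.0 - s)

def relu_derivative(z):
    # 1 where the unit is active (z > 0), 0 where it is deactivated
    return (z > 0).astype(float)

z = np.array([-4.0, 0.0, 4.0])
sig_grad = sigmoid_derivative(z)
relu_grad = relu_derivative(z)

# Best-case slope multiplied through a hypothetical stack of 5 layers
depth = 5
sig_chain = sig_grad.max() ** depth
relu_chain = relu_grad.max() ** depth
print(sig_chain, relu_chain)
```

With only one hidden layer, as in this notebook, the effect is small; it becomes more pronounced as layers are added.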

In [2]:
import numpy as np
import pandas as pd

In [3]:
data = pd.read_csv('/kaggle/input/digit-recognizer/train.csv')
data.head()

Unnamed: 0,label,pixel0,pixel1,pixel2,pixel3,pixel4,pixel5,pixel6,pixel7,pixel8,...,pixel774,pixel775,pixel776,pixel777,pixel778,pixel779,pixel780,pixel781,pixel782,pixel783
0,1,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,1,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,4,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [36]:
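data = np.array(data)
np.random.shuffle(data)  # shuffle the rows in place before splitting
m, n = data.shape        # m examples; n columns (label + 784 pixels)

# Hold out the first 1000 shuffled rows; transpose so each column is one example
test_data = data[0:1000].T
Y_test = test_data[0]
X_test = test_data[1:n]
X_test = X_test / 255.   # scale pixel intensities to [0, 1]

# The remaining rows are used for training
train_data = data[1000:m].T
Y_train = train_data[0]
X_train = train_data[1:n]
X_train = X_train / 255.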


_,m_train = X_train.shape


In [7]:
Y_test[:10]

array([4, 7, 9, 3, 0, 2, 5, 6, 1, 1])

In [19]:
def initializing():
    W1 = np.random.rand(10, 784) - 0.5
    b1 = np.random.rand(10, 1) - 0.5
    W2 = np.random.rand(10, 10) - 0.5
    b2 = np.random.rand(10, 1) - 0.5
    return W1, b1, W2, b2

In [9]:
def ReLU(Z):
    return np.maximum(0, Z)

In [30]:
def ReLU_derivative(Z):
    return Z > 0

In [38]:
def softmax(Z):
    # Normalise each column (one example) so its class scores sum to 1
    return np.exp(Z) / np.sum(np.exp(Z), axis=0, keepdims=True)
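
Because the softmax above exponentiates the raw scores, very large values of `Z` can overflow. A common remedy is to subtract each column's maximum first, which leaves the result unchanged mathematically. The following is a sketch of that idea; `softmax_stable` is a hypothetical name and not a function from the original notebook.

```python
import numpy as np

def softmax_stable(Z):
    # Hypothetical variant: shift each column by its max so exp() never sees large inputs
    shifted = Z - np.max(Z, axis=0, keepdims=True)
    exp_Z = np.exp(shifted)
    return exp_Z / np.sum(exp_Z, axis=0, keepdims=True)

# Two columns whose scores differ only by a constant offset of 999
Z = np.array([[1.0, 1000.0],
              [2.0, 1001.0],
              [3.0, 1002.0]])
A = softmax_stable(Z)
print(A.sum(axis=0))  # each column sums to 1
```

Both columns produce the same probabilities, since softmax depends only on the differences between scores within a column.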

In [21]:
def forward_prop(W1, b1, W2, b2, X):
    Z1 = W1.dot(X) + b1
    A1 = ReLU(Z1)
    Z2 = W2.dot(A1) + b2
    A2 = softmax(Z2)
    return Z1, A1, Z2, A2

In [28]:
def one_hot_encoding(Y):
    one_hot = np.zeros((Y.size, Y.max() + 1))
    one_hot[np.arange(Y.size), Y] = 1
    return one_hot.T
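
Note that `one_hot_encoding` sizes its output from `Y.max() + 1`, so a batch that happens to contain no 9 would yield fewer than 10 rows. One possible guard, shown here as a sketch with a hypothetical `num_classes` parameter, is to fix the number of classes explicitly.

```python
import numpy as np

def one_hot_fixed(Y, num_classes=10):
    # Hypothetical variant: the row count comes from num_classes, not from the labels seen
    one_hot = np.zeros((Y.size, num_classes))
    one_hot[np.arange(Y.size), Y] = 1
    return one_hot.T  # shape (num_classes, number of examples)

Y = np.array([4, 0, 2])   # toy labels; the largest is 4, not 9
encoded = one_hot_fixed(Y)
print(encoded.shape)      # (10, 3)
```

Each column holds a single 1 in the row matching that example's label, which is the layout `A2 - one_hot` expects in the backward pass.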

In [31]:
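def backward_propagation(Z1, A1, Z2, A2, W1, W2, X, Y):
    m = Y.size  # number of examples in this batch, not the global row count
    one_hot = one_hot_encoding(Y)
    dz2 = A2 - one_hot
    dW2 = 1 / m * dz2.dot(A1.T)
    # Sum over examples only (axis=1) so each unit keeps its own bias gradient;
    # summing every element of dz2 always gives ~0 because each column sums to 0
    db2 = 1 / m * np.sum(dz2, axis=1, keepdims=True)
    dz1 = W2.T.dot(dz2) * ReLU_derivative(Z1)
    dW1 = 1 / m * dz1.dot(X.T)
    db1 = 1 / m * np.sum(dz1, axis=1, keepdims=True)
    return dW1, db1, dW2, db2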

def update_params(W1, b1, W2, b2, dW1, db1, dW2, db2, alpha):
    W1 = W1 - alpha * dW1
    b1 = b1 - alpha * db1    
    W2 = W2 - alpha * dW2  
    b2 = b2 - alpha * db2    
    return W1, b1, W2, b2
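
A finite-difference check is one way to confirm that analytic gradients of the kind computed in `backward_propagation` agree with the loss they are meant to minimise. The sketch below makes several assumptions that are not stated in the notebook: the loss is mean cross-entropy (which is what `A2 - one_hot` corresponds to for a softmax output), the network is shrunk to toy sizes, and all helper names are hypothetical. Only the output-bias gradient is checked, with per-unit sums over the examples.

```python
import numpy as np

def forward(W1, b1, W2, b2, X):
    Z1 = W1 @ X + b1
    A1 = np.maximum(0, Z1)
    Z2 = W2 @ A1 + b2
    E = np.exp(Z2 - Z2.max(axis=0, keepdims=True))
    return Z1, A1, E / E.sum(axis=0, keepdims=True)

def loss(W1, b1, W2, b2, X, T):
    # Assumed loss: mean cross-entropy; T is one-hot with shape (classes, examples)
    A2 = forward(W1, b1, W2, b2, X)[2]
    return -np.sum(T * np.log(A2)) / X.shape[1]

def grads(W1, b1, W2, b2, X, T):
    m = X.shape[1]
    Z1, A1, A2 = forward(W1, b1, W2, b2, X)
    dZ2 = A2 - T
    dW2 = dZ2 @ A1.T / m
    db2 = dZ2.sum(axis=1, keepdims=True) / m   # one entry per output unit
    dZ1 = (W2.T @ dZ2) * (Z1 > 0)
    dW1 = dZ1 @ X.T / m
    db1 = dZ1.sum(axis=1, keepdims=True) / m   # one entry per hidden unit
    return dW1, db1, dW2, db2

rng = np.random.default_rng(0)
X = rng.random((6, 8))                        # toy sizes: 6 features, 8 examples
T = np.eye(3)[rng.integers(0, 3, 8)].T        # one-hot targets, shape (3, 8)
W1, b1 = rng.random((4, 6)) - 0.5, rng.random((4, 1)) - 0.5
W2, b2 = rng.random((3, 4)) - 0.5, rng.random((3, 1)) - 0.5

db2_analytic = grads(W1, b1, W2, b2, X, T)[3]

# Central differences on each entry of b2
eps = 1e-6
db2_numeric = np.zeros_like(b2)
for k in range(b2.shape[0]):
    step = np.zeros_like(b2)
    step[k, 0] = eps
    db2_numeric[k, 0] = (loss(W1, b1, W2, b2 + step, X, T)
                         - loss(W1, b1, W2, b2 - step, X, T)) / (2 * eps)

max_diff = np.max(np.abs(db2_analytic - db2_numeric))
print(max_diff)
```

A small `max_diff` indicates the analytic and numeric gradients agree. The same loop can be pointed at `b1`, `W1`, or `W2`, although entries that sit exactly on a ReLU kink may disagree slightly.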

In [23]:
def get_predictions(A2):
    return np.argmax(A2, 0)

def get_accuracy(predictions, Y):
    print(predictions, Y)
    return np.sum(predictions == Y) / Y.size

def gradient_descent(X, Y, alpha, iterations):
    W1, b1, W2, b2 = initializing()
    for i in range(iterations):
        Z1, A1, Z2, A2 = forward_prop(W1, b1, W2, b2, X)
        dW1, db1, dW2, db2 = backward_propagation(Z1, A1, Z2, A2, W1, W2, X, Y)
        W1, b1, W2, b2 = update_params(W1, b1, W2, b2, dW1, db1, dW2, db2, alpha)
        if i % 10 == 0:
            print("Iteration: ", i)
            predictions = get_predictions(A2)
            print(get_accuracy(predictions, Y))
    return W1, b1, W2, b2
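
The accuracy printed during training is measured on the training examples themselves. To estimate how the parameters generalise, the held-out split can be scored with a forward pass only. This is a sketch under stated assumptions: `predict` and `accuracy` are hypothetical helpers, and the random arrays stand in for trained parameters and real data so the snippet runs on its own.

```python
import numpy as np

def predict(W1, b1, W2, b2, X):
    Z1 = W1.dot(X) + b1
    A1 = np.maximum(0, Z1)
    Z2 = W2.dot(A1) + b2
    # softmax preserves the ordering within a column, so argmax of Z2 equals argmax of A2
    return np.argmax(Z2, axis=0)

def accuracy(W1, b1, W2, b2, X, Y):
    return np.mean(predict(W1, b1, W2, b2, X) == Y)

# Stand-in parameters and data (random, for illustration only)
rng = np.random.default_rng(0)
X_demo = rng.random((784, 20))
Y_demo = rng.integers(0, 10, 20)
W1_demo, b1_demo = rng.random((10, 784)) - 0.5, rng.random((10, 1)) - 0.5
W2_demo, b2_demo = rng.random((10, 10)) - 0.5, rng.random((10, 1)) - 0.5

preds = predict(W1_demo, b1_demo, W2_demo, b2_demo, X_demo)
acc = accuracy(W1_demo, b1_demo, W2_demo, b2_demo, X_demo, Y_demo)
print(acc)
```

In the notebook itself, the call would be `accuracy(W1, b1, W2, b2, X_test, Y_test)` once `gradient_descent` has returned the trained parameters.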

In [39]:
W1, b1, W2, b2 = gradient_descent(X_train, Y_train, 0.10, 500)

Iteration:  0
[1 4 4 ... 1 1 1] [2 9 9 ... 3 7 5]
0.10697560975609756
Iteration:  10
[2 4 4 ... 2 1 2] [2 9 9 ... 3 7 5]
0.15460975609756097
Iteration:  20
[2 4 4 ... 2 1 2] [2 9 9 ... 3 7 5]
0.2075121951219512
Iteration:  30
[2 4 4 ... 2 1 2] [2 9 9 ... 3 7 5]
0.27431707317073173
Iteration:  40
[2 4 9 ... 2 1 2] [2 9 9 ... 3 7 5]
0.33990243902439027
Iteration:  50
[2 9 9 ... 2 1 2] [2 9 9 ... 3 7 5]
0.3999268292682927
Iteration:  60
[2 9 9 ... 3 1 2] [2 9 9 ... 3 7 5]
0.44458536585365854
Iteration:  70
[2 9 9 ... 3 1 2] [2 9 9 ... 3 7 5]
0.48448780487804877
Iteration:  80
[2 9 9 ... 3 1 2] [2 9 9 ... 3 7 5]
0.5217317073170732
Iteration:  90
[2 9 9 ... 3 1 3] [2 9 9 ... 3 7 5]
0.5566585365853659
Iteration:  100
[2 9 9 ... 3 7 3] [2 9 9 ... 3 7 5]
0.5856829268292683
Iteration:  110
[2 9 9 ... 3 7 3] [2 9 9 ... 3 7 5]
0.6116829268292683
Iteration:  120
[2 9 9 ... 3 7 3] [2 9 9 ... 3 7 5]
0.6349756097560976
Iteration:  130
[2 9 9 ... 3 7 3] [2 9 9 ... 3 7 5]
0.6551463414634147
Iteration: 