# Atul's super simple neural net

_October 8, 2017_

This workbook implements a ridiculously simple neural net: one hidden layer with two units, and an output layer with one unit. Both layers use the sigmoid activation function.

In [1]:
import numpy as np
from numpy.testing import assert_almost_equal

First, let's define the sigmoid function and its gradient...

In [2]:
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Sanity check.
assert sigmoid(0) == 0.5
assert_almost_equal(sigmoid(-100), 0)
assert_almost_equal(sigmoid(100), 1)

def sigmoid_gradient(x):
    return sigmoid(x) * (1 - sigmoid(x))

# Sanity check.
assert_almost_equal(sigmoid_gradient(-100), 0)
assert_almost_equal(sigmoid_gradient(100), 0)
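
For reference, the identity used in `sigmoid_gradient` can be obtained by differentiating the sigmoid directly. A short derivation, written here in standard notation rather than taken from the workbook:

```latex
\begin{aligned}
\sigma(x) &= \frac{1}{1 + e^{-x}} \\
\sigma'(x) &= \frac{e^{-x}}{\left(1 + e^{-x}\right)^{2}}
            = \frac{1}{1 + e^{-x}} \cdot \frac{e^{-x}}{1 + e^{-x}}
            = \sigma(x)\,\bigl(1 - \sigma(x)\bigr)
\end{aligned}
```

The derivative peaks at $\sigma'(0) = 0.25$ and vanishes as $|x|$ grows, which is what the two sanity checks at $x = \pm 100$ confirm.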

Now let's define a function that runs forward propagation on a neural net, given the weights and biases for both layers.

In implementing this, I found Coursera's deeplearning.ai course on [Neural Networks and Deep Learning](https://www.coursera.org/learn/neural-networks-deep-learning) useful; all notation is generally taken from that class.

In [3]:
def forward_prop_nn(w1, b1, w2, b2, x):
    z1 = np.dot(w1, x) + b1
    a1 = sigmoid(z1)
    z2 = np.dot(w2, a1) + b2
    a2 = sigmoid(z2)

    return (z1, a1, z2, a2)

def predict_nn(w1, b1, w2, b2, x):
    return forward_prop_nn(w1, b1, w2, b2, x)[-1]
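
To make the array shapes concrete, here is a small sketch that runs one forward pass on a single 2x1 input. The all-zero parameters are hypothetical, chosen only so the arithmetic is easy to follow by hand; the two functions are repeated so the snippet runs on its own.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def forward_prop_nn(w1, b1, w2, b2, x):
    z1 = np.dot(w1, x) + b1
    a1 = sigmoid(z1)
    z2 = np.dot(w2, a1) + b2
    a2 = sigmoid(z2)
    return (z1, a1, z2, a2)

# Hypothetical all-zero parameters (illustration only, not a useful network).
w1, b1 = np.zeros((2, 2)), np.zeros((2, 1))
w2, b2 = np.zeros((1, 2)), np.zeros((1, 1))
x = np.array([[1.0], [0.0]])

z1, a1, z2, a2 = forward_prop_nn(w1, b1, w2, b2, x)

# The hidden layer yields 2x1 arrays and the output layer yields 1x1 arrays.
shapes = [v.shape for v in (z1, a1, z2, a2)]
print(shapes)  # [(2, 1), (2, 1), (1, 1), (1, 1)]

# With every weight and bias at zero, each z is 0, so each activation is 0.5.
print(a2)      # [[0.5]]
```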

Here we will manually create a neural net to run the XNOR boolean operation. XNOR is just the negation of XOR, i.e. it is true when its inputs are equal (both true or both false), and false otherwise.

In [4]:
# The "* 10" scaling pushes every sigmoid deep into saturation, so the
# outputs land within assert_almost_equal's default tolerance of 0 or 1.
xnor_w_1 = np.array([
    [-20, -20],          # Weights for "(NOT x[0]) AND (NOT x[1])"
    [ 20,  20],          # Weights for "x[0] AND x[1]"
]) * 10
xnor_b_1 = np.array([
    [ 10],               # Bias for "(NOT x[0]) AND (NOT x[1])"
    [-30],               # Bias for "x[0] AND x[1]"
]) * 10
xnor_w_2 = np.array([
    [ 20,  20],          # Weights for "a1[0] OR a1[1]" (OR of the two hidden units)
]) * 10
xnor_b_2 = np.array([
    [-10],               # Bias for "a1[0] OR a1[1]"
]) * 10
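
To see why these weights compute XNOR, it can help to tabulate the rounded hidden activations for each input. The sketch below hard-codes the scaled weights from the cell above; the short variable names and the table layout are mine, added for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# The hand-built XNOR parameters, with the "* 10" scaling already applied.
w1 = np.array([[-200.0, -200.0],   # hidden unit 0: (NOT x[0]) AND (NOT x[1])
               [ 200.0,  200.0]])  # hidden unit 1: x[0] AND x[1]
b1 = np.array([[100.0], [-300.0]])
w2 = np.array([[200.0, 200.0]])    # output unit: OR of the two hidden units
b2 = np.array([[-100.0]])

rows = []
for x0, x1 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    x = np.array([[x0], [x1]], dtype=float)
    a1 = sigmoid(w1 @ x + b1)
    a2 = sigmoid(w2 @ a1 + b2)
    hidden = np.round(a1).ravel().astype(int).tolist()
    rows.append((x0, x1, hidden[0], hidden[1], int(round(a2.item()))))

print("x0 x1 | NOR AND | XNOR")
for x0, x1, h_nor, h_and, out in rows:
    print(f" {x0}  {x1} |  {h_nor}   {h_and}  |  {out}")
```

Exactly one hidden unit fires when the inputs agree and neither fires when they differ, so OR-ing the two hidden units gives XNOR.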

Now let's make sure our manually-constructed NN matches our intuitive expectations of XNOR.

In [5]:
# Define a "truth table" for our XNOR function. We'll use this to make sure our NN
# works, and we'll also use it later as training data.
boolean_xnor_truth_table = [
    # x[0]   x[1]    y
    [(True , True ), True ],
    [(False, False), True ],
    [(False, True ), False],
    [(True , False), False]
]

# This is a numpy-friendly version of our truth table, where each item is
# a tuple consisting of a 2x1 array representing the input (x) and a 1x1
# array representing the output (y).
xnor_truth_table = [
    (np.array(x, dtype=float).reshape(2, 1),
     np.array([[y]], dtype=float))
    for (x, y) in boolean_xnor_truth_table
]

# Test our NN to make sure everything works.
for x, y in xnor_truth_table:
    assert_almost_equal(predict_nn(xnor_w_1, xnor_b_1, xnor_w_2, xnor_b_2, x), y)

Now let's define a function to train a neural net!

This is intentionally un-vectorized because I wanted to make sure I understood the algorithm before dealing with vectorization. Thus the `examples` parameter is just a Python list of tuples, each containing a 2x1 numpy input array and the expected 1x1 output array.

The Coursera class's [Backpropagation intuition](https://www.coursera.org/learn/neural-networks-deep-learning/lecture/6dDj7/backpropagation-intuition-optional) lecture was particularly helpful in understanding the math behind this. I supplemented my understanding with [Khan Academy's AP Calculus AB](https://www.khanacademy.org/math/ap-calculus-ab) when needed, because I am very rusty at calculus.
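
As a summary of that math, these are the per-example gradients for a two-layer sigmoid network with the cross-entropy cost. This is my reading of the standard derivation, using bracketed layer superscripts as an assumed notation:

```latex
\begin{aligned}
\mathcal{L}(a^{[2]}, y) &= -y \log a^{[2]} - (1 - y)\log\bigl(1 - a^{[2]}\bigr) \\
\frac{\partial \mathcal{L}}{\partial a^{[2]}} &= \frac{a^{[2]} - y}{a^{[2]}\bigl(1 - a^{[2]}\bigr)} \\
dz^{[2]} &= \frac{\partial \mathcal{L}}{\partial a^{[2]}} \cdot \sigma'\bigl(z^{[2]}\bigr) = a^{[2]} - y \\
dW^{[2]} &= dz^{[2]} \, a^{[1]\top}, \qquad db^{[2]} = dz^{[2]} \\
dz^{[1]} &= W^{[2]\top} dz^{[2]} \odot \sigma'\bigl(z^{[1]}\bigr) \\
dW^{[1]} &= dz^{[1]} \, x^{\top}, \qquad db^{[1]} = dz^{[1]}
\end{aligned}
```

The third line simplifies because $\sigma'(z^{[2]}) = a^{[2]}(1 - a^{[2]})$ cancels the denominator of the second line. Averaging each quantity over the $m$ training examples gives the gradients used in the update step.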

In [6]:
def cost_func(predicted_y, y):
    return -y * np.log(predicted_y) - (1 - y) * np.log(1 - predicted_y)

def train_nn(examples, iterations, learning_rate, check_gradient=None, print_cost=True):
    m = len(examples)
    np.random.seed(1)
    
    # Initialize our weights and biases. Note that the weights need to
    # be randomly initialized so we can break symmetry.
    w1 = np.random.rand(2, 2)
    b1 = np.zeros([2, 1])
    w2 = np.random.rand(1, 2)
    b2 = np.zeros([1, 1])

    for i in range(iterations):
        dw1 = np.zeros([2, 2])
        db1 = np.zeros([2, 1])
        dw2 = np.zeros([1, 2])
        db2 = np.zeros([1, 1])
        cost = np.zeros([1, 1])
        for x, y in examples:
            # Forward propagation.
            z1, a1, z2, a2 = forward_prop_nn(w1, b1, w2, b2, x)

            # Calculate the cost of our output by comparing it to the
            # expected output.
            cost += cost_func(a2, y)

            # Back propagation.
            dz2 = a2 - y
            dw2 += np.dot(dz2, a1.T)
            db2 += dz2
            dz1 = np.dot(w2.T, dz2) * sigmoid_gradient(z1)
            dw1 += np.dot(dz1, x.T)
            db1 += dz1
        dw1 /= m
        db1 /= m
        dw2 /= m
        db2 /= m
        cost /= m

        if check_gradient is not None:
            check_gradient(w1, b1, w2, b2, examples, dw1, db1, dw2, db2)
        
        w1 -= learning_rate * dw1
        b1 -= learning_rate * db1
        w2 -= learning_rate * dw2
        b2 -= learning_rate * db2
        if i % 100 == 0 and print_cost:
            print(f"cost at iteration {i}: {cost[0][0]}")
    return (w1, b1, w2, b2)
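
For comparison, here is a rough sketch of what a vectorized gradient computation might look like, with the examples stacked as columns of `X` (2xm) and `Y` (1xm). The function names and the stacking convention are assumptions of this sketch, not part of the workbook. It checks itself against a looped version that mirrors the inner loop of `train_nn`.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def vectorized_gradients(w1, b1, w2, b2, X, Y):
    """Average gradients over all m examples at once (X is 2xm, Y is 1xm)."""
    m = X.shape[1]
    a1 = sigmoid(w1 @ X + b1)                 # (2, m); b1 broadcasts over columns
    a2 = sigmoid(w2 @ a1 + b2)                # (1, m)
    dz2 = a2 - Y                              # (1, m)
    dw2 = dz2 @ a1.T / m                      # (1, 2)
    db2 = dz2.sum(axis=1, keepdims=True) / m  # (1, 1)
    dz1 = (w2.T @ dz2) * a1 * (1 - a1)        # (2, m); a1 * (1 - a1) is sigmoid'(z1)
    dw1 = dz1 @ X.T / m                       # (2, 2)
    db1 = dz1.sum(axis=1, keepdims=True) / m  # (2, 1)
    return dw1, db1, dw2, db2

def looped_gradients(w1, b1, w2, b2, X, Y):
    """Same gradients, accumulated one example at a time."""
    m = X.shape[1]
    dw1, db1 = np.zeros_like(w1), np.zeros_like(b1)
    dw2, db2 = np.zeros_like(w2), np.zeros_like(b2)
    for j in range(m):
        x, y = X[:, j:j + 1], Y[:, j:j + 1]   # keep 2x1 and 1x1 shapes
        a1 = sigmoid(w1 @ x + b1)
        a2 = sigmoid(w2 @ a1 + b2)
        dz2 = a2 - y
        dw2 += dz2 @ a1.T
        db2 += dz2
        dz1 = (w2.T @ dz2) * a1 * (1 - a1)
        dw1 += dz1 @ x.T
        db1 += dz1
    return dw1 / m, db1 / m, dw2 / m, db2 / m

# The XNOR truth table, one example per column.
X = np.array([[1, 0, 0, 1],
              [1, 0, 1, 0]], dtype=float)
Y = np.array([[1, 1, 0, 0]], dtype=float)

# Arbitrary starting parameters for the comparison.
rng = np.random.default_rng(0)
w1, b1 = rng.random((2, 2)), np.zeros((2, 1))
w2, b2 = rng.random((1, 2)), np.zeros((1, 1))

vec = vectorized_gradients(w1, b1, w2, b2, X, Y)
loop = looped_gradients(w1, b1, w2, b2, X, Y)
match = all(np.allclose(v, l) for v, l in zip(vec, loop))
print(match)
```

The only structural change is that sums over examples become matrix products or `sum(axis=1)` calls; the per-example math is unchanged.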


Now let's train a neural net to learn the XNOR operation.

This is obviously a stupid use of a neural net, but I wanted a trivial use case to make sure I understood how things work.

In [7]:
print("Training neural net...\n")

# Reuse our truth table as our training data.
w1, b1, w2, b2 = train_nn(xnor_truth_table, 5000, 1)

# Test our NN to make sure it produces the same responses as our truth table.
# Note that normally a NN classifier would use some sort of thresholding
# to determine whether its outputs are true or false, but here we'll just
# directly compare its output to the expected truth table value to two
# decimal places, because our NN happens to be that awesome.
print("\nTraining complete. Verifying predictions...\n")
for x, y in xnor_truth_table:
    # Use .item() to extract scalars; calling float() on a 1x1 array is
    # deprecated in recent NumPy versions.
    print(f"{x.ravel()} should be approximately {y.item()}...")
    y_hat = predict_nn(w1, b1, w2, b2, x)
    assert_almost_equal(y_hat, y, decimal=2)
    print(f"  ✓ Prediction is {y_hat.item()}, hooray!\n")


Training neural net...

cost at iteration 0: 0.6960558242178866
cost at iteration 100: 0.6929867887355738
cost at iteration 200: 0.6922462617773397
cost at iteration 300: 0.6865890109283816
cost at iteration 400: 0.6374850000789614
cost at iteration 500: 0.5063366329094551
cost at iteration 600: 0.24494224360510392
cost at iteration 700: 0.0915469133772983
cost at iteration 800: 0.05089111333463649
cost at iteration 900: 0.034477698445246695
cost at iteration 1000: 0.025856753007441634
cost at iteration 1100: 0.020601465948381907
cost at iteration 1200: 0.017082139409163984
cost at iteration 1300: 0.014568584508727361
cost at iteration 1400: 0.012687400482552087
cost at iteration 1500: 0.011228647852513835
cost at iteration 1600: 0.010065565651741971
cost at iteration 1700: 0.009117240056884511
cost at iteration 1800: 0.008329668454476488
cost at iteration 1900: 0.007665468228175891
cost at iteration 2000: 0.007097960769063973
cost at iteration 2100: 0.00660760840370299
[... output truncated ...]
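
The comment in the cell above mentions that a classifier would normally threshold its output. A minimal sketch of that idea follows; the `classify` helper, the 0.5 cutoff, and the example outputs are all hypothetical and are not values produced by the training run.

```python
import numpy as np

def classify(y_hat, threshold=0.5):
    """Map sigmoid outputs in [0, 1] to boolean predictions."""
    return np.asarray(y_hat) >= threshold

# Hypothetical network outputs for four inputs (illustration only).
example_outputs = np.array([0.993, 0.991, 0.008, 0.007])
labels = classify(example_outputs).tolist()
print(labels)  # [True, True, False, False]
```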

Hooray! But how do we know our gradient descent math is correct?

We can figure this out with gradient checking!

In [8]:
def check_gradient(w1, b1, w2, b2, examples, check_dw1, check_db1, check_dw2, check_db2):
    epsilon = 0.0001
    m = len(examples)

    theta = np.concatenate((
        w1.reshape(-1, 1),
        b1.reshape(-1, 1),
        w2.reshape(-1, 1),
        b2.reshape(-1, 1),
    ))
    costs_left = np.zeros([len(theta), 1])
    costs_right = np.copy(costs_left)
    
    def unrolled_predict_nn(theta, x):
        # Unpack the flat parameter vector in the same order it was packed above.
        w1 = theta[0:4].reshape(2, 2)
        b1 = theta[4:6].reshape(2, 1)
        w2 = theta[6:8].reshape(1, 2)
        b2 = theta[8:9].reshape(1, 1)
        return predict_nn(w1, b1, w2, b2, x)

    for x, y in examples:
        for i in range(len(theta)):
            theta_left = np.copy(theta)
            theta_left[i] -= epsilon
            theta_right = np.copy(theta)
            theta_right[i] += epsilon
            costs_left[i] += cost_func(unrolled_predict_nn(theta_left, x), y)[0]
            costs_right[i] += cost_func(unrolled_predict_nn(theta_right, x), y)[0]

    costs_left /= m
    costs_right /= m
    # Central-difference estimate of the gradient for every parameter.
    theta_prime = (costs_right - costs_left) / (2 * epsilon)
    dw1 = theta_prime[0:4].reshape(2, 2)
    db1 = theta_prime[4:6].reshape(2, 1)
    dw2 = theta_prime[6:8].reshape(1, 2)
    db2 = theta_prime[8:9].reshape(1, 1)
    assert_almost_equal(dw1, check_dw1)
    assert_almost_equal(db1, check_db1)
    assert_almost_equal(dw2, check_dw2)
    assert_almost_equal(db2, check_db2)

train_nn(xnor_truth_table, 10, 1, check_gradient=check_gradient, print_cost=False)

print("Gradients check out OK!")

Gradients check out OK!
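
Gradient checking is often summarized as a single relative-error number instead of element-wise assertions. The sketch below shows that variant on a toy function whose gradient is known exactly. The helper names and the toy function are assumptions for illustration, and the normalized-difference formula is one common formulation rather than something taken from this workbook.

```python
import numpy as np

def numerical_gradient(f, theta, epsilon=1e-4):
    """Central-difference estimate of the gradient of a scalar function f."""
    grad = np.zeros_like(theta, dtype=float)
    for i in range(theta.size):
        step = np.zeros_like(theta, dtype=float)
        step.flat[i] = epsilon
        grad.flat[i] = (f(theta + step) - f(theta - step)) / (2 * epsilon)
    return grad

def relative_error(a, b):
    """Norm of the difference, scaled by the sizes of the two vectors."""
    return np.linalg.norm(a - b) / (np.linalg.norm(a) + np.linalg.norm(b))

def toy_cost(theta):
    # Toy cost with a known gradient: f(theta) = sum(theta^2), grad = 2 * theta.
    return float(np.sum(theta ** 2))

theta = np.array([1.0, -2.0, 0.5])
analytic = 2 * theta
numeric = numerical_gradient(toy_cost, theta)

err = relative_error(analytic, numeric)
print(err < 1e-7)
```

A small relative error suggests the analytic gradient is right, while a value near 1 usually points to a bug such as a mis-indexed or shadowed parameter.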
