# Workshop: Build Your Own Artificial Neural Network (ANN)

---
Featuring TensorFlow (TFlow).

We'll be classifying MNIST data, which is a set of ~70,000 images of handwritten digits. Bear in mind that this is a solved problem, so we're not doing anything novel.

---
**What you should leave with:**
You should leave here with a practical understanding of how to implement an Artificial Neural Network (ANN) from scratch. The concepts don't change when you move to different domains; only the way you apply them does. Your understanding of the *central* concept of ANNs, **backpropagation (backprop)**, should be well founded, and with some more practice you should be able to explain it to a friend.

You should also leave here with a minimal understanding of TensorFlow and of how such a library can speed up your model development, as well as an understanding of some of its drawbacks.

### Contents:
1. [Some Pre-processing](#1.-Some-Pre-processing)
2. [Building an ANN from Scratch](#2.-Building-an-ANN-from-Scratch)
3. [Rebuilding the ANN in TensorFlow](#3.-Rebuilding-the-ANN-in-TensorFlow)

---
## 1. Some Pre-processing

In [1]:
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("data/", one_hot=True)

Extracting data/train-images-idx3-ubyte.gz
Extracting data/train-labels-idx1-ubyte.gz
Extracting data/t10k-images-idx3-ubyte.gz
Extracting data/t10k-labels-idx1-ubyte.gz


In [2]:
import gzip

def convert(imgf, labelf, outf, n):
    """Write the first n images as CSV rows: the label, then 784 pixel values."""
    with gzip.open(imgf, "rb") as f, gzip.open(labelf, "rb") as l, open(outf, "w") as o:
        f.read(16)  # skip the 16-byte header of the image file
        l.read(8)   # skip the 8-byte header of the label file

        for i in range(n):
            # each row starts with the label, followed by 28*28 pixel bytes
            image = [ord(l.read(1))]
            for j in range(28*28):
                image.append(ord(f.read(1)))
            o.write(",".join(str(pix) for pix in image) + "\n")

convert("data/train-images-idx3-ubyte.gz", "data/train-labels-idx1-ubyte.gz", "data/mnist_train.csv", 60000)
convert("data/t10k-images-idx3-ubyte.gz",  "data/t10k-labels-idx1-ubyte.gz",  "data/mnist_test.csv",  10000)
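The `convert` function above writes one image per line: the label first, then the 784 pixel values. As a minimal sketch of reading such a row back, the snippet below parses a made-up row held in a string (the row contents and the variable names are illustrative, not taken from the real files).

```python
import numpy as np

# A made-up row in the same layout: label, then 28*28 = 784 pixel values (0-255).
row = "7," + ",".join(["0"] * 783 + ["255"])

values = row.strip().split(",")
label = int(values[0])                        # first field is the digit
pixels = np.array([float(v) for v in values[1:]])
image = pixels.reshape(28, 28)                # back to a 28x28 grid

print(label)        # 7
print(image.shape)  # (28, 28)
```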

---

## 2. Building an ANN from Scratch

We're going to build an Artificial Neural Network (ANN) from the ground up, using raw [Python](http://python.org/), [NumPy](http://numpy.org/), and [SciPy](http://scipy.org/).

We'll build the ANN from the ground up to give you the intuition behind how one would go about creating one. Hand-built networks like this **_can_** run faster than an ANN built with TensorFlow, Torch, or other libraries, but those libraries make building much simpler (which you'll see later).

---

### On to the workshop.

Let's import our dependencies, first.

In [3]:
import numpy as np
from scipy.special import expit

We're going to build an ANN class called `NeuralNetwork`. It will contain an initializer and two functions.

The functions are: `train(...)` and `query(...)`. The `...` is because we don't necessarily know yet what we should be passing through to these functions.

**NOTE:** We're going to build the functions that go into the class one by one, so we can take things a step at a time with commentary in between. Once we've built the functions, we'll copy-paste them into the class definition and run with it from there.

### 2.1 Building the Initializer: `__init__(...)`

The `__init__(...)` function is almost like a constructor. Essentially, we use it to set up some instance variables, so that we don't have to pass the ANN's configuration to each function we call.

This function should set up a few instance variables on the class:
- the number of input nodes
- the number of hidden nodes
- the number of output nodes
- the learning rate
- the weights from the input to hidden layers
- the weights from the hidden to output layers
- the activation function

In [4]:
def __init__():
    pass

### BEGIN SOLUTION

def __init__(self, n_inodes, n_hnodes, n_onodes, learn_rate):
    ## these determine the number of nodes per layer
    self.i_cnt = n_inodes
    self.h_cnt = n_hnodes
    self.o_cnt = n_onodes
    
    ## specify the learning rate
    self.lr = learn_rate
    
    
    ## weight initialization
    ## this can be done in one of two ways:
    ## 1. draw uniformly from [0, 1), then subtract 0.5 to center the weights on 0 so that some are negative
    ## 2. draw from a zero-centered normal distribution whose spread shrinks as the layer grows
    
    ## option 1 (left commented out, since option 2 below would overwrite it anyway):
    # self.w_i2h = np.random.rand(self.h_cnt, self.i_cnt) - 0.5
    # self.w_h2o = np.random.rand(self.o_cnt, self.h_cnt) - 0.5
    
    ## going with option 2:
    self.w_i2h = np.random.normal(0, pow(self.h_cnt, -0.5), (self.h_cnt, self.i_cnt))
    self.w_h2o = np.random.normal(0, pow(self.o_cnt, -0.5), (self.o_cnt, self.h_cnt))
    
    
    ## we can now specify the activation function, we'll do so as a lambda function
    self.activation = lambda x: expit(x)
    
### END SOLUTION
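To make the bookkeeping concrete, here is a minimal sketch of the weight-matrix shapes. It assumes the MNIST sizes used later in this workshop (784 inputs, 10 outputs) and 200 hidden nodes; the activation function is left out on purpose, because only the shapes matter here.

```python
import numpy as np

# Layer sizes: 784 and 10 come from MNIST; 200 hidden nodes is a free choice.
n_inodes, n_hnodes, n_onodes = 784, 200, 10

# Each weight matrix has one row per node in the layer it feeds into
# and one column per node in the layer it reads from.
w_i2h = np.random.rand(n_hnodes, n_inodes) - 0.5   # input  -> hidden
w_h2o = np.random.rand(n_onodes, n_hnodes) - 0.5   # hidden -> output

# A single image is a column vector, so the matrix products line up as-is.
x = np.zeros((n_inodes, 1))
hidden_in = np.dot(w_i2h, x)          # activation omitted: shapes only
output_in = np.dot(w_h2o, hidden_in)

print(w_i2h.shape, w_h2o.shape)          # (200, 784) (10, 200)
print(hidden_in.shape, output_in.shape)  # (200, 1) (10, 1)
```

Storing the weights as (rows = destination layer, columns = source layer) is what lets `np.dot(weights, inputs)` work without any extra transposes.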

### 2.2 Building the Query Function: `query(...)`

The `query(...)` function should enable us to talk to the ANN and ask it to classify some images we hand it.

We write this function before `train(...)` because it's less complex: it amounts to a single forward pass through the network. This should help us ground the ideas that we'll be implementing in `train(...)` when we get there.

In [5]:
def query():
    pass

### BEGIN SOLUTION

def query(self, input_list):
    ## convert input list to np.array and transpose because of matrix mult
    inputs = np.array(input_list, ndmin=2).T
    
    
    ## propagate the input through the hidden layer
    ### recall that X_{hidden} = W_{input_hidden} * I_{inputs}
    hidden_in  = np.dot(self.w_i2h, inputs)
    ### pass `hidden_in` through the activation function to calculate the output
    hidden_out = self.activation(hidden_in)
    
    
    ## propagate the hidden output through the output layer
    ### recall that X_{output} = W_{hidden_output} * I_{hidden_out}
    output_in  = np.dot(self.w_h2o, hidden_out)
    ### pass `output_in` through the activation function to calculate the output
    output_out = self.activation(output_in)
    
    return output_out
    
### END SOLUTION
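As a sanity check on the forward pass, here is a toy network small enough to verify by hand. The weights are hypothetical values picked so that every weighted sum comes out to 0, and `sigmoid` is written out in NumPy (it computes the same function as `scipy.special.expit`).

```python
import numpy as np

def sigmoid(x):
    # Logistic sigmoid; scipy.special.expit computes the same function.
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical toy weights: 3 inputs -> 2 hidden nodes -> 1 output node.
w_i2h = np.array([[0.5, -0.5, 0.0],
                  [0.0,  0.0, 0.0]])
w_h2o = np.array([[1.0, -1.0]])

# Column vector, the same shape convention as np.array(input_list, ndmin=2).T
inputs = np.array([1.0, 1.0, 1.0], ndmin=2).T

hidden_out = sigmoid(np.dot(w_i2h, inputs))      # both hidden sums are 0 -> 0.5 each
output_out = sigmoid(np.dot(w_h2o, hidden_out))  # 0.5 - 0.5 = 0 -> 0.5

print(hidden_out.ravel())  # [0.5 0.5]
print(output_out.ravel())  # [0.5]
```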

### 2.3 Building the Train Function: `train(...)`

The `train(...)` function is how the ANN learns. We'll hand it our dataset along with the labels; it will complete forward passes, compare its outputs against the labels, and update the weights through backprop.

We'll need to hand this function:
- our inputs
- our expected values

Now that we've given our network data to train on, we need to implement the forward pass, followed by the backward pass. **Recall** that the backward pass involves a few stages. First, we calculate the output error. Then we distribute that error backwards across the network, in proportion to the weights. Finally, we use those errors to update our weights; each update is moderated by the learning rate, which we specified earlier in `__init__()`.

In [6]:
def train():
    pass

### BEGIN SOLUTION

def train(self, input_list, target_list):
    ## convert input list to np.array and transpose because of matrix mult
    inputs  = np.array(input_list, ndmin=2).T
    targets = np.array(target_list, ndmin=2).T
    
    
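    ## forward pass: the same calculations as in `query(...)`
    hidden_in  = np.dot(self.w_i2h, inputs)
    hidden_out = self.activation(hidden_in)
    
    output_in  = np.dot(self.w_h2o, hidden_out)
    output_out = self.activation(output_in)
    
    ## output error: the difference between what we wanted and what we got
    output_err = (targets - output_out)
    
    ## hidden error: the output error split across the hidden nodes by the hidden-to-output weights
    hidden_err = np.dot(self.w_h2o.T, output_err)
    
    ## update the weights; `x * (1 - x)` is the derivative of the sigmoid, written in terms of its output
    self.w_h2o += self.lr * np.dot((output_err * output_out * (1 - output_out)), np.transpose(hidden_out))
    self.w_i2h += self.lr * np.dot((hidden_err * hidden_out * (1 - hidden_out)), np.transpose(inputs))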
    
### END SOLUTION
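One way to convince yourself that the hidden-to-output update points in a sensible direction is a numerical gradient check. The sketch below assumes a squared-error loss, `0.5 * sum((targets - output_out) ** 2)`, which this workshop never names explicitly, and it checks only the hidden-to-output weights. The helper names (`sigmoid`, `loss`) and the toy sizes are ours.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def loss(w_h2o, hidden_out, targets):
    # Assumed loss: half the summed squared error for one training example.
    output_out = sigmoid(np.dot(w_h2o, hidden_out))
    return 0.5 * np.sum((targets - output_out) ** 2)

rng = np.random.RandomState(0)
hidden_out = rng.rand(4, 1)                      # stand-in hidden activations
targets = np.array([[0.99], [0.01], [0.01]])
w_h2o = rng.rand(3, 4) - 0.5

# Analytic gradient of the assumed loss with respect to w_h2o.
output_out = sigmoid(np.dot(w_h2o, hidden_out))
output_err = targets - output_out
analytic = -np.dot(output_err * output_out * (1 - output_out), hidden_out.T)

# Numerical gradient by central differences, one weight at a time.
eps = 1e-6
numeric = np.zeros_like(w_h2o)
for r in range(w_h2o.shape[0]):
    for c in range(w_h2o.shape[1]):
        up, down = w_h2o.copy(), w_h2o.copy()
        up[r, c] += eps
        down[r, c] -= eps
        numeric[r, c] = (loss(up, hidden_out, targets)
                         - loss(down, hidden_out, targets)) / (2 * eps)

max_diff = np.max(np.abs(analytic - numeric))
print(max_diff)  # should be tiny if the two gradients agree
```

The `analytic` matrix is exactly the negative of the term that `train(...)` multiplies by the learning rate and adds to `w_h2o`, so under this assumed loss the update is a step downhill.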

### 2.4 Assembling the `NeuralNetwork` Class

In [7]:
class NeuralNetwork():
    
    def __init__(self, n_inodes, n_hnodes, n_onodes, learn_rate):
        ## these determine the number of nodes per layer
        self.i_cnt = n_inodes
        self.h_cnt = n_hnodes
        self.o_cnt = n_onodes

        ## specify the learning rate
        self.lr = learn_rate


        ## weight initialization
        ## this can be done in one of two ways:
        ## 1. draw uniformly from [0, 1), then subtract 0.5 to center the weights on 0 so that some are negative
        ## 2. draw from a zero-centered normal distribution whose spread shrinks as the layer grows

        ## option 1 (left commented out, since option 2 below would overwrite it anyway):
        # self.w_i2h = np.random.rand(self.h_cnt, self.i_cnt) - 0.5
        # self.w_h2o = np.random.rand(self.o_cnt, self.h_cnt) - 0.5

        ## going with option 2:
        self.w_i2h = np.random.normal(0, pow(self.h_cnt, -0.5), (self.h_cnt, self.i_cnt))
        self.w_h2o = np.random.normal(0, pow(self.o_cnt, -0.5), (self.o_cnt, self.h_cnt))


        ## we can now specify the activation function, we'll do so as a lambda function
        self.activation = lambda x: expit(x)
    
    def train(self, input_list, target_list):
        ## convert input list to np.array and transpose because of matrix mult
        inputs  = np.array(input_list, ndmin=2).T
        targets = np.array(target_list, ndmin=2).T


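        ## forward pass: the same calculations as in `query(...)`
        hidden_in  = np.dot(self.w_i2h, inputs)
        hidden_out = self.activation(hidden_in)

        output_in  = np.dot(self.w_h2o, hidden_out)
        output_out = self.activation(output_in)

        ## output error: the difference between what we wanted and what we got
        output_err = (targets - output_out)

        ## hidden error: the output error split across the hidden nodes by the hidden-to-output weights
        hidden_err = np.dot(self.w_h2o.T, output_err)

        ## update the weights; `x * (1 - x)` is the derivative of the sigmoid, written in terms of its output
        self.w_h2o += self.lr * np.dot((output_err * output_out * (1 - output_out)), np.transpose(hidden_out))
        self.w_i2h += self.lr * np.dot((hidden_err * hidden_out * (1 - hidden_out)), np.transpose(inputs))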
        
    def query(self, input_list):
        ## convert input list to np.array and transpose because of matrix mult
        inputs = np.array(input_list, ndmin=2).T


        ## propagate the input through the hidden layer
        ### recall that X_{hidden} = W_{input_hidden} * I_{inputs}
        hidden_in  = np.dot(self.w_i2h, inputs)
        ### pass `hidden_in` through the activation function to calculate the output
        hidden_out = self.activation(hidden_in)


        ## propagate the hidden output through the output layer
        ### recall that X_{output} = W_{hidden_output} * I_{hidden_out}
        output_in  = np.dot(self.w_h2o, hidden_out)
        ### pass `output_in` through the activation function to calculate the output
        output_out = self.activation(output_in)

        return output_out

---

### Training the Network on MNIST

Now we'll move on to training the network on MNIST, but to do so, we need to specify some parameters.

**Recall** that these images are `28x28` pixel images, which results in a total of `784` inputs. We ultimately need to classify these images into `10` classes, as we're analyzing the digits `0-9`. The size of the hidden layer is rather arbitrary, so we can use just about any number of hidden nodes we want.

In [8]:
n_inodes = 1
n_hnodes = 1
n_onodes = 1

learn_rt = 1

nn = NeuralNetwork(n_inodes, n_hnodes, n_onodes, learn_rt)

### BEGIN SOLUTION

n_inodes = 784
n_hnodes = 200
n_onodes =  10

learn_rt = 0.1

nn = NeuralNetwork(n_inodes, n_hnodes, n_onodes, learn_rt)

### END SOLUTION
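To get a feel for how big this network is, we can count its weights. This is plain arithmetic on the layer sizes above; note that the `NeuralNetwork` class in this workshop has no bias terms, so the weights are the only trainable values.

```python
# Layer sizes from the solution above.
n_inodes, n_hnodes, n_onodes = 784, 200, 10

n_weights_i2h = n_hnodes * n_inodes   # one weight per (hidden, input) pair
n_weights_h2o = n_onodes * n_hnodes   # one weight per (output, hidden) pair

print(n_weights_i2h)                  # 156800
print(n_weights_h2o)                  # 2000
print(n_weights_i2h + n_weights_h2o)  # 158800
```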

We've initialized the ANN, so now we need to actually train it. We'll train over `N` *epochs*. An epoch is one full pass over the training data, so the number of epochs is the number of times we go over the data to keep refining the weights.

In [26]:
epochs = 5

for e in range(epochs):
    for record, label in zip(mnist.train.images, mnist.train.labels):
        inputs = record * 0.99 + 0.01
        targets = label * 0.98 + 0.01
        nn.train(inputs, targets)
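The `* 0.99 + 0.01` and `* 0.98 + 0.01` rescaling in the training loop is worth a closer look. The sketch below uses stand-in arrays and assumes, as the loop does, that pixel values are already in the range 0 to 1 and that labels are one-hot.

```python
import numpy as np

record = np.array([0.0, 0.5, 1.0])   # stand-in pixel values in [0, 1]
label = np.array([0.0, 1.0, 0.0])    # stand-in one-hot label (3 classes for brevity)

# Inputs move from [0, 1] to [0.01, 1.0]. The weight update is proportional
# to the input, so an input of exactly 0 would contribute no update at all.
inputs = record * 0.99 + 0.01

# Targets move from {0, 1} to {0.01, 0.99}. A sigmoid output is always
# strictly between 0 and 1, so targets of exactly 0 or 1 could never be reached.
targets = label * 0.98 + 0.01

print(inputs)
print(targets)
```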

### Scoring the Network

We've now built and trained the ANN, so the next step is to test its accuracy and see whether our model has actually learned from the data we've given it.

In [10]:
test_file = open("data/mnist_test.csv", "r")
test_list = test_file.readlines()
test_file.close()

In [33]:
score = []

for record, label in zip(mnist.test.images, mnist.test.labels):
    correct_label = np.argmax(label)
    inputs = record * 0.99 + 0.01

    outputs = nn.query(inputs)
    predicted_label = np.argmax(outputs)
    
    score.append(1 if predicted_label == correct_label else 0)

In [44]:
print("Performance = {0:.3f}%".format(np.array(score).mean() * 100))

Performance = 97.550%
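The scoring loop above boils down to "take the index of the largest output and compare it with the index of the 1 in the one-hot label". Here is a small vectorized sketch of the same idea on made-up outputs for three images and four classes (the numbers are illustrative only).

```python
import numpy as np

# Hypothetical network outputs: one row per image, one column per class.
outputs = np.array([[0.10, 0.70, 0.10, 0.10],
                    [0.80, 0.10, 0.05, 0.05],
                    [0.20, 0.20, 0.50, 0.10]])

# Matching one-hot labels; the third image is really class 3.
one_hot = np.array([[0, 1, 0, 0],
                    [1, 0, 0, 0],
                    [0, 0, 0, 1]])

predicted = np.argmax(outputs, axis=1)   # most confident class per image
actual = np.argmax(one_hot, axis=1)      # position of the 1 in each label
accuracy = np.mean(predicted == actual)  # 2 of 3 correct

print("Performance = {0:.3f}%".format(accuracy * 100))  # Performance = 66.667%
```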


Now we've built our first ANN. This is a pretty small one compared to some that exist in the depths of the interwebs, but ultimately it's a start.

This network's accuracy is about 97.5%. That's well short of the best published results on MNIST, which are above 99%, but for your first network, that's pretty awesome.

---

## 3. Rebuilding the ANN in TensorFlow

Now let's build an ANN in [TensorFlow](https://www.tensorflow.org/). Note that the model below is simpler than the one we built by hand: it is a single softmax layer with a bias and no hidden layer, so don't expect it to match the accuracy above. Ultimately, you can build your own networks and models, but one of the benefits of using a platform like TensorFlow is that it lets you use others' models and lets others use yours.

In [13]:
import tensorflow as tf

In [14]:
x = tf.placeholder(tf.float32, [None, 784])

In [15]:
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))

In [16]:
y = tf.nn.softmax(tf.matmul(x, W) + b)

In [17]:
y_ = tf.placeholder(tf.float32, [None, 10])

In [18]:
cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y), reduction_indices=[1]))
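As a rough NumPy counterpart to the `y` and `cross_entropy` expressions above, the sketch below computes a softmax followed by the batch-averaged cross-entropy. The function names are ours, and subtracting the row-wise maximum inside `softmax` is a numerical-stability step we added; it does not change the result.

```python
import numpy as np

def softmax(logits):
    # Subtract the row-wise max for numerical stability; the result is unchanged.
    shifted = logits - np.max(logits, axis=1, keepdims=True)
    exps = np.exp(shifted)
    return exps / np.sum(exps, axis=1, keepdims=True)

def cross_entropy(y_true, y_pred):
    # Sum over classes for each example, then average over the batch.
    return np.mean(-np.sum(y_true * np.log(y_pred), axis=1))

# With all-zero weights and biases (as W and b start out above), every logit
# is 0, so each of the 10 classes gets probability 0.1.
logits = np.zeros((2, 10))
y_true = np.eye(10)[[3, 7]]   # two one-hot labels, for digits 3 and 7

y_pred = softmax(logits)
ce = cross_entropy(y_true, y_pred)

print(np.allclose(y_pred, 0.1))  # True
print(ce)                        # ln(10), about 2.3026
```

That starting loss of roughly 2.3 is what a model that guesses uniformly across ten classes should score, which makes it a handy reference point when watching training progress.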

In [19]:
train_step = tf.train.GradientDescentOptimizer(0.1).minimize(cross_entropy)

In [20]:
sess = tf.InteractiveSession()

In [21]:
tf.global_variables_initializer().run()

In [38]:
for _ in range(10000):
    batch_xs, batch_ys = mnist.train.next_batch(100)
    sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})

In [39]:
correct_pred = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))

In [40]:
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))

In [45]:
print("Performance = {0:.3f}%".format(sess.run(accuracy, feed_dict={x: mnist.test.images, y_: mnist.test.labels}) * 100))

Performance = 92.230%
