# Neural Network Playground

<img src="http://nicolamanzini.com/wp-content/uploads/2017/11/single_hidden_layer.jpg" />

Neural networks are now used in a wide range of fields. To help you understand how different hyperparameters affect a neural network, we have created a few interactive exercises you can try below.

### Packages

The neural network we are going to use is implemented in TensorFlow, a very popular machine learning library. The code uses the TensorFlow 1.x API (placeholders and sessions).

In [0]:
import tensorflow as tf
import matplotlib.pyplot as plt
import numpy as np
from IPython.display import clear_output
from tensorflow.examples.tutorials.mnist import input_data

### Data

The data we are going to train on is the MNIST handwritten digits dataset, a popular dataset for educational purposes. It contains grayscale images of handwritten digits from 0 to 9. Each image is 28x28 pixels, stored as a flattened vector of 784 values, and comes with a one-hot encoded label.
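
To make "one-hot encoded" concrete, here is a minimal sketch in plain Python. The helper names `one_hot` and `decode` are made up for this illustration and are not part of the MNIST loader; `decode` does the same job that `np.argmax` does in the cells below.

```python
def one_hot(digit, num_classes=10):
    """Return a list with 1.0 at position `digit` and 0.0 everywhere else."""
    if not 0 <= digit < num_classes:
        raise ValueError("digit out of range")
    return [1.0 if i == digit else 0.0 for i in range(num_classes)]


def decode(label):
    """Recover the digit: the index of the largest entry."""
    return max(range(len(label)), key=lambda i: label[i])


label = one_hot(3)
print(label)          # [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
print(decode(label))  # 3
print(28 * 28)        # 784 pixel values in one flattened image
```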

In [0]:
mnist = input_data.read_data_sets('MNIST_data', one_hot=True)
batch = mnist.train.next_batch(1000)

Let's take a look at some images and their corresponding labels. Use the slider to browse a sample of the training images.

In [0]:
#@title Looking at the Data { run: "auto", vertical-output: true }
image = 0 #@param {type:"slider", min:0, max:50, step:1}
image *= image  # square the slider value so the samples are spread over indices 0 to 2500

plt.figure(figsize = (14, 4))
plt.subplot(1,3,1)
plt.imshow(mnist.train.images[image].reshape(28,28), cmap='gray')
plt.title(np.argmax(mnist.train.labels[image]), size='xx-large')
plt.show()

print("\nLabel: " + str(mnist.train.labels[image]))

Go ahead and run the cell below to set up the model we will be using.

In [0]:
def model(activation, num_units, num_it, lr, init):
    """Train a one-hidden-layer network on `batch` and print its accuracy on the full training set.

    activation: "sigmoid", "tanh" or "ReLU"; num_units: size of the hidden layer;
    num_it: number of training iterations; lr: learning rate;
    init: truthy for Glorot-normal weights, falsy for all-zero weights.
    """
    tf.reset_default_graph()  # start from an empty graph each time the cell is re-run
    activations = {"sigmoid": tf.nn.sigmoid, "tanh": tf.nn.tanh, "ReLU": tf.nn.relu}
    initializer = tf.glorot_normal_initializer() if init else tf.zeros_initializer()

    # Inputs: flattened 28x28 images and their one-hot labels.
    x = tf.placeholder(tf.float32, shape=[None, 784])
    y = tf.placeholder(tf.float32, shape=[None, 10])

    # One hidden layer followed by a linear output layer that produces the logits.
    hidden_layer = tf.layers.dense(x, num_units, activation=activations[activation], kernel_initializer=initializer)
    output_layer = tf.layers.dense(hidden_layer, 10, kernel_initializer=initializer)

    cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y, logits=output_layer))
    optimizer = tf.train.RMSPropOptimizer(lr).minimize(cost)

    # Fraction of examples whose predicted digit matches the label.
    correct_pred = tf.equal(tf.argmax(output_layer, axis=1), tf.argmax(y, axis=1))
    accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))

    init_op = tf.global_variables_initializer()
    log_every = max(1, num_it // 10)  # print the cost about ten times per run

    with tf.Session() as sess:
        sess.run(init_op)
        for step in range(num_it):
            _, val = sess.run([optimizer, cost], feed_dict={x: batch[0], y: batch[1]})
            if step % log_every == 0:
                print("cost at iteration {}: {}".format(step, val))

        print("\nAccuracy: {:.2f}%".format(accuracy.eval(feed_dict={x: mnist.train.images, y: mnist.train.labels}) * 100))
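
The cost that `model` minimizes is the softmax cross-entropy between the logits of the output layer and the one-hot labels. As a rough sketch of what that quantity measures, here is the same calculation for a single example in plain Python. The functions `softmax` and `cross_entropy` are written for this illustration and are not TensorFlow calls, and the three-class logits are invented values.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    largest = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - largest) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(one_hot_label, logits):
    """Negative log-probability assigned to the correct class."""
    probs = softmax(logits)
    return -sum(t * math.log(p) for t, p in zip(one_hot_label, probs))


label = [0.0, 0.0, 1.0]  # the correct class is index 2
confident_right = cross_entropy(label, [0.0, 0.0, 5.0])
uniform = cross_entropy(label, [0.0, 0.0, 0.0])
confident_wrong = cross_entropy(label, [5.0, 0.0, 0.0])
print(confident_right, uniform, confident_wrong)
```

The cost is small when the network is confident and right, equals the log of the number of classes when all logits are equal, and grows when the network is confident and wrong. With ten classes, equal logits give a cost of about 2.30, which is roughly where an untrained network should start.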

## Activation Functions

Choosing a good activation function can help speed up training and make our model more accurate. The three functions we are going to look at are sigmoid, tanh and ReLU. Their equations and graphs are below.

#### $sigmoid(z) = \frac{1}{1+e^{-z}}$

<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/8/88/Logistic-curve.svg/1200px-Logistic-curve.svg.png" height="150px"/>

#### $tanh(z) = \frac{e^z-e^{-z}}{e^z+e^{-z}}$

<img src="http://reference.wolfram.com/language/ref/Files/Tanh.en/O_2.png" height="140px" />

#### $ReLU(z)=max( 0, z)$

<img src="https://i.imgur.com/gKA4kA9.jpg" height="150px" />
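
The three equations above translate directly into code. This is a plain-Python sketch for trying out a few values by hand; the model itself uses `tf.nn.sigmoid`, `tf.nn.tanh` and `tf.nn.relu`.

```python
import math


def sigmoid(z):
    """1 / (1 + e^-z): squashes any input into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def tanh(z):
    """(e^z - e^-z) / (e^z + e^-z): squashes any input into the range (-1, 1)."""
    return (math.exp(z) - math.exp(-z)) / (math.exp(z) + math.exp(-z))


def relu(z):
    """max(0, z): passes positive inputs through and zeroes out the rest."""
    return max(0.0, z)


for z in (-2.0, 0.0, 2.0):
    print("z={:+.1f}  sigmoid={:.3f}  tanh={:+.3f}  ReLU={:.1f}".format(
        z, sigmoid(z), tanh(z), relu(z)))
```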

Select one of the activation functions below and the model will automatically train!

**Some Things to Note**
- How does each activation affect the accuracy?
- Is there a clear distinction in performance between them?

In [0]:
#@title { run: "auto", vertical-output: true }
clear_output()
activation = "sigmoid" #@param ["sigmoid", "tanh", "ReLU"]

model(activation, 100, 100, .001, 1)


## Number of Hidden Units (neurons)

In our hidden layer, we can have a variable number of neurons. With a sigmoid activation, each neuron is essentially a logistic regression unit. The more units we add, the more information the network can store, but the more computation it needs. Tinker with different numbers of units (1-1000) and see how it affects our model.
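
One way to get a feel for the extra computation is to count the trainable parameters. The sketch below assumes the architecture used in this notebook (784 inputs, one hidden layer, 10 outputs, with a bias per unit); `count_parameters` is a helper written for this illustration.

```python
def count_parameters(num_units, num_inputs=784, num_outputs=10):
    """Count weights and biases in a network with one hidden layer."""
    hidden = num_inputs * num_units + num_units     # hidden weights + biases
    output = num_units * num_outputs + num_outputs  # output weights + biases
    return hidden + output


for n in (1, 10, 100, 1000):
    print("{:>4} hidden units -> {:>6} parameters".format(n, count_parameters(n)))
```

The count grows linearly with the number of hidden units: 805 parameters for 1 unit and 795010 for 1000 units.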

**Some Things to Note**
- How does the number of units affect the accuracy of the model?
    - Is it a linear relationship?
- How does changing the number of units affect the time our model takes to train?
- What is a possible problem you could see happening if we have too many units?
    - Hint: think about what happens when a model fits the data too closely.


In [0]:
#@title  { run: "auto", vertical-output: true }
clear_output()
num_units = 1 #@param {type:"slider", min:1, max:1000, step:1}

model("ReLU", num_units, 100, .001, 1)

## Initializing the Weights

With logistic regression, we just initialized our weight matrix to zeros. With neural networks, though, if we initialized all of our weights to zero, every hidden unit would compute the same output and get updated in exactly the same way, so the hidden layer would behave like a single unit and the network would gain nothing from its extra neurons. Instead, we want to set our weights randomly to small numbers around 0. Use the dropdown below to see the difference random initialization can make.
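
The symmetry problem can be seen in a tiny network. The sketch below is plain Python rather than the notebook's TensorFlow model: the function `hidden_weight_gradients`, the 2-input, 3-unit network, the squared-error loss and the input values are all assumptions chosen to keep the example small.

```python
import math
import random


def hidden_weight_gradients(W1, w2, x, target):
    """Gradient of 0.5 * (output - target)**2 w.r.t. each hidden unit's weights.

    W1[j] holds the input weights of hidden unit j (tanh activation);
    w2[j] is the weight from hidden unit j to the single linear output.
    """
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in W1]
    output = sum(w2j * hj for w2j, hj in zip(w2, h))
    error = output - target
    return [[error * w2j * (1.0 - hj ** 2) * xi for xi in x]
            for w2j, hj in zip(w2, h)]


x, target = [1.0, -2.0], 1.0

# Constant initialization: every hidden unit starts out identical.
same = hidden_weight_gradients([[0.5, 0.5]] * 3, [0.5] * 3, x, target)

# Zero initialization: the same symmetry, and the gradients vanish as well.
zeros = hidden_weight_gradients([[0.0, 0.0]] * 3, [0.0] * 3, x, target)

# Small random initialization: each hidden unit gets its own gradient.
random.seed(0)
W1 = [[random.gauss(0, 0.1) for _ in x] for _ in range(3)]
w2 = [random.gauss(0, 0.1) for _ in range(3)]
rand = hidden_weight_gradients(W1, w2, x, target)

print("constant init, identical gradients:", same[0] == same[1] == same[2])
print("zero init, all gradients zero:", all(g == 0.0 for row in zeros for g in row))
print("random init, identical gradients:", rand[0] == rand[1] == rand[2])
```

When every unit starts with the same weights, every unit receives the same gradient, so the units stay identical after each update. Random initialization breaks that tie.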

In [0]:
#@title  { run: "auto", vertical-output: true }
clear_output()
initialize_weights = "randomly" #@param ["randomly", "zeros"]

# Train for 100 iterations, as in the other cells.
if initialize_weights == "randomly":
    model("tanh", 100, 100, .001, 1)
else:
    model("tanh", 100, 100, .001, 0)

## Learning Rate

The learning rate is a small number that scales the derivatives when we update our weights and biases during gradient descent. Choosing a good learning rate can have a big impact on how our model trains. If the learning rate is too small, the model can take too long to optimize. If it is too big, we might keep overshooting the optimal weights.

<img src="https://cdn-images-1.medium.com/max/1600/0*QwE8M4MupSdqA3M4.png" height="200px"/>
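
Both failure modes show up even on a one-dimensional problem. The sketch below runs plain gradient descent on $f(w) = w^2$, whose minimum is at $w = 0$. The function `gradient_descent` and the chosen learning rates are illustrative assumptions; the model in this notebook uses RMSProp, which rescales each step, so its behavior will not match these numbers.

```python
def gradient_descent(lr, steps=20, w=1.0):
    """Minimize f(w) = w**2 (derivative 2*w) and return the final w."""
    for _ in range(steps):
        w -= lr * 2 * w  # step against the gradient, scaled by the learning rate
    return w


for lr in (0.001, 0.1, 0.5, 1.1):
    print("lr={:<6} final w={:.4g}".format(lr, gradient_descent(lr)))
```

With `lr=0.001` the weight has barely moved from its starting value of 1 after 20 steps. With `lr=0.1` it is close to 0. With `lr=1.1` every step overshoots the minimum by more than the previous one, and the weight grows instead of shrinking.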


Use the dropdown below to try out a few different learning rates.

**Some Things to Note**
- How does the size of the learning rate affect the accuracy of the model?
    - Is it a linear relationship?
- How does it affect how quickly our model trains?



In [0]:
#@title { run: "auto", vertical-output: true }
clear_output()
learning_rate = 0.00001 #@param ["0.00001", "0.0001", "0.001", "0.01", "0.1", "1", "10"] {type:"raw"}

model("ReLU", 100, 100, learning_rate, 1)

## Number of Training Iterations

The number of training iterations is the number of times we run forward propagation and backpropagation. Use the slider to try out different numbers of iterations.
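
To see how the loss can depend on the iteration count, here is a toy sketch: gradient descent on $(w - 3)^2$ starting from $w = 0$. The function `loss_after`, the target value and the learning rate are assumptions made for this illustration. The toy problem has no held-out data, so it only shows the training loss.

```python
def loss_after(num_it, lr=0.1, w=0.0, target=3.0):
    """Run num_it gradient-descent steps on (w - target)**2 and return the loss."""
    for _ in range(num_it):
        w -= lr * 2 * (w - target)  # one update per iteration
    return (w - target) ** 2


for num_it in (1, 10, 100, 1000):
    print("{:>4} iterations -> loss {:.3g}".format(num_it, loss_after(num_it)))
```

Each iteration removes a fixed fraction of the remaining loss here, so the first few iterations help the most and later ones bring diminishing returns.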

**Some Things to Note**
- How does the number of iterations affect the accuracy of the model?
    - Is it a linear relationship?
- How does it affect the training time?
- If we did too many iterations, what could you foresee happening?
    - Hint: What happens when our model gets too good at predicting outputs for our training data?


In [0]:
#@title  { run: "auto", vertical-output: true }
clear_output()
num_it = 1 #@param {type:"slider", min:1, max:1000, step:1}

model("ReLU", 100, num_it, .001, 1)