# Neural Networks

Hopefully you've watched the three videos by [Grant Sanderson](https://twitter.com/3blue1brown) (a.k.a. [3blue1brown](https://www.youtube.com/channel/UCYO_jab_esuFRV4b17AJtAw)).

* [But what is a Neural Network?](https://www.youtube.com/watch?v=aircAruvnKk) (19:13)
* [Gradient descent, how neural networks learn](https://www.youtube.com/watch?v=IHZwWFHWa-w) (21:00)
* [What is back propagation really doing?](https://www.youtube.com/watch?v=Ilg3gGewQ5U) (13:53)

---

## A very brief recap

Neurons:

* Hold a value
* This value is related to the values of neurons on previous layers via:
    * weights
    * bias
    * activation function
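
As a rough illustration of how these three pieces combine, here is a minimal sketch of a single neuron. The input values, weights, and bias are made-up numbers chosen for the example, and `neuron_value` is a hypothetical helper name, not part of any library.

```python
import numpy as np

def sigmoid(z):
    # Squash any real number into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def neuron_value(inputs, weights, bias):
    # Weighted sum of the previous layer's values, plus the bias,
    # passed through the activation function
    return sigmoid(np.dot(weights, inputs) + bias)

# Made-up values for three neurons in the previous layer
inputs = np.array([0.5, 0.1, 0.9])
weights = np.array([0.4, -0.6, 0.2])
bias = 0.1

value = neuron_value(inputs, weights, bias)
print(value)
```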

Neural network structure:

<img src="https://upload.wikimedia.org/wikipedia/commons/1/1d/Neural_network_example.png"  style="width:200px;">

* An input layer
* One or more hidden layers (networks with several hidden layers are where the term "deep" comes from)
* An output layer

Learning:

* Minimizing a loss function using gradient descent, with the gradients computed by back propagation
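
To make "minimizing a loss function" concrete, here is a minimal sketch of gradient descent on a toy loss with a single parameter. The loss `(w - 3)**2`, the starting point, and the learning rate are assumptions chosen for illustration. In a real network, back propagation is what computes the gradient; here it is simply written out by hand.

```python
def loss(w):
    # Toy loss with its minimum at w = 3
    return (w - 3.0) ** 2

def gradient(w):
    # Derivative of the toy loss with respect to w
    return 2.0 * (w - 3.0)

w = 0.0              # arbitrary starting value
learning_rate = 0.1  # assumed step size

for step in range(100):
    # Step in the direction that reduces the loss
    w -= learning_rate * gradient(w)

print(w, loss(w))
```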

<img src="https://pbs.twimg.com/media/EybMJzOU8AY8g7M?format=png&name=small"  style="width:400px;">

I will be adapting a lot of this material from the SciNet workshop on neural networks:

https://support.scinet.utoronto.ca/education/go.php/451/index.php/ib/1//p_course/451

Their course goes a lot deeper into the mathematics of neural networks.

In [None]:
!pip install keras
!pip install tensorflow

In [None]:
import keras
from keras.datasets import mnist

In [None]:
(x_train, y_train), (x_test, y_test) = mnist.load_data()

In [None]:
x_train.shape

In [None]:
x_test.shape

In [None]:
x_train[0]

In [None]:
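# Import the Image submodule explicitly; a bare "import PIL" does not guarantee it is loaded
from PIL import Image
Image.fromarray(x_train[0])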

In [None]:
y_train[0]

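Transform the data: flatten each 28x28 image into a vector of 784 values and one-hot encode the labels. To keep training fast, we only use the first 500 training samples and the first 100 test samples.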

In [None]:
import keras.utils as ku

x_train2 = x_train[0:500, :, :].reshape(500, 784)
x_test2 = x_test[0:100, :, :].reshape(100, 784)
y_train2 = ku.to_categorical(y_train[0:500], 10)
y_test2 = ku.to_categorical(y_test[0:100], 10)

In [None]:
x_train2.shape

In [None]:
x_train2[0]

In [None]:
# Array of 28x28 inputs
print(x_train[0][14][13])

# Array of 784 inputs
print(x_train2[0][14*28+13])

In [None]:
y_train2[0]

In [None]:
y_train2[0].argmax()
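
The flattening and one-hot encoding used above can be sketched in plain NumPy. This is only an illustration of the idea, not how Keras implements `to_categorical`; `one_hot` is a hypothetical helper and the labels are made-up examples.

```python
import numpy as np

def one_hot(labels, num_classes):
    # One row per label, with a 1 in the column matching the label
    labels = np.asarray(labels)
    encoded = np.zeros((labels.size, num_classes), dtype="float32")
    encoded[np.arange(labels.size), labels] = 1.0
    return encoded

# Made-up example labels
labels = np.array([5, 0, 4])
encoded = one_hot(labels, 10)
print(encoded[0])
print(encoded.argmax(axis=1))  # argmax recovers the original labels

# Flattening: pixel (row, col) of a 28x28 image ends up at index row*28 + col
image = np.arange(784).reshape(28, 28)
flat = image.reshape(784)
print(image[14][13] == flat[14 * 28 + 13])
```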

A `Sequential` model is a linear stack of layers: the output of each layer is the input to the next one.

In [None]:
import keras.models as km
import keras.layers as kl

def get_basic_model(numnodes):
  model = km.Sequential()
  model.add(kl.Dense(numnodes, input_dim = 784, activation = 'sigmoid', name = 'hidden'))
  model.add(kl.Dense(10, name = 'output', activation = 'sigmoid'))
  return model

In [None]:
model = get_basic_model(30)

In [None]:
model.summary()
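
The parameter counts reported by `model.summary()` can be checked by hand: a dense layer has one weight per (input, neuron) pair plus one bias per neuron. A small sketch, where `dense_param_count` is a hypothetical helper name:

```python
def dense_param_count(num_inputs, num_units):
    # One weight per (input, unit) pair, plus one bias per unit
    return num_inputs * num_units + num_units

hidden = dense_param_count(784, 30)  # 784 pixels -> 30 hidden neurons
output = dense_param_count(30, 10)   # 30 hidden neurons -> 10 digits
total = hidden + output

print(hidden, output, total)  # 23550 310 23860
```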

# TODO: what does this do exactly?

In [None]:
model.compile(optimizer='sgd',
              metrics=['accuracy'],
              loss="mean_squared_error")

Compiling prepares the model for training:

* `optimizer='sgd'`: SGD stands for Stochastic Gradient Descent (more later)
* `loss`: the loss function to minimize (as described in the videos)
* `metrics=['accuracy']`: accuracy is reported during training
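
To give a feel for what the loss and the accuracy metric measure, here is a minimal NumPy sketch on two made-up predictions over three classes. It illustrates the definitions only and is not the Keras implementation.

```python
import numpy as np

# Made-up one-hot targets and network outputs for two samples
y_true = np.array([[0.0, 0.0, 1.0],
                   [1.0, 0.0, 0.0]])
y_pred = np.array([[0.1, 0.2, 0.7],
                   [0.3, 0.6, 0.1]])

# Mean squared error: average of the squared differences
mse = np.mean((y_true - y_pred) ** 2)

# Accuracy: fraction of samples where the largest output matches the target
accuracy = np.mean(y_true.argmax(axis=1) == y_pred.argmax(axis=1))

print(mse, accuracy)
```

The first sample is classified correctly and the second is not, so the accuracy is 0.5 even though the loss is a single averaged number.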

# TODO: what is an epoch exactly?

Computing the gradient over the whole dataset for every update is slow, so one speed-up is to feed the data to the algorithm in small random batches and update the weights after each batch. One full pass through all of the training data is called an 'epoch'.
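
The relationship between batches and epochs can be sketched as follows. This is an illustration of the idea only, not how Keras implements `fit`; `iterate_minibatches` is a hypothetical helper, and the sizes match the 500 samples and `batch_size=5` used in this notebook.

```python
import numpy as np

def iterate_minibatches(num_samples, batch_size, rng):
    # Shuffle the sample indices, then hand them out one batch at a time
    order = rng.permutation(num_samples)
    for start in range(0, num_samples, batch_size):
        yield order[start:start + batch_size]

num_samples = 500
batch_size = 5
rng = np.random.default_rng(0)

# One epoch: every sample is visited exactly once
updates_per_epoch = 0
seen = []
for batch_indices in iterate_minibatches(num_samples, batch_size, rng):
    updates_per_epoch += 1  # a real trainer would update the weights here
    seen.extend(batch_indices.tolist())

print(updates_per_epoch)  # 100 weight updates per epoch
```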

In [None]:
fit = model.fit(x_train2,
                y_train2,
                epochs=1000,
                batch_size=5,
                verbose=2)

In [None]:
score = model.evaluate(x_test2, y_test2)

### Question

What do you call it when your model works great on the training data, but doesn't work so well on unseen data?

## Regularization

Regularization is a method we can use to tackle overfitting.

To quote the SciNet neural networks workshop:

"Regularization is an ad hoc technique by which parameters in a model are penalized to prevent
individual parameters from becoming excessively important to the fit."

This technique modifies the cost function used in training by adding a penalty on large parameter values. The extent to which large parameters are penalized is controlled by a parameter lambda ($\lambda$). (Note that we can't call the parameter `lambda` below, because `lambda` is a reserved keyword in Python, so we call it `lam`.)
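
As a minimal sketch of the idea, an L2 penalty adds `lam` times the sum of the squared weights to the loss. The weights and the unregularized loss value below are made-up numbers, and `l2_penalty` is a hypothetical helper name.

```python
import numpy as np

def l2_penalty(weights, lam):
    # Large weights (positive or negative) contribute most to the penalty
    return lam * np.sum(np.square(weights))

# Made-up weights and a made-up unregularized loss value
weights = np.array([[0.5, -1.0],
                    [2.0, 0.0]])
data_loss = 0.25
lam = 0.001

total_loss = data_loss + l2_penalty(weights, lam)
print(total_loss)
```

With `lam = 0` the penalty disappears and training reduces to the unregularized case; larger values of `lam` push the weights towards zero more strongly.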

In [None]:
import keras.models as km
import keras.layers as kl
import keras.regularizers as kr

def get_regularized_model(numnodes, lam=0.0):
  model = km.Sequential()
  model.add(kl.Dense(numnodes, input_dim = 784, activation = 'sigmoid', name = 'hidden', kernel_regularizer = kr.l2(lam)))
  model.add(kl.Dense(10, name = 'output', activation = 'sigmoid',kernel_regularizer = kr.l2(lam)))
  return model

In [None]:
model2 = get_regularized_model(30, lam = 0.001)

model2.compile(optimizer = 'sgd', metrics = ['accuracy'], loss = "mean_squared_error")

%time fit2 = model2.fit(x_train2, y_train2, epochs = 1000, batch_size = 5, verbose = 2)

In [None]:
model2.evaluate(x_test2, y_test2)

In [None]:
import os
from PIL import Image, ImageOps

In [None]:
pwd

Draw a digit at https://drawisland.com/?w=200&h=200 and save it as `/tmp/drawisland3.png`:

In [None]:
image = Image.open('/tmp/drawisland3.png').resize( (28,28) ).convert( 'L' )
image = ImageOps.invert(image)
image

In [None]:
import numpy as np

image_array = np.array(image)[:,:].reshape(784)

model.predict(np.array([image_array])).argmax()

In [None]:
def get_model_more(numnodes):
  model = km.Sequential()
  model.add(kl.Dense(numnodes, input_dim = 784, activation = 'sigmoid', name = 'hidden'))
  model.add(kl.Dense(numnodes, input_dim = numnodes, activation = 'sigmoid', name = 'hidden2'))
  model.add(kl.Dense(10, name = 'output', activation = 'sigmoid'))
  return model

In [None]:
model3 = get_model_more(16)

In [None]:
model3.compile(optimizer = 'sgd', metrics = ['accuracy'], loss = "mean_squared_error")

%time fit3 = model3.fit(x_train2, y_train2, epochs = 1000, batch_size = 5, verbose = 2)

In [None]:
import keras
from keras.datasets import mnist
import keras.utils as ku
import keras.models as km
import keras.layers as kl
import keras.regularizers as kr

import numpy as np
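# Import the submodules explicitly; a bare "import PIL" does not guarantee they are loaded
import PIL.Image
import PIL.ImageOps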

def get_data(num_training, num_testing):
    (x_train, y_train), (x_test, y_test) = mnist.load_data()

    x_train2 = x_train[0:num_training, :, :].reshape(num_training, 784)
    x_test2 = x_test[0:num_testing, :, :].reshape(num_testing, 784)
    y_train2 = ku.to_categorical(y_train[0:num_training], 10)
    y_test2 = ku.to_categorical(y_test[0:num_testing], 10)
    
    return (x_train2, y_train2), (x_test2, y_test2)

def get_model(num_nodes, num_hidden_layers = 1, lam=None):
    model = km.Sequential()
    regularizer = None
    if lam:
        regularizer = kr.l2(lam)

    # First hidden layer
    model.add(kl.Dense(num_nodes,
                       input_dim=784,
                       activation='sigmoid',
                       name ='hidden',
                       kernel_regularizer=regularizer))

    # Additional hidden layers
    for _ in range(num_hidden_layers - 1):
        model.add(kl.Dense(num_nodes,
                           input_dim=num_nodes,
                           activation='sigmoid',
                           kernel_regularizer=regularizer))

    # Output layer
    model.add(kl.Dense(10,
                       name='output',
                       activation='sigmoid',
                       kernel_regularizer=regularizer))
    
    model.compile(optimizer ='sgd',
                  metrics =['accuracy'],
                  loss="mean_squared_error")

    return model

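def train_model(model, x_train, y_train, epochs=1000):
    # Return the History object so the training progress can be inspected
    return model.fit(x_train,
                     y_train,
                     epochs=epochs,
                     batch_size=5,
                     verbose=2)

def evaluate_model(model, x_test, y_test):
    # Return the [loss, accuracy] values computed on the test set
    return model.evaluate(x_test, y_test)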

def model_predict(filename):
    image = PIL.Image.open(filename).resize( (28,28) ).convert( 'L' )
    image = PIL.ImageOps.invert(image)
    print("Image:")
    display(image)
    image_array = np.array(image)[:,:].reshape(784)
    prediction = model.predict(np.array([image_array])).argmax()
    print("Prediction: {}".format(prediction))

In [None]:
NUM_TRAINING = 1000 # 60000 max
NUM_TESTING = 200 # 10000 max

NUM_NODES = 30
NUM_HIDDEN_LAYERS = 1

(x_train, y_train), (x_test, y_test) = get_data(num_training=NUM_TRAINING,
                                                num_testing=NUM_TESTING)
model = get_model(num_nodes=NUM_NODES,
                  num_hidden_layers=NUM_HIDDEN_LAYERS,
                  lam=0.01)
%time train_model(model, x_train, y_train)
evaluate_model(model, x_test, y_test)

In [None]:
import glob
for filename in glob.glob('data/numbers/*.png'):
    model_predict(filename)

In [None]:
# Download data and solutions

import urllib.request
import os

def download_data(path, branch='main'):
    base_url = 'https://raw.githubusercontent.com/ualberta-rcg/python-machine-learning'
    if os.path.exists(path):
        return
    # Create data/numbers (and its parent) if it does not exist yet
    os.makedirs('data/numbers', exist_ok=True)
    url = '{}/{}/notebooks/{}'.format(base_url, branch, path)
    output_file = path
    urllib.request.urlretrieve(url, output_file)
    print("Downloaded " + path)

download_data('data/numbers/cwant_1.png')
download_data('data/numbers/cwant_3.png')
download_data('data/numbers/cwant_5.png')
download_data('data/numbers/cwant_8.png')

In [None]:
model.layers

In [None]:
import pandas


In [None]:
pandas.Series

In [None]:
pandas.Series(y_train)

In [None]:
y_train

In [None]:
y_values = [v.argmax() for v in y_train]

In [None]:
pandas.Series(y_values).value_counts()

In [None]:
from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())


In [None]:
import tensorflow as tf
print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))