[View in Colaboratory](https://colab.research.google.com/github/todun/googlecolab/blob/master/mnist_tensorflow_tutorial.ipynb)

# Introduction
## May the _tensor_ flow with you...

After team [*Wakanda*](https://en.wikipedia.org/wiki/Wakanda_(comics%29), I mean [Team Torch Panther](https://medium.com/ai-saturdays/aisaturdaylagos-the-torch-panther-cdec328c125b), or was it [team PyTorch](https://pytorch.org), razzle-dazzled the [AI Saturday Lagos community](https://medium.com/ai-saturdays) with its pythonic and dynamic neural networking ways, it was time for tensors to take center stage. TensorFlow gets its performance from a C++ backend and from the lazy evaluation of its computational graph execution model.

![Yoda Tensor Flows](https://memegenerator.net/img/instances/81504156/tensors-flows-from-node-to-node-they-do.jpg)

To learn more about how you can flow with the tensors, keep reading and become one with the TensorFlow.

# Objectives

- This Jupyter notebook walks through how to classify handwritten digits using [TensorFlow](https://www.tensorflow.org), the [open source](https://github.com/tensorflow) Python library from Google Brain.

- Only the low-level TensorFlow API is used to demonstrate how to recognize images of digits (0 through 9) with over 99% accuracy.

# Install tensorflow
>> Setup shown for [Unix](https://en.wikipedia.org/wiki/Unix) / [Linux](https://en.wikipedia.org/wiki/Linux) / [OSX](https://en.wikipedia.org/wiki/MacOS)

This notebook uses Google's TensorFlow Python library. The code targets the TensorFlow 1.x API (`tf.placeholder`, `tf.InteractiveSession`), so install a 1.x release.
Pre-installation steps include:
- [installing anaconda](https://www.anaconda.com)
- installing tensorflow on your computer with anaconda: <code>conda install tensorflow</code>
    - Optionally install a specific version of tensorflow: <code>conda install tensorflow=1.0.0</code>

In [0]:
import tensorflow as tf
import numpy as np
!pip install --upgrade matplotlib
%matplotlib inline


# Setup an interactive session
A session is used to evaluate nodes of the built [computational graph](https://www.tensorflow.org/programmers_guide/graphs) and return their results.

In [0]:
sess = tf.InteractiveSession()
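
To make the idea of a computational graph concrete, here is a toy sketch in plain Python. It is not the TensorFlow API and not how TensorFlow is implemented; the `Placeholder`, `Constant`, `Add`, `Mul` and `run` names are made up for illustration. Building the nodes computes nothing. Values only appear when the graph is run with a feed dictionary, which is roughly the role `sess.run` and `.eval` play in this notebook.

```python
# Toy deferred-evaluation graph; hypothetical names, illustrative only.

class Placeholder:
    """A named input whose value is supplied at run time."""
    def __init__(self, name):
        self.name = name

    def evaluate(self, feed):
        return feed[self.name]


class Constant:
    """A fixed value baked into the graph."""
    def __init__(self, value):
        self.value = value

    def evaluate(self, feed):
        return self.value


class Add:
    """Sum of two upstream nodes."""
    def __init__(self, left, right):
        self.left, self.right = left, right

    def evaluate(self, feed):
        return self.left.evaluate(feed) + self.right.evaluate(feed)


class Mul:
    """Product of two upstream nodes."""
    def __init__(self, left, right):
        self.left, self.right = left, right

    def evaluate(self, feed):
        return self.left.evaluate(feed) * self.right.evaluate(feed)


def run(node, feed):
    """Evaluate a node, similar in spirit to sess.run(node, feed_dict=feed)."""
    return node.evaluate(feed)


# Building the graph performs no arithmetic yet.
a = Placeholder("a")
out = Add(Mul(a, Constant(3.0)), Constant(1.0))  # out = 3 * a + 1

# Values are produced only when the graph is run with concrete inputs.
print(run(out, {"a": 2.0}))
```

The same graph can be run again with a different feed, which is how the training loop later reuses one graph for every batch.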

# Reading the [mnist](http://yann.lecun.com/exdb/mnist) data 
This data comes from the Modified National Institute of Standards and Technology (MNIST) database of digitized handwritten digits.

In [0]:
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets('MNIST_data', one_hot=True)
width = 28 # width of the image in pixels 
height = 28 # height of the image in pixels
flat = width * height # number of pixels in one image 
class_output = 10 # number of possible classifications for the problem
x  = tf.placeholder(tf.float32, shape=[None, flat])
y_ = tf.placeholder(tf.float32, shape=[None, class_output])

Successfully downloaded train-images-idx3-ubyte.gz 9912422 bytes.
Extracting MNIST_data/train-images-idx3-ubyte.gz
Successfully downloaded train-labels-idx1-ubyte.gz 28881 bytes.
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Successfully downloaded t10k-images-idx3-ubyte.gz 1648877 bytes.
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Successfully downloaded t10k-labels-idx1-ubyte.gz 4542 bytes.
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz
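
The `one_hot=True` argument above means each label arrives as a vector of length 10 with a single 1 at the position of the digit, which is why `y_` has shape `[None, class_output]`. A minimal sketch of that encoding in plain Python, where `one_hot` and `decode` are hypothetical helper names and not part of the MNIST loader:

```python
def one_hot(label, num_classes=10):
    """Return a list with 1.0 at index `label` and 0.0 everywhere else."""
    if not 0 <= label < num_classes:
        raise ValueError("label out of range")
    vec = [0.0] * num_classes
    vec[label] = 1.0
    return vec


def decode(vec):
    """Recover the digit as the index of the largest entry (an argmax)."""
    return max(range(len(vec)), key=lambda i: vec[i])


encoded = one_hot(3)
print(encoded)          # a length-10 vector with a single 1.0 at index 3
print(decode(encoded))  # back to the digit
```

The `decode` step is the same idea as the `tf.argmax(y_, 1)` call used later to compare labels with predictions.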


# First convolution layer
- A convolutional layer extracts features from the image; the image, the filters, and the resulting feature maps are all represented as [tensors](https://www.tensorflow.org/programmers_guide/tensors)
- The goal is to train weights and biases that represent the important learned features of the images
- Weights and biases are represented as [variables](https://www.tensorflow.org/programmers_guide/variables) in tensorflow
- The convolution itself is done with `tf.nn.conv2d`; [down sampling](https://www.tensorflow.org/tutorials/layers) the resulting feature maps is done with [max_pool](https://www.tensorflow.org/api_docs/python/tf/nn/max_pool)
- The output of the convolution is passed through an activation function before it is pooled.
- The activation function used here is [relu](https://www.tensorflow.org/api_docs/python/tf/nn/relu). It loosely models a neuron that stays silent until its input crosses a threshold, and it is a common default choice in deep networks.

In [0]:
x_image = tf.reshape(x, [-1,28,28,1]) 
W_conv1 = tf.Variable(tf.truncated_normal([5, 5, 1, 32], stddev=0.1))
b_conv1 = tf.Variable(tf.constant(0.1, shape=[32])) # need 32 biases for 32 outputs
convolve1= tf.nn.conv2d(x_image, W_conv1, strides=[1, 1, 1, 1], padding='SAME') + b_conv1
h_conv1 = tf.nn.relu(convolve1)
conv1 = tf.nn.max_pool(h_conv1, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')
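
To see what the convolve, activate and pool steps do to actual numbers, here is a tiny pure-Python sketch on a 5x5 image with one hand-picked 2x2 kernel. It is only an illustration: the helper names are hypothetical, it uses no padding (the `'VALID'` behaviour) to stay short whereas the cell above uses `'SAME'`, and the kernel is fixed by hand rather than learned.

```python
def conv2d_valid(image, kernel, bias=0.0):
    """Slide `kernel` over `image` (no padding, stride 1) and add `bias`.

    Like tf.nn.conv2d, this is a cross-correlation: the kernel is not flipped.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw)) + bias
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def relu(fmap):
    """Replace negative responses with zero."""
    return [[max(0.0, v) for v in row] for row in fmap]


def max_pool_2x2(fmap):
    """Keep the largest value in each non-overlapping 2x2 window."""
    return [
        [max(fmap[r][c], fmap[r][c + 1], fmap[r + 1][c], fmap[r + 1][c + 1])
         for c in range(0, len(fmap[0]) - 1, 2)]
        for r in range(0, len(fmap) - 1, 2)
    ]


# A 5x5 image that is dark on the left and bright on the right.
image = [[0, 0, 1, 1, 1] for _ in range(5)]
# A kernel that responds where brightness increases from left to right.
kernel = [[-1, 1],
          [-1, 1]]

feature_map = conv2d_valid(image, kernel)   # 4x4 map, strongest at the edge
pooled = max_pool_2x2(relu(feature_map))    # 2x2 map after down sampling
print(pooled)  # [[2.0, 0.0], [2.0, 0.0]]
```

The pooled map is smaller but still records where the edge was found, which is the point of down sampling.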

# Second convolution layer
- To make our neural network get better at learning the image features, a second convolutional layer is added to aid its extraction of higher level features.
- This is how deep learning gets its name. Additional layers make for a _deeper learning_ of the images.

In [0]:
W_conv2 = tf.Variable(tf.truncated_normal([5, 5, 32, 64], stddev=0.1))
b_conv2 = tf.Variable(tf.constant(0.1, shape=[64])) #need 64 biases for 64 outputs
convolve2= tf.nn.conv2d(conv1, W_conv2, strides=[1, 1, 1, 1], padding='SAME')+ b_conv2
h_conv2 = tf.nn.relu(convolve2)
conv2 = tf.nn.max_pool(h_conv2, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME') #max_pool_2x2

# Fully connected layer
A fully connected layer combines all the features extracted by the convolutional layers: each of the `7*7*64` flattened inputs is connected to each of its 1024 neurons.

In [0]:
layer2_matrix = tf.reshape(conv2, [-1, 7*7*64])
W_fc1 = tf.Variable(tf.truncated_normal([7 * 7 * 64, 1024], stddev=0.1))
b_fc1 = tf.Variable(tf.constant(0.1, shape=[1024])) # need 1024 biases for 1024 outputs
fcl=tf.matmul(layer2_matrix, W_fc1) + b_fc1
h_fc1 = tf.nn.relu(fcl)
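
The `7*7*64` in the reshape comes from following the image size through the two layers. With `padding='SAME'` the spatial output size is `ceil(input_size / stride)`, so the stride-1 convolutions keep the size and each stride-2 pool halves it. A small sketch of that arithmetic, where `same_padding_size` is a hypothetical helper name:

```python
import math


def same_padding_size(size, stride):
    """Spatial output size for padding='SAME': ceil(size / stride)."""
    return math.ceil(size / stride)


size = 28      # MNIST images are 28x28
channels = 1   # grayscale input
for out_channels in (32, 64):          # filters in conv layer 1 and 2
    size = same_padding_size(size, 1)  # 5x5 convolution, stride 1: size unchanged
    size = same_padding_size(size, 2)  # 2x2 max pool, stride 2: size halved
    channels = out_channels
    print(size, size, channels)        # 14 14 32, then 7 7 64

flat_features = size * size * channels
print(flat_features)  # 3136, i.e. 7 * 7 * 64
```

Running this kind of check before wiring up a new layer is a cheap way to catch shape mismatches early.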

# Optional dropout layer 
- In order to prevent overfitting (i.e. memorizing the training data instead of generalizing), dropout is used
- Predictions are made with the softmax function, which classifies the digit images being fed as input. It is applied to the output of a second fully connected layer

In [0]:
keep_prob = tf.placeholder(tf.float32)
layer_drop = tf.nn.dropout(h_fc1, keep_prob)
W_fc2 = tf.Variable(tf.truncated_normal([1024, 10], stddev=0.1)) #1024 neurons
b_fc2 = tf.Variable(tf.constant(0.1, shape=[10])) # 10 possibilities for digits [0,1,2,3,4,5,6,7,8,9]
fc=tf.matmul(layer_drop, W_fc2) + b_fc2
y_CNN= tf.nn.softmax(fc)
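
As a rough picture of what the two operations in this cell do, here is a plain-Python sketch. The `dropout` helper keeps each value with probability `keep_prob` and scales the survivors by `1/keep_prob`, which is the behaviour documented for `tf.nn.dropout`; the `softmax` helper turns raw scores into probabilities. Both function names and the sample numbers are made up for illustration.

```python
import math
import random


def dropout(values, keep_prob, rng):
    """Keep each value with probability keep_prob, scaled by 1/keep_prob."""
    return [v / keep_prob if rng.random() < keep_prob else 0.0 for v in values]


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    shift = max(logits)  # subtracting the max avoids overflow in exp
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


rng = random.Random(0)  # arbitrary fixed seed so the sketch is repeatable
activations = [1.0, 2.0, 3.0, 4.0]

dropped = dropout(activations, keep_prob=0.5, rng=rng)    # training: some zeros
unchanged = dropout(activations, keep_prob=1.0, rng=rng)  # evaluation: no change
probs = softmax([2.0, 1.0, 0.1])

print(dropped)
print(unchanged)
print(probs)
```

This is why the training loop feeds `keep_prob: 0.5` while the accuracy checks feed `keep_prob: 1.0`.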

---------------

>>NOTE: The softmax output that follows the dropout layer marks the end of the network's forward pass; the loss and training operations are added to the graph next

# Defining Loss Function and training the model
To know how well our network is doing, we use the cross-entropy function to measure the loss between the network's predicted probabilities and the true labels. After training, accuracy is measured on test data never seen by our neural network before.
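
For a one-hot label, cross-entropy reduces to the negative log of the probability the network assigned to the correct class, so confident correct predictions cost little and confident wrong ones cost a lot. A minimal plain-Python sketch, with hypothetical helper names and made-up probabilities:

```python
import math


def cross_entropy(true_one_hot, predicted_probs):
    """-sum(y * log(p)) for one example; zero entries of y contribute nothing."""
    return -sum(y * math.log(p)
                for y, p in zip(true_one_hot, predicted_probs) if y > 0)


def mean_cross_entropy(labels, predictions):
    """Average the per-example loss over a batch."""
    losses = [cross_entropy(y, p) for y, p in zip(labels, predictions)]
    return sum(losses) / len(losses)


label = [1.0, 0.0, 0.0]                    # the true class is class 0
good = cross_entropy(label, [0.9, 0.05, 0.05])  # confident and correct
bad = cross_entropy(label, [0.1, 0.8, 0.1])     # confident and wrong

print(good)
print(bad)
print(mean_cross_entropy([label, label], [[0.9, 0.05, 0.05], [0.1, 0.8, 0.1]]))
```

TensorFlow 1.x also provides `tf.nn.softmax_cross_entropy_with_logits`, which works on the pre-softmax values and is generally more numerically robust than taking `tf.log` of a softmax output; the notebook keeps the explicit formula to show what is being computed.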

In [0]:
# Toy NumPy check of the cross-entropy formula on two hand-made predictions
layer4_test =[[0.9, 0.1, 0.1],[0.9, 0.1, 0.1]]
y_test=[[1.0, 0.0, 0.0],[1.0, 0.0, 0.0]]
np.mean( -np.sum(y_test * np.log(layer4_test),1))
# Loss: mean cross-entropy between the true labels y_ and the predictions y_CNN
cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y_CNN), reduction_indices=[1]))
train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)
# Accuracy: fraction of examples whose most probable class matches the label
correct_prediction = tf.equal(tf.argmax(y_CNN,1), tf.argmax(y_,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
sess.run(tf.global_variables_initializer())

num_iterations = 20000
batch_size = 50
for i in range(num_iterations):
    batch = mnist.train.next_batch(batch_size)
    if i%100 == 0:
        train_accuracy = accuracy.eval(feed_dict={
            x:batch[0], y_: batch[1], keep_prob: 1.0})
        print("step %d, training accuracy %g"%(i, train_accuracy))
    train_step.run(feed_dict={x: batch[0], y_: batch[1], keep_prob: 0.5})
	
print("test accuracy %g"%accuracy.eval(feed_dict={x: mnist.test.images, y_: mnist.test.labels, keep_prob: 1.0}))

step 0, training accuracy 0.08
step 100, training accuracy 0.88
step 200, training accuracy 0.94
step 300, training accuracy 0.88
step 400, training accuracy 0.88
step 500, training accuracy 0.96
step 600, training accuracy 0.9
step 700, training accuracy 0.92
step 800, training accuracy 0.96
step 900, training accuracy 0.98
step 1000, training accuracy 0.94
step 1100, training accuracy 0.98
step 1200, training accuracy 0.98
step 1300, training accuracy 0.96
step 1400, training accuracy 0.96
step 1500, training accuracy 0.94
step 1600, training accuracy 0.98
step 1700, training accuracy 0.88
step 1800, training accuracy 0.96
step 1900, training accuracy 1
step 2000, training accuracy 1
step 2100, training accuracy 0.98
step 2200, training accuracy 0.98
step 2300, training accuracy 0.98
step 2400, training accuracy 0.98
step 2500, training accuracy 0.98
step 2600, training accuracy 0.98
step 2700, training accuracy 0.94
step 2800, training accuracy 0.98
step 2900, training accuracy 0.98

step 7700, training accuracy 0.98
step 7800, training accuracy 1
step 7900, training accuracy 1
step 8000, training accuracy 1
step 8100, training accuracy 0.96
step 8200, training accuracy 1
step 8300, training accuracy 0.98
step 8400, training accuracy 1
step 8500, training accuracy 0.96
step 8600, training accuracy 0.98
step 8700, training accuracy 1
step 8800, training accuracy 1
step 8900, training accuracy 1
step 9000, training accuracy 1
step 9100, training accuracy 1
step 9200, training accuracy 1
step 9300, training accuracy 1
step 9400, training accuracy 1
step 9500, training accuracy 1
step 9600, training accuracy 1
step 9700, training accuracy 1
step 9800, training accuracy 1
step 9900, training accuracy 1
step 10000, training accuracy 0.98
step 10100, training accuracy 1
step 10200, training accuracy 1
step 10300, training accuracy 1
step 10400, training accuracy 1
step 10500, training accuracy 1
step 10600, training accuracy 1
step 10700, training accuracy 0.98
step 10800

step 15500, training accuracy 1
step 15600, training accuracy 1
step 15700, training accuracy 1
step 15800, training accuracy 1
step 15900, training accuracy 1
step 16000, training accuracy 1
step 16100, training accuracy 1
step 16200, training accuracy 1
step 16300, training accuracy 0.98
step 16400, training accuracy 1
step 16500, training accuracy 1
step 16600, training accuracy 1
step 16700, training accuracy 1
step 16800, training accuracy 0.98
step 16900, training accuracy 1
step 17000, training accuracy 1
step 17100, training accuracy 1
step 17200, training accuracy 1
step 17300, training accuracy 0.98
step 17400, training accuracy 1
step 17500, training accuracy 1
step 17600, training accuracy 1
step 17700, training accuracy 0.98
step 17800, training accuracy 1
step 17900, training accuracy 1
step 18000, training accuracy 1
step 18100, training accuracy 1
step 18200, training accuracy 1
step 18300, training accuracy 1
step 18400, training accuracy 1
step 18500, training accurac

------------------

# Visualisation

First get the tile_raster_images [utility](http://deeplearning.net/tutorial/code/utils.py) for drawing raster images from [deeplearning.net](http://deeplearning.net)

In [0]:
!wget --no-clobber --quiet http://deeplearning.net/tutorial/code/utils.py

import utils
from utils import tile_raster_images
import matplotlib.pyplot as plt
from PIL import Image

A [kernel](https://en.wikipedia.org/wiki/Kernel_(image_processing%29) in image processing is a small matrix used for blurring, sharpening, embossing, edge detection, and more. This is accomplished by convolving the kernel with an image. Here the 32 learned 5x5 kernels of the first convolution layer are pulled out of the graph, one row per kernel.

In [0]:
kernels = sess.run(tf.reshape(tf.transpose(W_conv1, perm=[2, 3, 0,1]),[32,-1]))

Then plot the learned kernels of the first convolution layer

In [0]:
image = Image.fromarray(tile_raster_images(kernels, img_shape=(5, 5) ,tile_shape=(4, 8), tile_spacing=(1, 1)))
plt.rcParams['figure.figsize'] = (18.0, 18.0)
# FIXME: AttributeError: 'numpy.ndarray' object has no attribute 'mask' 
# imgplot = plt.imshow(image)
# imgplot.set_cmap('gray')  

-------------------------------------------------------------------

# Closing the session

**After** the computational graph yields all its values, the session is then closed

In [0]:
sess.close() #finish the session

# Challenges
Our experience with TensorFlow was overall positive. Some challenges, and guides to overcome them, include:

- Getting exceptions caused by mismatched shapes
<code>ValueError: Cannot feed value of shape (100, 784) for Tensor 'convolutional/X:0', which has shape '(?, 28, 28, 1)'</code>

This challenge becomes more apparent the deeper the network. A deeper network (e.g. [GoogleNet](https://www.cs.unc.edu/~wliu/papers/GoogLeNet.pdf)) has more parameters and layers, and so more places where shapes can fail to match. This is exacerbated by the fact that the final result is not known until the static computational graph is run.

Building the computational graph step by step, starting from a single small layer, and testing continuously helped avoid this issue.
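
One cheap habit that goes with this is checking shapes in plain Python before feeding anything to the graph. The sketch below mirrors the error above: a batch of shape `(100, 784)` does not fit a placeholder of shape `(?, 28, 28, 1)` until it is reshaped. The helper names are hypothetical and only shapes are manipulated, not real image data.

```python
def shapes_compatible(value_shape, placeholder_shape):
    """True if value_shape can be fed to placeholder_shape; None matches any size."""
    if len(value_shape) != len(placeholder_shape):
        return False
    return all(p is None or p == v
               for v, p in zip(value_shape, placeholder_shape))


def image_shape(flat_shape, height=28, width=28, channels=1):
    """Shape after reshaping a (batch, height*width*channels) array to image form."""
    batch, flat = flat_shape
    if flat != height * width * channels:
        raise ValueError("cannot reshape %d values into %dx%dx%d"
                         % (flat, height, width, channels))
    return (batch, height, width, channels)


placeholder = (None, 28, 28, 1)  # the '?' in the error message is None here
batch = (100, 784)               # flattened images, as in the error above

print(shapes_compatible(batch, placeholder))               # False
print(shapes_compatible(image_shape(batch), placeholder))  # True
```

In this notebook the mismatch is avoided by declaring the placeholder `x` with the flat shape and reshaping inside the graph with `tf.reshape(x, [-1,28,28,1])`.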

- Not having a dynamic way to see intermediate results of the neural network

The very speed of TensorFlow comes from its lazy evaluation of a computational graph. However, when we were building the neural network and wanted branching logic based on various criteria (e.g. loss, the performance of different layers, etc.), it was not straightforward.

TensorFlow now has an API for [control flow](https://www.tensorflow.org/api_guides/python/control_flow_ops), but it ends up not being pythonic and adds a layer of complexity to the neural network logic.

# Team members
Thanks to 
1. Ejiro, an engineering student at UNILAG for putting together the presentation.
2. Tella Babatunde for working on the neural network
3. Ibrahim Gbadegeshin for the A+ presentation
4. Juwe C. Raphael, Tunde Osborne for making it possible
5. Yours truly, [Todun](https://www.linkedin.com/in/todun), with the [jupyter notebook](https://github.com/todun/deep-frameworks-explore.git) for fielding questions and aggregating our effort

# References
- https://github.com/todun/deep-frameworks-explore.git
- http://bit.ly/aisaturday_tensorflow
