# 1. Packages and Data

## 1.1 Packages

In [1]:
import tensorflow as tf
# tf.enable_eager_execution()  # Removes the need to initialize variables explicitly (more below)

In [2]:
import pandas as pd
import numpy as np

# Graphing
import matplotlib.pyplot as plt

## 1.2 Data

Data will be introduced as needed below.

# 2. Tensorflow basics

Source:

* https://tf.wiki/

## 2.1 Installation

There are different versions of TensorFlow: one for CPU execution and one for GPU execution. More details are provided in the wiki above; for simplicity we will use the 'vanilla' CPU option to avoid difficulties with installation.

The GPU option is definitely worth looking into if you have the local hardware and expect to do some heavy deep learning work.

`pip install tensorflow` installs the CPU version

`pip install tensorflow-gpu` installs the GPU version

## 2.2 Set up variables

It is advisable to work through the tensorflow documentation on variables. A brief summary and some code is below.

* https://www.tensorflow.org/guide/variables

Some important commands to get to know:

`tf.get_variable()` is a useful function for creating variables. You can pass it an initialiser and other arguments. (https://www.tensorflow.org/api_docs/python/tf/get_variable)

`tf.constant` creates a constant TensorFlow tensor. (https://www.tensorflow.org/api_docs/python/tf/constant)

`tf.Variable` is the constructor of the variable class; calling it with an initial value creates a new variable.

In [None]:
# You can also generate data
one_eg = tf.ones(shape=(3,3))
random_eg = tf.random_uniform([3,3])

In [None]:
# Take a look
one_eg

In [None]:
# Take a look
random_eg

There are a variety of functions that operate on your variables. A common need is matrix multiplication:

`tf.matmul` for matrix multiplication. This takes in two arguments and performs the multiplication.

In [None]:
A = tf.constant([[1, 2], [3, 4]])
B = tf.constant([[5, 6], [7, 8]])
C = tf.matmul(A, B)

print(C)

# Can confirm on https://matrix.reshish.com/multCalculation.php
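
As a cross-check that needs no TensorFlow, the same product can be worked out by hand. The sketch below is plain Python; the helper name `matmul_2d` is made up for this illustration and is not a TensorFlow function.

```python
def matmul_2d(left, right):
    """Multiply two matrices given as lists of rows (hypothetical helper)."""
    n_inner = len(right)
    n_cols = len(right[0])
    return [
        [sum(row[k] * right[k][j] for k in range(n_inner)) for j in range(n_cols)]
        for row in left
    ]

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
C = matmul_2d(A, B)
print(C)  # [[19, 22], [43, 50]]
```

Each entry is a row of `A` times a column of `B`, for example 1x5 + 2x7 = 19 for the top-left entry.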

## 2.2 Create a computation graph

We could therefore set up the computation graph from our lecture notes

<img src="https://s3-ap-southeast-2.amazonaws.com/mdsi-deep-learn-aut-19/comp_graph.png" width="250" height="250"/>
<style>
 img {
    vertical-align: middle;
}
</style>

We will use the tensorflow GradientTape API to assist our small computation graph example. From the documentation:

_TensorFlow provides the tf.GradientTape API for automatic differentiation - computing the gradient of a computation with respect to its input variables. Tensorflow "records" all operations executed inside the context of a tf.GradientTape onto a "tape". Tensorflow then uses that tape and the gradients associated with each recorded operation to compute the gradients of a "recorded" computation using reverse mode differentiation._

In [4]:
x = tf.Variable(3.0, name="x")
y = tf.Variable(2.0, name="y")
z = tf.Variable(4.0, name="z")

with tf.GradientTape() as t:
    t.watch(x)
    
    # Our next layer
    a = tf.add(x,y, name="a")
    b = tf.multiply(a, z, name="b")
    f = tf.square(b, name="f")

SystemError: <built-in function TFE_Py_TapeWatch> returned a result with an error set

<img src="https://s3-ap-southeast-2.amazonaws.com/mdsi-deep-learn-aut-19/com_graph_diff.png" width="450" height="450"/>
<style>
 img {
    vertical-align: middle;
}
</style>

From our slide above, if we set x=3, y=2, z=4 we expect F to be ((3+2)x4)^2 = 400. The slide also shows the differentiation step for differentiating this computation graph with respect to A.

By the chain rule, dF/dA = (dF/dB)(dB/dA) = 2BZ, which is 2((3+2)x4)x4 = 160.
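
The arithmetic above can be checked in plain Python, without TensorFlow. This is a minimal sketch: the forward pass recomputes `a`, `b` and `f`, the backward pass applies the chain rule by hand, and a finite-difference estimate (the step `eps` is an arbitrary choice) confirms the result.

```python
x, y, z = 3.0, 2.0, 4.0

# Forward pass through the graph
a = x + y        # 5.0
b = a * z        # 20.0
f = b ** 2       # 400.0

# Backward pass: chain rule, df/da = (df/db) * (db/da)
df_db = 2 * b    # 40.0
db_da = z        # 4.0
df_da = df_db * db_da

def f_of_a(a_value):
    """f as a function of a alone, holding z fixed."""
    return (a_value * z) ** 2

eps = 1e-6  # arbitrary small step for a central difference
df_da_numeric = (f_of_a(a + eps) - f_of_a(a - eps)) / (2 * eps)

print(f, df_da)  # 400.0 160.0
print(abs(df_da_numeric - df_da) < 1e-3)  # True
```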

In [3]:
# We could also just ask for b back to simplify the above:
b

NameError: name 'b' is not defined

In [None]:
# Now we can differentiate 
# See more here https://www.tensorflow.org/tutorials/eager/automatic_differentiation
df_da = t.gradient(f, a)
df_da

## 2.3 Initialise variables

Before we dive in, a note on 'Ops':

* https://stackoverflow.com/questions/43290373/what-is-tensorflow-op-does

_TensorFlow is a programming system in which you represent computations as graphs. Nodes in the graph are called ops (short for operations). An op takes zero or more Tensors, performs some computation, and produces zero or more Tensors._

From the tensorflow documentation:

_Before you can use a variable, it must be initialized. If you are programming in the low-level TensorFlow API (that is, you are explicitly creating your own graphs and sessions), you must explicitly initialize the variables. Most high-level frameworks such as `tf.contrib.slim`, `tf.estimator.Estimator` and `Keras` automatically initialize variables for you before training a model_

The convenience function `tf.global_variables_initializer()` is a handy way to add an Op that will initialise your global variables.

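A `Session` object encapsulates the environment in which `Ops` are executed and `Tensor` objects are evaluated. We need sessions because, without eager execution, building the graph only describes the computation; nothing is computed until the graph is run in a session.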
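
To make the 'build first, run later' idea concrete, here is a toy sketch in plain Python. It is not how TensorFlow is implemented; the `Node` class and the `constant` and `run` functions are hypothetical names invented for this illustration. Constructing nodes only records the computation, and nothing is evaluated until `run` is called, which plays the role of `sess.run`.

```python
class Node:
    """A toy op: a function plus the nodes that feed it (hypothetical)."""

    def __init__(self, op, inputs=(), name=""):
        self.op = op
        self.inputs = inputs
        self.name = name

def constant(value, name=""):
    # An op with zero inputs that produces a fixed value
    return Node(lambda: value, name=name)

def run(node):
    """Evaluate a node by first evaluating everything it depends on."""
    input_values = [run(parent) for parent in node.inputs]
    return node.op(*input_values)

# Building the graph: no arithmetic happens here
x = constant(3.0, name="x")
y = constant(2.0, name="y")
z = constant(4.0, name="z")
a = Node(lambda u, v: u + v, (x, y), name="a")
b = Node(lambda u, v: u * v, (a, z), name="b")
f = Node(lambda u: u ** 2, (b,), name="f")

print(type(b).__name__)  # Node: a description of the computation, not a number
result = run(f)          # only now is the graph executed
print(result)            # 400.0
```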

At this stage, restart the kernel and make sure `enable_eager_execution` is not called.

In [5]:
x = tf.Variable(3.0, name="x")
y = tf.Variable(2.0, name="y")
z = tf.Variable(4.0, name="z")

# Our next layer
a = tf.add(x,y, name="a")
b = tf.multiply(a, z, name="b")
f = tf.square(b, name="f")

b

<tf.Tensor 'b:0' shape=() dtype=float32>

In [6]:
# Add an Op to initialize global variables.

init_op = tf.global_variables_initializer()

# Now we create a session
with tf.Session() as sess:

    # initialize the variables
    sess.run(init_op)

    # run the operation
    output = sess.run(f)

    print("Value of the equation is : {}".format(output))
    # No explicit sess.close() is needed: the with block closes the session on exit

Value of the equation is : 400.0


# 3. A practical example

Adapted from:
* https://appliedmachinelearning.blog/2018/12/26/tensorflow-tutorial-from-scratch-building-a-deep-learning-model-on-fashion-mnist-dataset-part-1/    

In [7]:
import tensorflow as tf
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

In [8]:
import os
import pickle
data_dir = os.getcwd() + "/new_data"

In [9]:
def read_file(filename):
    # Load one pickled dataset; the with block closes the file on exit
    with open(filename, 'rb') as file_in:
        dataset = pickle.load(file_in)
    return dataset

In [10]:
def read_files():
    global X_train, X_test, X_val, y_train, y_test, y_val
    file_list = ["X_train.pkl", "X_test.pkl", 
                 "y_train.pkl", "y_test.pkl",
                "X_val.pkl", "y_val.pkl"]
    file_list = [data_dir + "/" + x for x in file_list]

    X_train = read_file(file_list[0])
    X_test = read_file(file_list[1])
    y_train = read_file(file_list[2])
    y_test = read_file(file_list[3])
    
    X_val = read_file(file_list[4])
    y_val = read_file(file_list[5])

In [11]:
read_files()

In [12]:
#Confirm it worked
print(X_train.shape,X_test.shape, X_val.shape, y_train.shape, y_test.shape, y_val.shape)

(54000, 784) (10000, 784) (6000, 784) (54000,) (10000,) (6000,)


## 3.1 Some setup

In [13]:
n_input = 784
n_hidden1 = 128
n_hidden2 = 128
n_class = 10
n_epoch = 20
learning_rate = 0.001
batch_size = 128
dropout = 0.20

## 3.2 Forward Prop

In [14]:
# Our forward layer

def model(batch_x):
 
    """
    We will define the learned variables, the weights and biases,
    within the method ``model()`` which also constructs the neural network.
    The variables named ``hn``, where ``n`` is an integer, hold the learned weight variables. 
    The variables named ``bn``, where ``n`` is an integer, hold the learned bias variables.
    """
 
    b1 = tf.get_variable("b1", [n_hidden1], initializer = tf.zeros_initializer())
    h1 = tf.get_variable("h1", [n_input, n_hidden1], initializer = tf.contrib.layers.xavier_initializer())
    layer1 = tf.nn.relu(tf.add(tf.matmul(batch_x,h1),b1))
 
    b2 = tf.get_variable("b2", [n_hidden2], initializer = tf.zeros_initializer())
    h2 = tf.get_variable("h2", [n_hidden1, n_hidden2], initializer = tf.contrib.layers.xavier_initializer())
    layer2 = tf.nn.relu(tf.add(tf.matmul(layer1,h2),b2))
 
    b3 = tf.get_variable("b3", [n_class], initializer = tf.zeros_initializer())
    h3 = tf.get_variable("h3", [n_hidden2, n_class], initializer = tf.contrib.layers.xavier_initializer())
    layer3 = tf.add(tf.matmul(layer2,h3),b3)
 
    return layer3

In [15]:
# One hot encode the labels
def one_hot(n_class, Y):
    """
    return one hot encoded labels to train output layers of NN model
    """
    return np.eye(n_class)[Y]
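
What `np.eye(n_class)[Y]` does can be sketched in plain Python: row `k` of the identity matrix is the one-hot vector for class `k`, so indexing the identity matrix with the labels picks one row per label. The function name `one_hot_plain` and the labels below are made up for illustration.

```python
def one_hot_plain(n_class, labels):
    """Return one row per label with a 1.0 in the label's column (illustrative)."""
    return [[1.0 if column == label else 0.0 for column in range(n_class)]
            for label in labels]

example_labels = [2, 0, 1]  # hypothetical labels for a 3-class problem
encoded = one_hot_plain(3, example_labels)
for row in encoded:
    print(row)
# [0.0, 0.0, 1.0]
# [1.0, 0.0, 0.0]
# [0.0, 1.0, 0.0]
```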

## 3.3  Loss

In [16]:
# See here for the useful nn module https://www.tensorflow.org/api_docs/python/tf/nn

def compute_loss(predicted, actual):
    """
    Computes the softmax cross entropy loss for each example in the batch
    and returns the mean loss over the batch.
    """
 
    total_loss = tf.nn.softmax_cross_entropy_with_logits_v2(logits = predicted,labels = actual)
    avg_loss = tf.reduce_mean(total_loss)
    
    return avg_loss
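
For intuition, the per-example quantity that softmax cross entropy computes can be written out in plain Python. This is a sketch of the standard formula, loss = -sum(labels * log(softmax(logits))), not TensorFlow's implementation; the function name and the example values are assumptions chosen for illustration.

```python
import math

def softmax_cross_entropy(logits, labels):
    """Cross entropy between one-hot labels and softmax(logits) for one example."""
    # Subtract the max logit before exponentiating for numerical stability
    max_logit = max(logits)
    log_sum_exp = max_logit + math.log(sum(math.exp(v - max_logit) for v in logits))
    log_softmax = [v - log_sum_exp for v in logits]
    return -sum(t * lp for t, lp in zip(labels, log_softmax))

# Two hypothetical examples with 3 classes each
batch_logits = [[2.0, 1.0, 0.1], [0.0, 0.0, 0.0]]
batch_labels = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]

per_example = [softmax_cross_entropy(l, t) for l, t in zip(batch_logits, batch_labels)]
mean_loss = sum(per_example) / len(per_example)  # the role of tf.reduce_mean

print(per_example)
print(mean_loss)
```

The second example has equal logits, so the softmax is uniform and its loss is log(3), the value a network that guesses uniformly among three classes would get.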

## 3.4  Optimiser

In [17]:
# Create an optimiser

def create_optimizer():
 
    optimizer = tf.train.AdamOptimizer(learning_rate)
    return optimizer
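
To see what an Adam update does, here is a sketch of the textbook update rule (Kingma and Ba) in plain Python, applied to a one-dimensional toy problem. It is not TensorFlow's implementation, which may differ in details; the function name `adam_minimize`, the toy objective, the learning rate of 0.1 and the step count are assumptions for illustration. The other defaults are the commonly used values beta1=0.9, beta2=0.999 and epsilon=1e-8.

```python
import math

def adam_minimize(grad_fn, start, learning_rate=0.1, beta1=0.9, beta2=0.999,
                  epsilon=1e-8, n_steps=500):
    """Minimise a 1-D function with textbook Adam, given its gradient (illustrative)."""
    theta = start
    m = 0.0  # running mean of gradients
    v = 0.0  # running mean of squared gradients
    for t in range(1, n_steps + 1):
        g = grad_fn(theta)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** t)  # bias correction for the zero initialisation
        v_hat = v / (1 - beta2 ** t)
        theta -= learning_rate * m_hat / (math.sqrt(v_hat) + epsilon)
    return theta

# Toy objective (theta - 3)^2 with gradient 2 * (theta - 3); minimum at theta = 3
theta_final = adam_minimize(lambda theta: 2 * (theta - 3), start=0.0)
print(theta_final)
```

In the notebook, `minimize()` does the equivalent work for every weight and bias at once, using gradients that TensorFlow derives from the graph.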

## 3.5 Train

In [19]:
def train(X_train, X_val, X_test, y_train, y_val, y_test, verbose = False):
    """
    Trains the network and finally evaluates it on the test data.
    """
    # Creating placeholders for image data and its labels
    X = tf.placeholder(tf.float32, [None, 784], name="X")
    Y = tf.placeholder(tf.float32, [None, 10], name="Y")
 
    # Forward pass on the model
    logits = model(X)
 
    # computing softmax cross entropy loss with logits
    avg_loss = compute_loss(logits, Y)
 
    # create the Adam optimizer; minimize() computes the gradients and applies them
    optimizer = create_optimizer().minimize(avg_loss)
 
    # compute validation loss
    validation_loss = compute_loss(logits, Y)
 
    # evaluating accuracy on various data (train, val, test) set
    correct_prediction = tf.equal(tf.argmax(logits,1), tf.argmax(Y,1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
 
    # initialize all the global variables
    init = tf.global_variables_initializer()
 
    # starting session to actually execute the computation graph
    with tf.Session() as sess:
 
        # all the global variables hold actual values now
        sess.run(init)
 
        # looping over number of epochs
        for epoch in range(n_epoch):
 
            epoch_loss = 0.
 
            # calculate number of batches in dataset (round up so a final partial batch is included)
            num_batches = int(np.ceil(X_train.shape[0] / batch_size))
 
            # looping over batches of dataset
            for i in range(num_batches):
 
                # selecting batch data
                batch_X = X_train[(i*batch_size):((i+1)*batch_size),:]
                batch_y = y_train[(i*batch_size):((i+1)*batch_size),:]
 
                # run the optimizer and avg_loss nodes of the computation graph on this batch
                _, batch_loss = sess.run([optimizer, avg_loss],
                                                       feed_dict = {X: batch_X, Y:batch_y})
 
                # summed up batch loss for whole epoch
                epoch_loss += batch_loss
            # average epoch loss
            epoch_loss = epoch_loss/num_batches
 
            # compute validation loss
            val_loss = sess.run(validation_loss, feed_dict = {X: X_val ,Y: y_val})
 
            # display within an epoch (train_loss, train_accuracy, valid_loss, valid accuracy)
            if verbose:
                print("epoch:{epoch_num}, train_loss: {train_loss}, train_accuracy: {train_acc}, val_loss: {valid_loss}, val_accuracy: {val_acc} ".format(
                                                       epoch_num = epoch,
                                                       train_loss = round(epoch_loss,3),
                                                       train_acc = round(float(accuracy.eval({X: X_train, Y: y_train})),2),
                                                       valid_loss = round(float(val_loss),3),
                                                       val_acc = round(float(accuracy.eval({X: X_val, Y: y_val})),2)
                                                      ))
 
        # calculate final accuracy on never seen test data
        # (no explicit sess.close() is needed: the with block closes the session on exit)
        print("Test Accuracy:", accuracy.eval({X: X_test, Y: y_test}))
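
The batching arithmetic in the loop can be checked in isolation. The sketch below uses plain Python and a hypothetical helper `batch_bounds` (not part of the notebook) to list the slice boundaries; rounding the batch count up means the last, smaller batch is still visited. The sizes are the ones used above: 54000 training rows and a batch size of 128.

```python
import math

def batch_bounds(n_rows, batch_size):
    """Return (start, stop) slice bounds covering every row exactly once (illustrative)."""
    num_batches = math.ceil(n_rows / batch_size)
    return [(i * batch_size, min((i + 1) * batch_size, n_rows))
            for i in range(num_batches)]

bounds = batch_bounds(54000, 128)
print(len(bounds))   # 422
print(bounds[0])     # (0, 128)
print(bounds[-1])    # (53888, 54000), a final batch of 112 rows
```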

In [20]:
# One hot encoding of labels for output layer training
y_train =  one_hot(n_class, y_train)
y_val = one_hot(n_class, y_val)
y_test = one_hot(n_class, y_test)

In [21]:
# Let's train and evaluate the fully connected NN model
train(X_train, X_val, X_test, y_train, y_val, y_test, True)

epoch:0, train_loss: 3.035, train_accuracy: 0.92, val_loss: 0.872, val_accuracy: 0.9 
epoch:1, train_loss: 0.547, train_accuracy: 0.95, val_loss: 0.513, val_accuracy: 0.93 
epoch:2, train_loss: 0.309, train_accuracy: 0.96, val_loss: 0.473, val_accuracy: 0.94 
epoch:3, train_loss: 0.217, train_accuracy: 0.97, val_loss: 0.432, val_accuracy: 0.94 
epoch:4, train_loss: 0.186, train_accuracy: 0.97, val_loss: 0.439, val_accuracy: 0.95 
epoch:5, train_loss: 0.159, train_accuracy: 0.97, val_loss: 0.399, val_accuracy: 0.95 
epoch:6, train_loss: 0.159, train_accuracy: 0.97, val_loss: 0.388, val_accuracy: 0.95 
epoch:7, train_loss: 0.158, train_accuracy: 0.97, val_loss: 0.372, val_accuracy: 0.95 
epoch:8, train_loss: 0.151, train_accuracy: 0.97, val_loss: 0.38, val_accuracy: 0.95 
epoch:9, train_loss: 0.126, train_accuracy: 0.98, val_loss: 0.324, val_accuracy: 0.96 
epoch:10, train_loss: 0.113, train_accuracy: 0.98, val_loss: 0.379, val_accuracy: 0.96 
epoch:11, train_loss: 0.113, train_accuracy:

# 4. An extended tutorial

Work through this tutorial:

* https://codelabs.developers.google.com/codelabs/cloud-tensorflow-mnist/#1

# 5. In Keras

Source:

* https://github.com/keras-team/keras/blob/master/examples/mnist_mlp.py

In [26]:
import keras
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.optimizers import RMSprop

batch_size = 128
num_classes = 10
epochs = 10

# the data, split between train and test sets
(x_train, y_train), (x_test, y_test) = mnist.load_data()

x_train = x_train.reshape(60000, 784)
x_test = x_test.reshape(10000, 784)
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255

print(x_train.shape[0], 'train samples')
print(x_test.shape[0], 'test samples')

60000 train samples
10000 test samples


In [27]:
print(y_train.shape)
print(y_train[0:5])

# convert class vectors to binary class matrices
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)

print(y_train.shape)
print(y_train[0:5])

(60000,)
[5 0 4 1 9]
(60000, 10)
[[0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]
 [1. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 1. 0. 0. 0. 0. 0.]
 [0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 1.]]


In [28]:
# Build a simple model
model = Sequential()
model.add(Dense(512, activation='relu', input_shape=(784,)))
model.add(Dropout(0.2))
model.add(Dense(512, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(num_classes, activation='softmax'))

model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_4 (Dense)              (None, 512)               401920    
_________________________________________________________________
dropout_3 (Dropout)          (None, 512)               0         
_________________________________________________________________
dense_5 (Dense)              (None, 512)               262656    
_________________________________________________________________
dropout_4 (Dropout)          (None, 512)               0         
_________________________________________________________________
dense_6 (Dense)              (None, 10)                5130      
Total params: 669,706
Trainable params: 669,706
Non-trainable params: 0
_________________________________________________________________
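
The `Param #` column can be reproduced by hand: a `Dense` layer with `n_in` inputs and `n_out` units has `n_in * n_out` weights plus `n_out` biases, and `Dropout` has no parameters. A quick check in plain Python (the helper name `dense_params` is made up for this illustration):

```python
def dense_params(n_in, n_out):
    """Number of trainable parameters in a fully connected layer with biases."""
    return n_in * n_out + n_out

layer_sizes = [784, 512, 512, 10]  # input size followed by the units of each Dense layer
per_layer = [dense_params(n_in, n_out)
             for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]

print(per_layer)       # [401920, 262656, 5130]
print(sum(per_layer))  # 669706
```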


In [29]:
# Run it

model.compile(loss='categorical_crossentropy',
              optimizer=RMSprop(),
              metrics=['accuracy'])

history = model.fit(x_train, y_train,
                    batch_size=batch_size,
                    epochs=epochs,
                    verbose=1,
                    validation_data=(x_test, y_test))
score = model.evaluate(x_test, y_test, verbose=0)

print('Test loss:', score[0])
print('Test accuracy:', score[1])

Train on 60000 samples, validate on 10000 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Test loss: 0.07285254494211403
Test accuracy: 0.9841
