Exercise 8 - Introduction to Neural Networks
=======

Originally hypothesised in the 1940s, neural networks are now one of the main tools used in modern AI. Neural networks can be used for both regression and classification. Recent advances in storage, processing power, and open-source tools have allowed many successful applications of neural networks, including medical diagnosis, filtering explicit content, speech recognition, and machine translation.

In this exercise we will look at a dataset of three different dog breeds, recording the age, weight, and height of multiple individuals. We will attempt to create a neural network model for classification, predicting the breed of dog for a given sample individual.

In [None]:
# Sets up the graphing configuration
import warnings
warnings.filterwarnings("ignore")
import matplotlib.pyplot as graph
%matplotlib inline
graph.rcParams['figure.figsize'] = (15,5)
graph.rcParams["font.family"] = 'DejaVu Sans'
graph.rcParams["font.size"] = '12'
graph.rcParams['image.cmap'] = 'rainbow'

Step 1
------

Let's start by opening up our data and having a look at it.

In [3]:
import tensorflow as tf
import keras
print('keras using %s backend'%keras.backend.backend())

keras using tensorflow backend


In [9]:
import pandas as pd
import numpy as np

# Loads the dataset
data = pd.read_csv('Data/dog_data.csv')

###--- WRITE print(data.head()) TO VIEW THE TOP 5 DATA POINTS OF THE DATA SET ---###

###

# Defines the feature dataframe
features = data.drop(['breed'], axis = 1)

    age  weight  height  breed
0  9.47    6.20    6.80      1
1  7.97    8.63    8.92      0
2  9.51    6.40    5.78      1
3  8.96    8.82    6.28      2
4  8.37    3.89    5.62      1


## Step 2

Our target data, three breeds of dogs, are represented numerically in our dataset as `0`, `1`, and `2`. For a classification network, these numbers are a little misleading: they might imply that `1` is in some way closer to `2` than `0` is, which is not necessarily the case. We therefore want our network to have three output nodes, one for each of the target classes. For example, the first node might return a positive value if our network thinks a sample is of class `0` while the other two nodes remain silent, the second node might return a positive value if it thinks a sample is of class `1`, and so on.

To this end, we represent our targets as one-hot vectors: binary vectors with a single `1` value and `0` everywhere else.

This will give us the following target vectors:

| breed 0 | breed 1 | breed 2 |
|:------- |:------- |:------- |
| `1 0 0` | `0 1 0` | `0 0 1` |

Each of the three values in our new target vectors will correspond to one of our output nodes, once we set up our network.
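
As a minimal sketch of the idea, the same encoding can be reproduced with plain NumPy. The `example_labels` array below is a made-up stand-in for the real breed column, not data from the exercise.

```python
import numpy as np

# Hypothetical labels for five dogs (breeds 0, 1, and 2)
example_labels = np.array([1, 0, 1, 2, 1])

# Row i of the 3x3 identity matrix is the one-hot vector for class i
example_onehot = np.eye(3)[example_labels]
print(example_onehot)

# np.argmax along each row recovers the original class labels
recovered = np.argmax(example_onehot, axis=1)
print(recovered)
```

The position of the `1` in each row is simply the class number, which is why `argmax` can undo the encoding.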

In [10]:
from sklearn.preprocessing import OneHotEncoder

# Sets the labels (numerical)
labels = np.array(data['breed'])

###--- REPLACE THE ??? BELOW WITH labels ---###
onehot = OneHotEncoder(sparse = False).fit_transform(np.transpose([???]))
###

print(onehot[:5])

[[0. 1. 0.]
 [1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]
 [0. 1. 0.]]


## Step 3

For our model, let us first define our `train_X`, `train_Y`, `test_X`, and `test_Y` variables.

In [11]:
# Take the first 4/5 of the data and assign it to training
train_X = features.values[:160]
train_Y = onehot[:160]

# Take the last 1/5 of the data and assign it to testing
test_X = features.values[160:]
test_Y = onehot[160:]
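
The index `160` above hard-codes the boundary of a 4/5 split, which presumably corresponds to a 200-row dataset. A hedged alternative is to compute the boundary from the data length. The arrays below (`example_X`, `example_Y`) are small hypothetical stand-ins, not the exercise data.

```python
import numpy as np

# Hypothetical stand-ins for the feature matrix and one-hot targets (10 samples)
example_X = np.arange(30).reshape(10, 3)
example_Y = np.eye(3)[np.arange(10) % 3]

# Index that separates the first 4/5 of the rows from the last 1/5
split = len(example_X) * 4 // 5

example_train_X, example_test_X = example_X[:split], example_X[split:]
example_train_Y, example_test_Y = example_Y[:split], example_Y[split:]

print(example_train_X.shape, example_test_X.shape)
```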

## Step 4

Let's start by establishing the model. `Sequential`, a linear stack of layers, is the standard model type for Keras.

In [23]:
# Set a randomisation seed for reproducibility.
np.random.seed(6)

###--- REPLACE THE ??? BELOW WITH Sequential() ---###
model = keras.models.???
###

Let's now add our layers to the model. Our first will be our hidden layer, which will also specify the dimension of the implicit input layer it receives. There is no single rule for choosing the number of units in a hidden layer, only a number of differing rules of thumb. Some of these are:
> "A rule of thumb is for the size of this [hidden] layer to be somewhere between the input layer size ... and the output layer size ..." (Blum, 1992, p. 60)

> "Typically, we specify as many hidden nodes as dimensions [principal components] needed to capture 70-90% of the variance of the input data set." (Boger and Guterman, 1997)

> "One rule of thumb is that it should never be more than twice as large as the input layer." (Berry and Linoff, 1997, p. 323)

Sometimes to follow one wisdom, we have to ignore another. For now, however, let's give **two hidden layers, one with 4 nodes and one with 2 nodes**, a go.

Most of our model parameters here are fairly straightforward. We have an **input dimension of 3** (our three features) and an **output dimension of 3** (our three classes); the output dimension is used in our final layer, the output layer.
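
To get a feel for how small this network is, the sketch below counts its trainable parameters by hand. It assumes fully connected (dense) layers that each include a bias term, which is the Keras default for `Dense`; the variable names are illustrative only.

```python
# Layer sizes described above: 3 inputs, hidden layers of 4 and 2 nodes, 3 outputs
layer_sizes = [3, 4, 2, 3]

total = 0
for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
    # A dense layer has one weight per input-output pair plus one bias per output node
    n_params = n_in * n_out + n_out
    print('%i -> %i nodes: %i parameters' % (n_in, n_out, n_params))
    total += n_params

print('Total trainable parameters: %i' % total)
```

Once the layers have been added, calling `model.summary()` should report the same per-layer counts.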

In [24]:
# We can use a structure with the format [input nodes, hidden1 nodes, hidden2 nodes, output nodes] to help us

###--- REPLACE THE ???s BELOW WITH THE APPROPRIATE NUMBERS OF NODES ---###
structure = [???, ???, ???, ???]
###

# Hidden layer 1 + input layer
model.add(keras.layers.Dense(units=structure[1], input_dim = structure[0], activation = 'relu'))

# Hidden layer 2
model.add(keras.layers.Dense(units=structure[2], activation = 'relu'))

# Output layer
model.add(keras.layers.Dense(units=structure[-1], activation = tf.nn.softmax))

## Step 5

We can now compile and fit our model. We must specify the loss function, the optimisation function, and our training evaluation metric. For our loss function, we can use *categorical cross entropy*. The mathematics of that are beyond the scope of this exercise; all we need to know is that it is designed for problems with multiple categories. For our optimizer, we can use *stochastic gradient descent*, as introduced in the course. Lastly, our metric can simply be *accuracy*.
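
For the curious, here is a simplified NumPy sketch of what the softmax output layer and the categorical cross-entropy loss compute for a single sample. It is an illustration under assumptions, not the Keras implementation (which also clips values and averages over a batch), and the `scores` values are made up.

```python
import numpy as np

def softmax(scores):
    # Subtracting the maximum keeps the exponentials numerically stable
    exps = np.exp(scores - np.max(scores))
    return exps / np.sum(exps)

def categorical_cross_entropy(target, predicted):
    # Only the term for the true class is non-zero, because target is one-hot
    return -np.sum(target * np.log(predicted))

# Hypothetical raw scores from the three output nodes for one sample
scores = np.array([2.0, 0.5, -1.0])
probabilities = softmax(scores)

# One-hot targets for breed 0 (matches the highest score) and breed 2 (does not)
loss_right = categorical_cross_entropy(np.array([1.0, 0.0, 0.0]), probabilities)
loss_wrong = categorical_cross_entropy(np.array([0.0, 0.0, 1.0]), probabilities)

print(np.around(probabilities, 2))
print('loss if breed 0 is correct: %0.3f' % loss_right)
print('loss if breed 2 is correct: %0.3f' % loss_wrong)
```

The loss is small when the network assigns a high probability to the true class and large when it does not, which is the signal gradient descent uses to adjust the weights.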

In [25]:
# Let's compile the model

###--- REPLACE THE ???s BELOW WITH 'categorical_crossentropy', 'sgd', AND THEN 'accuracy' (INCLUDING THE QUOTES) ---###
model.compile(loss = ???, optimizer = ???, metrics = [???])
###

# Time to fit the model
print('Starting training')

###--- REPLACE THE ??? BELOW WITH train_X AND THEN train_Y ---###
training_stats = model.fit(???, ???, batch_size = 1, epochs = 24, verbose = 0)
###

print('Training finished')
print('Training Evaluation: loss = %0.3f, accuracy = %0.2f%%'
      %(training_stats.history['loss'][-1], 100 * training_stats.history['acc'][-1]))

Starting training
Training finished
Training Evaluation: loss = 0.193, accuracy = 95.00%


In [None]:
# We can plot our training statistics to see how it developed over time

###--- REPLACE THE ???s BELOW WITH 'acc' AND THEN 'loss' (INCLUDING THE QUOTES) ---###
accuracy, =graph.plot(training_stats.history[???],label='Accuracy')
training_loss, =graph.plot(training_stats.history[???],label='Training Loss')
###

graph.legend(handles=[accuracy,training_loss])
loss = np.array(training_stats.history['loss'])
xp = np.linspace(0, loss.shape[0], 10 * loss.shape[0])
graph.plot(xp, np.full(xp.shape, 1), c = 'k', linestyle = ':', alpha = 0.5)
graph.plot(xp, np.full(xp.shape, 0), c = 'k', linestyle = ':', alpha = 0.5)
graph.show()

## Step 6

Now that it's trained, let's see how it performs on our test data! It's important to test a model on data that it has never seen before, to check that it has not simply overfitted to the training data. Let's evaluate it against the test set:
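
As a rough sketch of what the accuracy metric measures for one-hot targets, a sample counts as correct when the most probable predicted class matches the target class. The arrays below are invented for illustration and are not model output.

```python
import numpy as np

# Hypothetical one-hot targets and predicted probabilities for four samples
example_targets = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 1, 0]])
example_outputs = np.array([[0.8, 0.1, 0.1],
                            [0.2, 0.7, 0.1],
                            [0.3, 0.4, 0.3],
                            [0.1, 0.6, 0.3]])

# A sample counts as correct when the most probable class matches the target class
correct = np.argmax(example_outputs, axis=1) == np.argmax(example_targets, axis=1)
example_accuracy = np.mean(correct)

print('accuracy = %0.2f%%' % (100 * example_accuracy))
```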

In [None]:
###--- REPLACE THE ??? BELOW WITH test_X AND test_Y ---###
evaluation = model.evaluate(???, ???, verbose=0)
###

print('Test Set Evaluation: loss = %0.6f, accuracy = %0.2f%%' %(evaluation[0], 100*evaluation[1]))

It seems to be very accurate with the random seed that we set, but let's see how it predicts something completely new and unclassified! Come up with a new sample of the format `[age, weight, height]` to test it with.

In [None]:
###--- REPLACE ??? BELOW WITH A NEW SAMPLE VECTOR, e.g. [9, 7, 7] ---###
# [age, weight, height]
new_sample = [???, ???, ???]
###

In [None]:
# Let's have a look at where our new sample sits in the feature space.

# Plots out the age-weight relationship

###--- REPLACE THE ???s BELOW WITH new_sample ---###
graph.plot(???[0], ???[1], 'kx')
###

graph.scatter(train_X[:,0], train_X[:,1], c = labels[:160])
graph.title('samples by age and weight')
graph.xlabel('age')
graph.ylabel('weight')
graph.show()

# Plot out the age-height relationship

###--- REPLACE THE ???s BELOW WITH new_sample ---###
graph.plot(???[0], ???[2], 'kx')
###

graph.scatter(train_X[:,0], train_X[:,2], c = labels[:160])
graph.title('samples by age and height')
graph.xlabel('age')
graph.ylabel('height')
graph.show()

Looks alright? Now let's see what breed of dog the model says it is!

In [None]:
###--- REPLACE THE ???s BELOW WITH new_sample ---###
predicted = model.predict(np.array([???]))
print('Breed prediction for %s:' %(new_sample))
###

print(np.around(predicted[0],2))
print('Breed %s, with %i%% certainty.' %(np.argmax(predicted), np.round(100 * predicted[:, np.argmax(predicted)][0])))

Breed `0` should be blue, breed `1` should be green, and breed `2` should be red. How does the model's prediction compare to how you would pick this new sample's class?
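
When reading the printed prediction, it may help to see how a row of output probabilities turns into a breed and a certainty. The sketch below uses a made-up `example_predicted` array in place of real model output.

```python
import numpy as np

# Hypothetical model output for a single sample: one probability per breed
example_predicted = np.array([[0.07, 0.81, 0.12]])

# The index of the largest probability is the predicted breed
breed = int(np.argmax(example_predicted[0]))
certainty = 100 * example_predicted[0, breed]

print('Breed %i, with %i%% certainty.' % (breed, round(certainty)))
```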

## Conclusion

We've built a simple neural network to help us predict dog breeds! In the next exercise we'll look into neural networks in a bit more depth, and at the factors that influence how well they learn.