<a href="https://colab.research.google.com/github/wdconinc/practical-computing-for-scientists/blob/master/Lectures/lecture34.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Lecture 34: Machine Learning

## Last Episode

Artificial neural networks
- using non-linear activation functions
- example: diabetes score
- example: MNIST handwriting recognition

## Preamble

In [0]:
import numpy as np
import sklearn as sk
import seaborn as sns
import matplotlib.pyplot as plt

# for graph plots
!apt install -q -y graphviz
!pip install graphviz
import graphviz


## Handwriting Recognition with Artificial Neural Networks

Last time we looked at a problem that is hard to solve with traditional, rule-based programming: recognition of handwritten digits (a standard machine learning problem).

In [0]:
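# Let's first get the MNIST data set
# (fetch_mldata has been removed from scikit-learn; fetch_openml replaces it,
# so the exact scores may differ slightly from the output shown below)
from sklearn.datasets import fetch_openml
mnist = fetch_openml("mnist_784", version=1, as_frame=False)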

As usual we split the data into a training and test data set.

In [0]:
# Use a random permutation (fixed seed for reproducibility)
np.random.seed(0)
perm = np.random.permutation(len(mnist.data))

# Rescale, split in training and test data
X, y = mnist.data[perm] / 255., mnist.target[perm]
X_train, X_test = X[:60000], X[60000:]
y_train, y_test = y[:60000], y[60000:]

Let's create an artificial neural network classifier that has one hidden layer with 100 nodes and a ReLU activation function.

In [0]:
from sklearn.neural_network import MLPClassifier
mlp = MLPClassifier(hidden_layer_sizes=(100, ), max_iter=40, alpha=1e-4,
                    solver='sgd', verbose=10, tol=1e-3, random_state=1,
                    learning_rate_init=.1, activation = "relu")

Let's now train our network. This may take a while... (even if the tolerance is set to a fairly large value)

In [0]:
mlp.fit(X_train, y_train)

Iteration 1, loss = 0.29249660
Iteration 2, loss = 0.12361080
Iteration 3, loss = 0.09000660
Iteration 4, loss = 0.07097907
Iteration 5, loss = 0.05806030
Iteration 6, loss = 0.04855646
Iteration 7, loss = 0.04091454
Iteration 8, loss = 0.03512452
Iteration 9, loss = 0.02940072
Iteration 10, loss = 0.02546369
Iteration 11, loss = 0.02171612
Iteration 12, loss = 0.01749820
Iteration 13, loss = 0.01542681
Iteration 14, loss = 0.01283184
Iteration 15, loss = 0.01078268
Iteration 16, loss = 0.00875058
Iteration 17, loss = 0.00711103
Iteration 18, loss = 0.00634392
Iteration 19, loss = 0.00510398
Iteration 20, loss = 0.00475525
Iteration 21, loss = 0.00382842
Iteration 22, loss = 0.00351482
Training loss did not improve more than tol=0.001000 for two consecutive epochs. Stopping.


MLPClassifier(activation='relu', alpha=0.0001, batch_size='auto', beta_1=0.9,
       beta_2=0.999, early_stopping=False, epsilon=1e-08,
       hidden_layer_sizes=(100,), learning_rate='constant',
       learning_rate_init=0.1, max_iter=40, momentum=0.9,
       nesterovs_momentum=True, power_t=0.5, random_state=1, shuffle=True,
       solver='sgd', tol=0.001, validation_fraction=0.1, verbose=10,
       warm_start=False)

Even with this limited amount of training we already obtain very good results on the test data set. The scores below are accuracy scores, i.e. the fraction of correctly classified samples: 0.99 means 99% accuracy.

In [0]:
print("Training set score: %f" % mlp.score(X_train, y_train))
print("Test set score: %f" % mlp.score(X_test, y_test))

Training set score: 0.999950
Test set score: 0.978000
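The accuracy score is simply the fraction of samples whose predicted label matches the true label. As a minimal sketch, with made-up labels (the arrays `y_true` and `y_pred` are hypothetical and are not the MNIST predictions above), it could be computed by hand as follows:

```python
import numpy as np

# Hypothetical labels for ten samples (not the MNIST results above)
y_true = np.array([7, 2, 1, 0, 4, 1, 4, 9, 5, 9])
y_pred = np.array([7, 2, 1, 0, 4, 1, 4, 9, 6, 9])

# Accuracy = fraction of predictions that agree with the true labels
accuracy = np.mean(y_pred == y_true)
print("Accuracy: %f" % accuracy)
```

One of the ten hypothetical predictions is wrong, so the accuracy is 0.9.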


### How Can We Do Better?

If we think about the drawbacks of the artificial neural network we used above, then one should jump out right away: the algorithm does not inherently understand that a shifted or scaled image should result in the same classification. The same '7' drawn slightly more to the left or to the right is still a '7', but to an artificial neural network with 784 input features this looks like an entirely different thing. Can we think of a way to build in this position and scale invariance? This is where *convolutional neural networks* bring us a solution.
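To make this concrete, here is a small sketch with a hypothetical image (a single vertical stroke, not an actual MNIST digit). After flattening to 784 features, the original and the shifted stroke have no lit pixels in common:

```python
import numpy as np

# Hypothetical 28 x 28 image with a single vertical stroke in column 10
image = np.zeros((28, 28))
image[4:24, 10] = 1.0

# The same stroke shifted two pixels to the right
shifted = np.roll(image, 2, axis=1)

# Flatten to 784 input features, as seen by a fully connected network
a, b = image.ravel(), shifted.ravel()

# Number of pixels that are lit in both images
overlap = np.dot(a, b)
print("Lit pixels:", int(a.sum()), "overlap after shift:", int(overlap))
```

As far as the input features are concerned, the two images are orthogonal vectors, even though a human would call them the same shape.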


#### Convolution in the time domain: a reminder
We will use a concept we introduced earlier, in the section on digital signal processing: *convolution*. Remember how we defined this then:
$$ (f * g)(t) = \int_{-\infty}^{+\infty} f(t - \tau) g(\tau) d\tau $$
in the time domain.
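For sampled signals the integral becomes a sum, $(f * g)[n] = \sum_m f[n - m] \, g[m]$. The sketch below uses two short made-up arrays `f` and `g` (hypothetical values) and compares a direct implementation of this sum with `np.convolve`:

```python
import numpy as np

f = np.array([1.0, 2.0, 3.0])   # hypothetical sampled signal
g = np.array([0.0, 1.0, 0.5])   # hypothetical response function

def convolve_direct(f, g):
    """Discrete convolution (f * g)[n] = sum_m f[n - m] g[m]."""
    out = np.zeros(len(f) + len(g) - 1)
    for n in range(len(out)):
        for m in range(len(g)):
            # Only terms where f[n - m] is defined contribute
            if 0 <= n - m < len(f):
                out[n] += f[n - m] * g[m]
    return out

direct = convolve_direct(f, g)
print(direct)
print(np.convolve(f, g))
```

Both lines should print the same array.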

#### Convolution in the image domain
Now we will use a similar convolution, but in the 2-dimensional image domain. We already encountered this approach when we discussed point-spread functions in digital image processing: the registered image is the original image convolved with the point-spread function of the camera's lens system.

In our case we will refer to the convolution *kernel* as a *filter* or *feature detector*, and to the result of convolving it with the input image as the *activation map* or *feature map*.

Let's say we wish to take the convolution of the $5 \times 5$ black/white input image on the left with the $3 \times 3$ kernel on the right:

<img src="https://ujwlkarn.files.wordpress.com/2016/07/screen-shot-2016-07-24-at-11-25-13-pm.png" width="46%"> <img src="https://ujwlkarn.files.wordpress.com/2016/07/screen-shot-2016-07-24-at-11-25-24-pm.png" width="49%">

We will obtain a new $3 \times 3$ activation map as we shift the convolution kernel (with *stride* equal to 1) across the image. Each entry of the map is the sum of the products of the kernel values with the corresponding image pixels:

<img src="https://ujwlkarn.files.wordpress.com/2016/07/convolution_schematic.gif" width="66%">
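The sliding-window procedure can be sketched in a few lines of NumPy. The pixel and kernel values below are illustrative assumptions and may not match the figure exactly; the helper `convolve2d_valid` is a hypothetical name and assumes square inputs. Note that, as is common in neural networks, the kernel is not flipped here, unlike in the integral definition of the convolution.

```python
import numpy as np

def convolve2d_valid(image, kernel, stride=1):
    """Slide the kernel over the image (no padding) and sum the products."""
    n, m = image.shape[0], kernel.shape[0]
    n_out = (n - m) // stride + 1
    out = np.zeros((n_out, n_out))
    for i in range(n_out):
        for j in range(n_out):
            r, c = i * stride, j * stride
            out[i, j] = np.sum(image[r:r + m, c:c + m] * kernel)
    return out

# Illustrative 5 x 5 black/white image and 3 x 3 kernel (assumed values)
image = np.array([[1, 1, 1, 0, 0],
                  [0, 1, 1, 1, 0],
                  [0, 0, 1, 1, 1],
                  [0, 0, 1, 1, 0],
                  [0, 1, 1, 0, 0]])
kernel = np.array([[1, 0, 1],
                   [0, 1, 0],
                   [1, 0, 1]])

activation_map = convolve2d_valid(image, kernel)
print(activation_map)
```

For these values the result is a $3 \times 3$ map with rows `[4, 3, 4]`, `[2, 4, 3]` and `[2, 3, 4]`.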

We can extend this to any $n \times n$ input image (with $p$ pixels of padding on each side) and any $m \times m$ kernel with stride $k$ to obtain an $n' \times n'$ activation map with
$$ n' = \frac{n - m + 2 \cdot p}{k} + 1 $$
(rounded down if the division is not exact). For the example above: $n' = \frac{5 - 3 + 2 \cdot 0}{1} + 1 = 3$.
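The formula is easy to wrap in a small helper. The function name `activation_map_size` is hypothetical, and the integer division implements the rounding down for strides that do not divide evenly:

```python
def activation_map_size(n, m, p=0, k=1):
    """Side length n' of the activation map: floor((n - m + 2p) / k) + 1."""
    return (n - m + 2 * p) // k + 1

print(activation_map_size(5, 3))         # the 5 x 5 example with a 3 x 3 kernel
print(activation_map_size(28, 3))        # a 28 x 28 image with a 3 x 3 kernel
print(activation_map_size(28, 3, p=1))   # one pixel of padding keeps the size
print(activation_map_size(28, 2, k=2))   # a 2 x 2 kernel with stride 2 halves the size
```

These calls give 3, 26, 28 and 14 respectively.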

Although we might want the activation map to be normalized if we continue to think in terms of images, there is no requirement that this is the case. It is just a map which tells us where a certain feature is activated.

#### Various convolution kernels
Different convolution kernels perform different functions:

<img src="https://ujwlkarn.files.wordpress.com/2016/08/screen-shot-2016-08-05-at-11-03-00-pm.png" width="50%" align="middle">
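As a rough illustration, the sketch below applies two commonly used $3 \times 3$ kernels (an edge detector and a box blur, chosen here as assumptions rather than read off the figure) to a hypothetical image with a dark left half and a bright right half. The helper `apply_kernel` is a hypothetical name.

```python
import numpy as np

def apply_kernel(image, kernel):
    """Sum of products of the kernel with each image patch (stride 1, no padding)."""
    m = kernel.shape[0]
    rows = image.shape[0] - m + 1
    cols = image.shape[1] - m + 1
    return np.array([[np.sum(image[i:i + m, j:j + m] * kernel)
                      for j in range(cols)] for i in range(rows)])

# Hypothetical 6 x 6 image: dark left half, bright right half
image = np.zeros((6, 6))
image[:, 3:] = 1.0

# Edge detection kernel: entries sum to zero, so flat regions give zero
edge = np.array([[-1, -1, -1],
                 [-1,  8, -1],
                 [-1, -1, -1]])

# Box blur kernel: average over a 3 x 3 neighborhood
blur = np.ones((3, 3)) / 9.0

edge_map = apply_kernel(image, edge)
blur_map = apply_kernel(image, blur)
print(edge_map)
print(blur_map)
```

The edge kernel responds only near the boundary between the dark and bright halves, while the blur kernel turns the sharp step into a gradual ramp.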

#### Convolutional kernels in neural networks
In our convolutional neural networks we will apply a non-linear activation function (e.g. ReLU) to the output of the convolution step. We will let the training algorithm determine the optimal values of the kernel entries, and we will allow the use of $q$ kernels (or filters) in parallel. The output of our convolution layer therefore has $n' \times n' \times q$ values.
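A minimal sketch of such a layer is shown below, assuming stride 1, no padding, and a single input channel. The kernels are random stand-ins for trained values, and the bias term per filter is an assumption that follows common practice rather than something defined in the text above.

```python
import numpy as np

rng = np.random.default_rng(0)

n, m, q = 28, 3, 32                     # image size, kernel size, number of filters
image = rng.random((n, n))              # hypothetical input image
kernels = rng.normal(size=(q, m, m))    # random stand-ins for trained kernels
biases = np.zeros(q)                    # one bias per filter (assumed)

n_out = n - m + 1                       # stride 1, no padding
output = np.zeros((n_out, n_out, q))
for f in range(q):
    for i in range(n_out):
        for j in range(n_out):
            patch = image[i:i + m, j:j + m]
            output[i, j, f] = np.sum(patch * kernels[f]) + biases[f]

# ReLU activation: negative values are set to zero
output = np.maximum(output, 0.0)
print(output.shape)
```

The output shape is `(26, 26, 32)`, i.e. $n' \times n' \times q$ with $n' = 26$.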

#### Pooling for position invariance
The next step in convolutional neural networks is typically a *pooling layer*, where we reduce the full $n' \times n'$ activation map to a smaller map by downsampling to the sum or maximum of a range of pixels, e.g. the maximum in a $5 \times 5$ range. A large activation for a particular feature anywhere in this $5 \times 5$ area of the activation map will now activate the same pixel in the pooled feature map, so we obtain invariance under small shifts.

<img src="https://ujwlkarn.files.wordpress.com/2016/08/screen-shot-2016-08-07-at-9-15-21-pm.png" width="80%">
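Max pooling over non-overlapping blocks can be sketched with a NumPy reshape. The helper `max_pool` is a hypothetical name and assumes a square map whose side length is divisible by the pool size; the two activation maps are made-up examples that differ by a one-pixel shift.

```python
import numpy as np

def max_pool(activation_map, size):
    """Non-overlapping max pooling; assumes the side length is divisible by size."""
    n = activation_map.shape[0]
    # Split the map into (n // size) x (n // size) blocks of size x size pixels
    blocks = activation_map.reshape(n // size, size, n // size, size)
    return blocks.max(axis=(1, 3))

# Hypothetical 4 x 4 activation map with one strong activation
a = np.zeros((4, 4))
a[0, 1] = 9.0

# The same activation shifted by one pixel, still inside the same 2 x 2 block
b = np.zeros((4, 4))
b[1, 0] = 9.0

print(max_pool(a, 2))
print(max_pool(b, 2))
```

Both pooled maps are identical. The invariance only holds for shifts that stay within one pooling block, which is why it applies to small shifts only.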

### Keras: A Professional Machine Learning Tool

In [0]:
import keras
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten
from keras.layers import Conv2D, MaxPooling2D
from keras import backend as K

batch_size = 128
num_classes = 10
epochs = 12

# input image dimensions
img_rows, img_cols = 28, 28

# the data, split between train and test sets
(x_train, y_train), (x_test, y_test) = mnist.load_data()

if K.image_data_format() == 'channels_first':
    x_train = x_train.reshape(x_train.shape[0], 1, img_rows, img_cols)
    x_test = x_test.reshape(x_test.shape[0], 1, img_rows, img_cols)
    input_shape = (1, img_rows, img_cols)
else:
    x_train = x_train.reshape(x_train.shape[0], img_rows, img_cols, 1)
    x_test = x_test.reshape(x_test.shape[0], img_rows, img_cols, 1)
    input_shape = (img_rows, img_cols, 1)

x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255

print('x_train shape:', x_train.shape)
print(x_train.shape[0], 'train samples')
print(x_test.shape[0], 'test samples')

# convert class vectors to binary class matrices
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)

Using TensorFlow backend.


Downloading data from https://s3.amazonaws.com/img-datasets/mnist.npz
x_train shape: (60000, 28, 28, 1)
60000 train samples
10000 test samples


In [0]:
model = Sequential()
model.add(Dense(512, activation = 'relu', input_shape = (784,)))
model.add(Dropout(0.2))
model.add(Dense(512, activation = 'relu'))
model.add(Dropout(0.2))
model.add(Dense(num_classes, activation = 'softmax'))

In [0]:
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_1 (Dense)              (None, 512)               401920    
_________________________________________________________________
dropout_1 (Dropout)          (None, 512)               0         
_________________________________________________________________
dense_2 (Dense)              (None, 512)               262656    
_________________________________________________________________
dropout_2 (Dropout)          (None, 512)               0         
_________________________________________________________________
dense_3 (Dense)              (None, 10)                5130      
Total params: 669,706
Trainable params: 669,706
Non-trainable params: 0
_________________________________________________________________
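The parameter counts in this summary can be checked by hand: a fully connected layer has one weight per input-output pair plus one bias per output node. A short sketch, with `dense_params` as a hypothetical helper name:

```python
def dense_params(n_in, n_out):
    """Weights (n_in * n_out) plus one bias per output node."""
    return n_in * n_out + n_out

# (inputs, outputs) of the three Dense layers in the model above
layers = [(784, 512), (512, 512), (512, 10)]
counts = [dense_params(n_in, n_out) for n_in, n_out in layers]
print(counts, sum(counts))
```

The counts are 401920, 262656 and 5130, which add up to the 669,706 parameters reported in the summary. The dropout layers have no parameters.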


In [0]:
model.compile(loss = 'categorical_crossentropy',
              optimizer = keras.optimizers.Adadelta(),
              metrics = ['accuracy'])

In [0]:
x_train = x_train.reshape(60000, 784)
x_test = x_test.reshape(10000, 784)

history = model.fit(x_train, y_train,
                    batch_size = batch_size,
                    epochs = epochs,
                    verbose = 1,
                    validation_data = (x_test, y_test))

Train on 60000 samples, validate on 10000 samples
Epoch 1/12
Epoch 2/12
Epoch 3/12
Epoch 4/12
Epoch 5/12
Epoch 6/12
Epoch 7/12
Epoch 8/12
Epoch 9/12
Epoch 10/12
Epoch 11/12
Epoch 12/12


In [0]:
score = model.evaluate(x_test, y_test, verbose = 0)
print('Test loss:', score[0])
print('Test accuracy:', score[1])

Test loss: 0.06182534301124979
Test accuracy: 0.9835


### Convolutional Neural Networks

In [0]:
model = Sequential()
model.add(Conv2D(32, 
                 kernel_size = (3, 3),
                 activation = 'relu',
                 input_shape = input_shape))
model.add(Conv2D(64, 
                 kernel_size = (3, 3),
                 activation = 'relu'))
model.add(MaxPooling2D(pool_size = (2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(128, activation = 'relu'))
model.add(Dropout(0.5))
model.add(Dense(num_classes, activation = 'softmax'))

In [0]:
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_3 (Conv2D)            (None, 26, 26, 32)        320       
_________________________________________________________________
conv2d_4 (Conv2D)            (None, 24, 24, 64)        18496     
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 12, 12, 64)        0         
_________________________________________________________________
dropout_5 (Dropout)          (None, 12, 12, 64)        0         
_________________________________________________________________
flatten_2 (Flatten)          (None, 9216)              0         
_________________________________________________________________
dense_6 (Dense)              (None, 128)               1179776   
_________________________________________________________________
dropout_6 (Dropout)          (None, 128)               0         
__________
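The shapes and parameter counts in the rows shown above can be reproduced by hand. The sketch below assumes the Keras defaults used in this model (stride 1, no padding, one bias per filter); the helper names are hypothetical.

```python
def conv_params(kernel_size, channels_in, filters):
    """One kernel_size x kernel_size x channels_in kernel plus a bias, per filter."""
    return (kernel_size * kernel_size * channels_in + 1) * filters

def dense_params(n_in, n_out):
    """Weights (n_in * n_out) plus one bias per output node."""
    return n_in * n_out + n_out

# Spatial size: 28 -> 26 -> 24 after two 3 x 3 convolutions, then 12 after 2 x 2 pooling
size = 28
size = size - 3 + 1      # first Conv2D
size = size - 3 + 1      # second Conv2D
size = size // 2         # MaxPooling2D
flat = size * size * 64  # Flatten

conv1 = conv_params(3, 1, 32)
conv2 = conv_params(3, 32, 64)
dense1 = dense_params(flat, 128)
print(size, flat, conv1, conv2, dense1)
```

This gives a flattened size of 9216 and parameter counts of 320, 18496 and 1179776, in agreement with the summary. Almost all parameters sit in the first fully connected layer, not in the convolution layers.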

In [0]:
model.compile(loss = keras.losses.categorical_crossentropy,
              optimizer = keras.optimizers.Adadelta(),
              metrics = ['accuracy'])

In [0]:
x_train = x_train.reshape(60000, 28, 28, 1)
x_test = x_test.reshape(10000, 28, 28, 1)

model.fit(x_train, y_train,
          batch_size = batch_size,
          epochs = epochs,
          verbose = 1,
          validation_data = (x_test, y_test))

Train on 60000 samples, validate on 10000 samples
Epoch 1/12
Epoch 2/12
Epoch 3/12
Epoch 4/12
Epoch 5/12
Epoch 6/12
Epoch 7/12
Epoch 8/12
Epoch 9/12
Epoch 10/12
Epoch 11/12
Epoch 12/12


<keras.callbacks.History at 0x7f51e01c0240>

In [0]:
score = model.evaluate(x_test, y_test, verbose = 0)
print('Test loss:', score[0])
print('Test accuracy:', score[1])

Test loss: 0.029486772899981223
Test accuracy: 0.9913
