# Deep Learning with Keras II (MNIST data, MLP)

Overview:
- Data: MNIST
- Methodology: multilayer perceptron (MLP)

## 1. Introduction
- Part I: [Deep Learning w Keras I (Intro)]()
- In this notebook we use Keras to train a deep multilayer perceptron on the MNIST data. The data can be imported through the Keras API ([datasets](https://keras.io/datasets/)).

## 2. The data
In this notebook we will use the [MNIST data](https://en.wikipedia.org/wiki/MNIST_database) to train a neural network. The MNIST data consist of images of handwritten digits from "0" to "9", normalized to fit into a 28x28 pixel box. This gives us a total of $28 \cdot 28 = 784$ input features ($x_1, x_2, \dots, x_{784}$). In other words, each observation consists of 784 variables and one label, a digit from 0 to 9. Each input variable $x_i$ is an integer between 0 and 255, specifying the shade of gray on an [8-bit grayscale](https://en.wikipedia.org/wiki/Grayscale).

For our neural network this means that the input layer requires 784 nodes (one for each variable $x_1, x_2, \dots, x_{784}$) and the output layer has 10 nodes, one for each digit class.

Moreover, the dataset consists of 60,000 training images, along with a separate test set of 10,000 images. To download the data and split them into training and test sets, use:
```python
from keras.datasets import mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
```

#### Preprocessing
Before passing the data to the model we need to preprocess them in several steps.

The imported data are stored in a three-dimensional array. The training data have the shape (60000, 28, 28), so each observation has the shape (28, 28), i.e. 28x28 variables per image. In order to use them in our model we need to [reshape](https://docs.scipy.org/doc/numpy/reference/generated/numpy.reshape.html) (i.e. flatten) the data: each observation is converted from 28x28 to a single dimension of 784 variables. The whole training set then has the shape (60000, 784), i.e. 60,000 observations with 784 variables each.
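
To make the reshape step concrete, here is a minimal sketch on a small synthetic array standing in for the MNIST images. The array `images` and its size of 5 samples are made up for illustration; passing `-1` simply lets NumPy infer the 784 from the remaining dimensions.

```python
import numpy as np

# Synthetic stand-in for MNIST: 5 "images" of 28x28 8-bit pixels (illustrative only)
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(5, 28, 28), dtype=np.uint8)

# Flatten each 28x28 image into one row of 784 values;
# -1 asks NumPy to infer the second dimension (28 * 28 = 784)
flat = images.reshape(images.shape[0], -1)

print(images.shape)  # (5, 28, 28)
print(flat.shape)    # (5, 784)

# Row-major flattening: the pixel in row r, column c lands at index r * 28 + c
print(flat[0, 3 * 28 + 7] == images[0, 3, 7])  # True
```

Flattening discards the explicit 2D layout, but no pixel values are lost: every pixel keeps a fixed position in the 784-element row.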

Additionally, it is common to use 32-bit precision when training a neural network, so we convert the data to 32-bit floats ([more](https://github.com/ageron/handson-ml/issues/265)).
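
The following sketch illustrates why the type conversion comes before any scaling. It assumes, as is the case to our knowledge, that the raw MNIST arrays are stored as unsigned 8-bit integers; the three pixel values are made up.

```python
import numpy as np

# Illustrative 8-bit pixel values, stored as unsigned integers
pixels = np.array([0, 128, 255], dtype=np.uint8)

# In-place true division cannot store its float result in an integer array
try:
    pixels /= 255
    in_place_ok = True
except TypeError:
    in_place_ok = False
print(in_place_ok)  # False

# After the conversion the same operation works, at half the memory of float64
as_f32 = pixels.astype("float32")
as_f32 /= 255
print(as_f32.dtype, as_f32.itemsize)      # float32 4
print(pixels.astype("float64").itemsize)  # 8
```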

We also apply [feature scaling](https://medium.com/greyatom/why-how-and-when-to-scale-your-features-4b30ab09db5e) to our data. In particular, we normalize the values to the range from 0 to 1. In general, gradient descent and other algorithms converge faster with scaled features, and scaling may also improve the performance of the network (but not necessarily). We normalize by dividing by the maximum value of 255.
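
As a small illustration of this normalization, the sketch below scales two made-up arrays (`x_train_demo` and `x_test_demo` are hypothetical names, not part of the notebook). The point it tries to show is that the test data are divided by the same constant as the training data.

```python
import numpy as np

# Made-up pixel values standing in for the training and test images
x_train_demo = np.array([[0, 51, 255], [102, 204, 153]], dtype="float32")
x_test_demo = np.array([[255, 0, 128]], dtype="float32")

# Scale both sets with the same constant, taken from the training data
max_value = x_train_demo.max()  # 255.0 here
x_train_scaled = x_train_demo / max_value
x_test_scaled = x_test_demo / max_value

print(x_train_scaled.min(), x_train_scaled.max())  # 0.0 1.0
print(x_train_scaled.dtype)                        # float32
```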

The last step is to convert y into a categorical (one-hot encoded) variable. To summarize the preprocessing:

- Flatten/reshape the data
- Convert to type `float32`
- Normalize the data
- Convert y to a categorical variable
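
The last step in the list can be sketched in plain NumPy. The helper `one_hot` below is a hypothetical stand-in written for illustration; it is not the Keras implementation, but for integer labels it should produce the same kind of matrix as `utils.to_categorical`.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Return a (len(labels), num_classes) float32 matrix with a single 1 per row."""
    labels = np.asarray(labels, dtype=int)
    encoded = np.zeros((labels.size, num_classes), dtype="float32")
    # For row i, set the column given by the i-th label to 1
    encoded[np.arange(labels.size), labels] = 1.0
    return encoded

y = [5, 0, 9]  # made-up digit labels
y_encoded = one_hot(y, num_classes=10)

print(y_encoded.shape)  # (3, 10)
print(y_encoded[0])     # [0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]
```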

#### The model 
The next step is to implement a neural network and apply it to the MNIST data. We start with the imports, then load and preprocess the data as described above.

```python
# Imports
import numpy as np
import keras
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation
from keras.optimizers import RMSprop
from keras import utils
```

```
Using TensorFlow backend.
```


```python
# Load data
(x_train, y_train), (x_test, y_test) = mnist.load_data()

# Examine shape
print("Shape training data:", "\nx:", x_train.shape, "y:", y_train.shape)
print("Shape test data:", "\nx:", x_test.shape, "y:", y_test.shape)
# print("Example of one observation (28x28 pixels):\n", x_train[123, :, :])

unique, counts = np.unique(y_train, return_counts=True)
print("Unique values in y: ", unique)

# Flatten the data
x_train = x_train.reshape(60000, 784)
x_test = x_test.reshape(10000, 784)

# Change the type
x_train = x_train.astype("float32")
x_test = x_test.astype("float32")

# Normalize the data
max_x = int(x_train.max())
x_train /= max_x
x_test /= max_x

# New shape
print("Flattened shape x_train:", x_train.shape)
print("Flattened shape x_test:", x_test.shape)

# Convert y to categorical vector
y_train = utils.to_categorical(y_train)
y_test = utils.to_categorical(y_test)

print("Shape of y_train:", y_train.shape)
print("Shape of x_train:", x_train.shape)
```

```
Shape training data: 
x: (60000, 28, 28) y: (60000,)
Shape test data: 
x: (10000, 28, 28) y: (10000,)
Unique values in y:  [0 1 2 3 4 5 6 7 8 9]
Flattened shape x_train: (60000, 784)
Flattened shape x_test: (10000, 784)
Shape of y_train: (60000, 10)
Shape of x_train: (60000, 784)
```


**Note**

The shape of `y_train`, (60000, 10), may look a little unusual: we now have one dummy variable for each possible outcome, yielding ten columns per observation.
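
To see that no information is lost in this encoding, the original integer labels can be recovered by taking the position of the 1 in each row. The sketch below uses a small hand-built matrix (the labels 3, 0 and 9 are made up) rather than the real `y_train`.

```python
import numpy as np

# Three one-hot rows, as they would look for the labels 3, 0 and 9
y_one_hot = np.zeros((3, 10), dtype="float32")
y_one_hot[[0, 1, 2], [3, 0, 9]] = 1.0

# The column index of the 1 in each row is the original class label
labels = y_one_hot.argmax(axis=1)
print(labels)  # [3 0 9]
```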

### The model
After loading and preprocessing the data we can build a sequential model and feed it our data for optimization. As before, the model is defined by passing a list of layers to `Sequential`. In addition to the elements already mentioned, it includes [dropout regularization](https://machinelearningmastery.com/dropout-regularization-deep-learning-models-keras/). Dropout is a technique where a percentage of neurons is randomly shut off during training, which reduces overfitting and in general improves the generalizability of the model.
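
The dropout idea can be sketched in a few lines of NumPy. This is an illustrative implementation of so-called inverted dropout, in which the surviving units are rescaled so that the expected activation stays the same; the function `dropout` is a hypothetical helper and not the code Keras runs internally.

```python
import numpy as np

def dropout(activations, rate, rng):
    """Inverted dropout: zero out roughly `rate` of the units and rescale the rest."""
    keep_prob = 1.0 - rate
    mask = rng.random(activations.shape) < keep_prob  # True = unit is kept
    return activations * mask / keep_prob

rng = np.random.default_rng(42)
hidden = np.ones((4, 512), dtype="float32")  # made-up activations of a hidden layer
dropped = dropout(hidden, rate=0.2, rng=rng)

# Values are either 0 (dropped) or 1 / 0.8 = 1.25 (kept and rescaled)
print(np.unique(dropped))
# The share of zeros should be close to the dropout rate of 0.2
print((dropped == 0).mean())
```

Dropout is only active during training; at prediction time all units are used.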

```python
# Model parameters
batch_size = 128
epochs = 20

# Neural network
model = Sequential([
    Dense(512, input_dim=784),
    Activation("relu"),
    Dropout(0.2),
    Dense(512),
    Activation("relu"),
    Dropout(0.2),
    Dense(10),
    Activation("softmax")
])

# Compile
model.compile(loss="categorical_crossentropy",
              optimizer=RMSprop(),
              metrics=["accuracy"])

# Fit the model
model.fit(x_train, y_train,
          batch_size=batch_size,
          epochs=epochs,
          verbose=1,
          validation_data=(x_test, y_test))
```

```
Train on 60000 samples, validate on 10000 samples
Epoch 1/20
...
Epoch 20/20

<keras.callbacks.History at 0x2971efc9c50>
```
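
To get a feeling for the size of this network, the number of trainable parameters can be worked out by hand. The sketch below assumes that every dense layer has one weight per input-unit pair plus one bias per unit, which is the default for `Dense`; the helper `dense_params` is hypothetical.

```python
def dense_params(n_inputs, n_units):
    """Weights plus one bias per unit for a fully connected layer."""
    return n_inputs * n_units + n_units

# Input size, two hidden layers and the output layer of the model above
layer_sizes = [784, 512, 512, 10]
per_layer = [dense_params(i, o) for i, o in zip(layer_sizes, layer_sizes[1:])]

print(per_layer)       # [401920, 262656, 5130]
print(sum(per_layer))  # 669706
```

The activation and dropout layers add no trainable parameters, so the total should match what `model.summary()` reports.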

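```python
# Calculate score for test set: evaluate returns [loss, accuracy]
score = model.evaluate(x_test, y_test)
# Built-in round works for both Python floats and NumPy scalars
print("Test loss:", round(score[0], 4))
print("Test accuracy:", score[1])
print("### The End ###")
```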

```
Test loss: 0.1057
Test accuracy: 0.9836
### The End ###
```
