# Introduction to Neural Network (Digits Recognition) Model [Tutorial]

In [None]:
# image data
from sklearn.datasets import load_digits

# plotting
import matplotlib.pyplot as plt
import seaborn as sns

# modeling
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix

# data processing
import pandas as pd

In [None]:
digits = load_digits()

### Prepare data for modeling

In [None]:
# grab all data (1797 records, and 8x8=64 columns)
X = digits.data

# grab the target (true) value for each image
y = digits.target

X_train, X_test, y_train_raw, y_test_raw = train_test_split(X, y, random_state=314)

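# one-hot encode the labels as floats (recent pandas versions return booleans by default)
y_train = pd.get_dummies(y_train_raw, dtype='float32').values
y_test = pd.get_dummies(y_test_raw, dtype='float32').values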

print(X_train.shape, y_train.shape, X_test.shape, y_test.shape)
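
To make the one-hot step concrete, here is a minimal NumPy sketch of the kind of array `pd.get_dummies` produces. The sample labels are made up for illustration and are not taken from the digits dataset. Note one difference: `np.eye(10)` always yields ten columns, whereas `pd.get_dummies` only creates columns for the classes that actually appear in its input.

```python
import numpy as np

# hypothetical sample of raw digit labels (illustration only)
sample_labels = np.array([0, 3, 9, 3])

# one row per label, one column per class (10 digits)
one_hot = np.eye(10, dtype='float32')[sample_labels]

print(one_hot.shape)           # (4, 10)
print(one_hot.argmax(axis=1))  # recovers the raw labels
```

Each row contains a single 1 in the column of its digit, which is the target format that `categorical_crossentropy` expects.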

Before we start, we need a small digression. Training results for Keras neural network models are not easily reproducible, because training involves a lot of shuffling and random initialization. To keep results consistent, we will initialize the random seeds before every model run. Let's create a function to do this.

In [None]:
from tensorflow import random as tf_random
import numpy as np
import random

def init_seeds(s):
    '''
    Initializes random seeds prior to model training
    to ensure reproducibility of training results.
    '''
    tf_random.set_seed(s)
    np.random.seed(s)
    random.seed(s)
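
As a quick sanity check of the idea, the sketch below reseeds only the `random` and `numpy` generators (TensorFlow is left out so the example stays lightweight) and confirms that the same seed gives the same draws. The helper name `draw` is hypothetical and not part of the tutorial.

```python
import random
import numpy as np

def draw(seed):
    # reseed both generators, then draw a few numbers
    random.seed(seed)
    np.random.seed(seed)
    return random.random(), np.random.rand(3).tolist()

first = draw(314)
second = draw(314)
print(first == second)  # True
```

The same principle applies to `tf_random.set_seed`: reseeding before each run puts the generators back in the same starting state.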

### Multi-layer Perceptron (MLP) model

Define a model with one input layer and one output layer.

In [None]:
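from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, Dense

mlp1 = Sequential(
    [
        Input(shape=(8*8,)),
        Dense(10, activation='softmax', name='output_layer')
    ])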

We have initialized a sequential model with an input shape of 64 and an output layer with 10 units. There are 650 total parameters in this model (64 × 10 weights plus 10 biases):

In [None]:
# initial weights and biases
#--
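
The parameter count can be verified by hand. The following is a rough NumPy sketch of what a single softmax output layer computes, assuming random normal weights and zero biases purely for illustration (Keras uses its own initializers); the input is a fake image, not one from the dataset.

```python
import numpy as np

n_inputs, n_classes = 8 * 8, 10

rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(n_inputs, n_classes))  # 64 x 10 weights (illustrative values)
b = np.zeros(n_classes)                                # 10 biases

n_params = W.size + b.size
print(n_params)  # 650

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract the row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

x = rng.uniform(0, 16, size=(1, n_inputs))  # one fake image with pixel values in 0..16
fake_probs = softmax(x @ W + b)
print(fake_probs.shape)  # (1, 10)
```

The output is one probability per digit, and the ten probabilities sum to 1.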

_**Useful Resources:**_

* [Keras Documentation: The Sequential model](https://keras.io/guides/sequential_model/)
* [TensorFlow API Documentation: tf.keras.Sequential](https://www.tensorflow.org/api_docs/python/tf/keras/Sequential)

Before we can train the model, we need to specify training parameters (aka compile the model).

In [None]:
# compile the model
mlp1.compile(optimizer='sgd', loss='categorical_crossentropy', metrics=['accuracy'])

#### Train (fit) the model

In [None]:
# train with the default settings (batch_size=32, epochs=1)
mlp1.fit(X_train, y_train);

The accuracy is very low because the model made only a single pass over the training data. The model must make multiple passes over the entire training dataset to produce better results. This can be adjusted with the `epochs` parameter.

The number of epochs is the number of times the algorithm sees the entire training dataset.

By the way, what is that mysterious-looking number 43? Where did that come from?

Keras uses a `batch_size` of 32 by default. The model above therefore split the 1,347 training records into 43 batches: 42 batches of 32 records plus a final batch holding the remaining 3 (1,347 / 32 ≈ 42.1, rounded up). Because the default `epochs` is 1, the model made 43 **iterations** (forward + backward), one per batch.

`iterations per epoch` = `ceil(number of training records / batch_size)`

Note that `shuffle=True` is the default in the `fit()` function: Keras shuffles the records in the training dataset before each epoch and then splits them into batches.
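
The batch arithmetic can be reproduced in a few lines. This is only a sketch of the idea (shuffle the row indices, then cut them into consecutive chunks); it is not how Keras implements batching internally.

```python
import math
import numpy as np

n_records, batch_size = 1347, 32

n_batches = math.ceil(n_records / batch_size)
last_batch = n_records - (n_batches - 1) * batch_size
print(n_batches, last_batch)  # 43 3

# shuffle the row indices, then cut them into consecutive batches
rng = np.random.default_rng(314)
indices = rng.permutation(n_records)
batches = [indices[i:i + batch_size] for i in range(0, n_records, batch_size)]
print(len(batches), len(batches[-1]))  # 43 3
```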

Let's increase the number of epochs.

In [None]:
# initialize seeds
init_seeds(314)

# prepare the model architecture
#--

Note that the model now makes 7 passes through the entire training data. Each epoch has 43 iterations (forward + backward) with up to 32 samples per iteration.

Let's try to reduce the batch size and see what happens.

In [None]:
# initialize seeds
init_seeds(314)

# prepare the model architecture
mlp3 = Sequential(
    [
        Input(shape=(8*8,)),
        Dense(10, activation='softmax', name='output_layer')
    ],
    name='simple_mlp_7epochs_5bs')

mlp3.compile(optimizer='sgd', loss='categorical_crossentropy', metrics=['accuracy'])

mlp3.fit(X_train, y_train, batch_size=5, epochs=7, shuffle=True, verbose=2);

The batch size of 5 gives us 270 iterations for each epoch.
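
To compare the three configurations used so far, the sketch below counts the total number of weight updates each one performs. The helper `total_updates` is a hypothetical name introduced only for this illustration.

```python
import math

n_train = 1347  # training records after the split

def total_updates(batch_size, epochs):
    """Number of weight updates: iterations per epoch times epochs."""
    return math.ceil(n_train / batch_size) * epochs

for batch_size, epochs in [(32, 1), (32, 7), (5, 7)]:
    print(batch_size, epochs, total_updates(batch_size, epochs))
# 32 1 43
# 32 7 301
# 5 7 1890
```

A smaller batch size means many more (but noisier) updates for the same number of epochs.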

**Useful Resources:**
* [Keras FAQ: What do sample, batch, and epoch mean?](https://keras.io/getting_started/faq/#what-do-sample-batch-and-epoch-mean)
* [What is the trade-off between batch size and number of iterations to train a neural network?](https://stats.stackexchange.com/questions/164876/what-is-the-trade-off-between-batch-size-and-number-of-iterations-to-train-a-neu)

Tip: Larger batch sizes require more memory and can hurt the model's ability to generalize. On the other hand, a very small batch size makes each update noisier (more stochastic), which can make training less stable.

#### Check the model accuracy on the test sample

In [None]:
loss, accuracy = mlp3.evaluate(X_test, y_test)

# the loss is a cross-entropy value, not a percentage
print(f'Loss: {loss:.4f}, Accuracy: {accuracy:.2%}')
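
For intuition about the two numbers returned by `evaluate()`, here is a small hand computation of categorical cross-entropy and accuracy. The labels and probabilities are made up for illustration; this is a sketch of the definitions, not of Keras internals.

```python
import numpy as np

# made-up one-hot labels and predicted probabilities for three samples
y_true = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]], dtype=float)
y_prob = np.array([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.3, 0.4, 0.3]])

# categorical cross-entropy: mean of -log(probability given to the true class)
ce = -np.mean(np.sum(y_true * np.log(y_prob), axis=1))

# accuracy: share of samples whose most probable class is the true class
acc = np.mean(y_prob.argmax(axis=1) == y_true.argmax(axis=1))

print(f'Loss: {ce:.4f}, Accuracy: {acc:.2%}')
```

The third sample is misclassified, so the accuracy is two out of three, and its low probability for the true class contributes most of the loss.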

We will use this multiple times, so let's create a small function for this.

In [None]:
def test_model(model, test_data, labels):
    # evaluate on the test sample and report cross-entropy loss and accuracy
    loss, accuracy = model.evaluate(test_data, labels, batch_size=1)
    print(f'Loss: {loss:.4f}, Accuracy: {accuracy:.2%}')
    
#--

The model `mlp3` with a batch size of five and seven epochs seems to be performing the best.

Next, let's add one hidden layer to this model and see how much improvement it yields.

#### Add a hidden layer to the model

In [None]:
# initialize seeds
init_seeds(314)

# prepare the model architecture
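mlp4 = Sequential(
    [
        Input(shape=(8*8,)),
        # assumption: 32 sigmoid units, mirroring the hidden layer of mlp5 below
        Dense(32, activation='sigmoid', name='hidden_layer'),
        Dense(10, activation='softmax', name='output_layer')
    ])

mlp4.compile(optimizer='sgd', loss='categorical_crossentropy', metrics=['accuracy'])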

mlp4.fit(X_train, y_train, batch_size=5, epochs=15, shuffle=True, verbose=2);

In [None]:
test_model(mlp4, X_test, y_test)

Adding one hidden layer, and increasing the number of epochs, improved the model's accuracy.

**Interesting questions:**

* [How to choose the number of hidden layers and the size of hidden layers?](https://stackoverflow.com/questions/10565868/multi-layer-perceptron-mlp-architecture-criteria-for-choosing-number-of-hidde?lq=1)
* [Why are neural networks becoming deeper, but not wider?](https://stats.stackexchange.com/questions/222883/why-are-neural-networks-becoming-deeper-but-not-wider)

#### Try a different activation function (relu)

In [None]:
# initialize seeds
init_seeds(314)

# prepare the model architecture
mlp5 = Sequential(
    [
        Input(shape=(8*8,)),
        Dense(32, activation='relu', name='hidden_layer'),
        Dense(10, activation='softmax', name='output_layer')
    ],
    name='mlp_1hidden_relu')

mlp5.compile(optimizer='sgd', loss='categorical_crossentropy', metrics=['accuracy'])

mlp5.fit(X_train, y_train, batch_size=5, epochs=15, shuffle=True, verbose=0)

test_model(mlp5, X_test, y_test)

Changing the activation function did not improve the model's performance on this dataset. However, `relu` often outperforms `sigmoid` in hidden layers and is usually the preferred choice.
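
To see how the two activation functions differ, the sketch below evaluates both on a few sample inputs (the input values are arbitrary). One commonly cited reason for preferring `relu` is that `sigmoid` flattens out for large positive or negative inputs, while `relu` passes positive values through unchanged.

```python
import numpy as np

def sigmoid(z):
    # squashes any input into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    # zero for negative inputs, identity for positive inputs
    return np.maximum(0.0, z)

z = np.array([-4.0, -1.0, 0.0, 1.0, 4.0])
print(sigmoid(z).round(3))
print(relu(z))
```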

#### Try a different optimizer (adam)

In [None]:
# initialize seeds
init_seeds(314)

# prepare the model architecture
mlp6 = Sequential(
    [
        Input(shape=(8*8,)),
        Dense(32, activation='relu', name='hidden_layer'),
        Dense(10, activation='softmax', name='output_layer')
    ],
    name='mlp_1hidden_adam')

mlp6.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

mlp6.fit(X_train, y_train, batch_size=5, epochs=15, shuffle=True, verbose=0)

test_model(mlp6, X_test, y_test)

The adam optimizer improved the model's performance.
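
For readers curious about what Adam does differently from plain SGD, here is a rough sketch of a single update step on a toy objective, f(w) = w², using the default hyperparameters documented by Keras (learning rate 0.001 for Adam, 0.01 for SGD). The function `adam_step` is a hypothetical helper written for this illustration, not a Keras API.

```python
import math

def adam_step(w, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-7):
    # running averages of the gradient and the squared gradient
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    # bias correction for the zero-initialized averages
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    # step size is scaled by the gradient's recent magnitude
    w = w - lr * m_hat / (math.sqrt(v_hat) + eps)
    return w, m, v

# toy objective f(w) = w**2, whose gradient is 2w
w = 5.0
w_adam, m, v = adam_step(w, 2 * w, m=0.0, v=0.0, t=1)
w_sgd = w - 0.01 * (2 * w)  # plain SGD step for comparison

print(w_adam, w_sgd)
```

On the first step Adam moves by roughly the learning rate regardless of how large the gradient is, whereas the SGD step is proportional to the gradient.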

**Useful resource:**

* [Keras Documentation: Adam](https://keras.io/api/optimizers/adam/)

### Plot confusion matrix

In [None]:
# calculate predicted probabilities
probs = mlp6.predict(X_test)

In [None]:
# we can use formatted string literal to print float values instead
[f'{x:.5f}' for x in probs[0]]

The model returns a probability for each label (digit). We can grab the label (digit) that has the highest probability. For example, for the first sample, the model predicts a very high probability of 0.99974 that the digit is 1.

In [None]:
# grab the digit with max probability
y_preds = probs.argmax(axis=1)
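
The "grab the digit with max probability" step can be illustrated on a tiny made-up array. The probabilities below are invented for the example; each row sums to 1 and has ten entries, one per digit.

```python
import numpy as np

# made-up probabilities for two samples over the ten digits
fake_probs = np.array([
    [0.01, 0.90, 0.01, 0.01, 0.01, 0.01, 0.01, 0.02, 0.01, 0.01],
    [0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.55, 0.05],
])

# index of the largest value in each row is the predicted digit
fake_preds = fake_probs.argmax(axis=1)
print(fake_preds)  # [1 8]
```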

In [None]:
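# rows are true digits, columns are predicted digits
cfm = confusion_matrix(y_test_raw, y_preds)
plt.figure(figsize=(6, 6))

sns.heatmap(cfm, annot=True, fmt='d')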
plt.xlabel('Predicted value', fontsize=12)
plt.ylabel('True value', fontsize=12)
plt.title('Confusion Matrix (Neural Network)', fontsize=12, weight='semibold');
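
To clarify how to read the matrix, here is a minimal sketch that builds a confusion matrix by hand for three classes, using the same convention as `confusion_matrix` (rows are true values, columns are predicted values). The labels are made up for illustration.

```python
import numpy as np

true_labels = np.array([0, 1, 1, 2, 2, 2])
pred_labels = np.array([0, 1, 2, 2, 2, 1])

n_classes = 3
cm = np.zeros((n_classes, n_classes), dtype=int)
for t, p in zip(true_labels, pred_labels):
    cm[t, p] += 1  # row: true label, column: predicted label

print(cm)
# [[1 0 0]
#  [0 1 1]
#  [0 1 2]]
```

Correct predictions land on the diagonal, so the accuracy is the trace divided by the total count (4 of 6 here).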

#### Saving (exporting) the model

Save the model as a single `HDF5` file. [`HDF5` stands for Hierarchical Data Format, v5.]

In [None]:
mlp_path = '../misc/digits_recognition_mlp_model.h5'

mlp6.save(mlp_path)

This saves the trained model and all trackable objects (config, weights, and optimizer) attached to the model.

### Load a saved model

We can load a pre-trained (and saved) model by using the `load_model()` function.

In [None]:
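from tensorflow.keras.models import load_model

# `mlp_loaded` is an illustrative variable name
mlp_loaded = load_model(mlp_path)

In [None]:
# use the model that we just loaded to predict values
mlp_loaded.predict(X_test)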

**Useful resources:**
* [Keras: Serialization and saving](https://keras.io/guides/serialization_and_saving/)
* [Keras: Model saving and serialization APIs](https://keras.io/api/models/model_saving_apis/)

#### Visualize the predictions

In [None]:
# grab raw model predictions for the entire dataset
preds_raw = mlp6.predict(X)

In [None]:
# grab the predicted digit (based on the highest probability) for the entire dataset
preds = preds_raw.argmax(axis=1)

Let's plot the first 100 predictions. A label is shown in green if the model predicted the digit correctly; otherwise it is shown in red. Because `preds` covers the entire dataset, most of these images were also used for training.

In [None]:
fig, axes = plt.subplots(10, 10, figsize=(7, 7), subplot_kw={'xticks':[], 'yticks':[]})

for i, ax in enumerate(axes.flat):
    ax.imshow(digits.images[i], cmap='binary', interpolation='bicubic')
    pred = preds[i]
    act = y[i]
    # green label for a correct prediction, red otherwise
    color = 'green' if pred == act else 'tomato'
    ax.text(0.05, 0.05, str(pred), color=color,
            weight='semibold', transform=ax.transAxes)
plt.show();
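
Besides the plot, it can be handy to list which samples were misclassified. The sketch below uses small made-up arrays in place of `preds` and `y`; the names `fake_true`, `fake_pred`, and `wrong` are hypothetical.

```python
import numpy as np

fake_true = np.array([0, 1, 2, 3, 4])
fake_pred = np.array([0, 1, 7, 3, 9])

# positions where the prediction differs from the true digit
wrong = np.flatnonzero(fake_pred != fake_true)
print(wrong)  # [2 4]

# same color rule as in the plot above
colors = ['green' if p == t else 'tomato' for p, t in zip(fake_pred, fake_true)]
print(colors)
```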