# 01 A Template for Deep Learning with TensorFlow 2.0
## Dr. Tristan Behrens

In the following we will solve the Hello World of Machine Learning and Computer Vision: MNIST.

This notebook has a couple of purposes:
- introducing you to the world of Deep Learning,
- explaining the key concepts, and
- acting as a template for all your future projects.

## Make sure that we have TensorFlow 2 enabled.

In [None]:
%tensorflow_version 2.x

## Import all necessary modules and check TensorFlow version.

In [None]:
import tensorflow as tf
assert tf.__version__.startswith("2."), "You have TensorFlow version {}, 2.X is required, please upgrade.".format(tf.__version__)

import tensorflow_datasets as tfds
import numpy as np
import matplotlib.pyplot as plt
from tensorflow.keras import models, layers

## Load and split the MNIST dataset.

The MNIST dataset is a fine example of Supervised Learning. In Supervised Learning you have labeled data. That is, for each input sample, you have an expected output. This is usually called the label or target.

We split the data into three subsets:
- Train: For training the Neural Network.
- Validate: To see how good the Neural Network is after each epoch.
- Test: To see how good the Neural Network is after training.

It is general practice to have at least a validation set. Ideally, you would also have a test set, but some people do not use one at all.
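To get a feeling for the sizes involved, here is a small sketch of the arithmetic behind an 80/20 split. It assumes the standard MNIST sizes of 60,000 training images and 10,000 test images. The helper `split_sizes` is a hypothetical name used only for illustration, not part of any library.

```python
# Sketch: how a percentage-based split translates into sample counts.
# Assumption: MNIST ships 60,000 training images and 10,000 test images.
TRAIN_TOTAL = 60_000
TEST_TOTAL = 10_000


def split_sizes(total, train_percent):
    """Return (train, validate) sizes when the first train_percent% is used for training."""
    train_size = total * train_percent // 100
    return train_size, total - train_size


train_size, validate_size = split_sizes(TRAIN_TOTAL, 80)
print("Train:   ", train_size)     # 48000
print("Validate:", validate_size)  # 12000
print("Test:    ", TEST_TOTAL)     # 10000
```

The validation samples are carved out of the official training set, so the official test set stays untouched until the very end.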

Link: [MNIST database](https://www.google.com/search?client=safari&rls=en&q=mnist&ie=UTF-8&oe=UTF-8).

In [None]:
(mnist_train, mnist_validate, mnist_test), info = tfds.load(
    name="mnist", 
    split=["train[:80%]", "train[80%:]", "test"],
    with_info=True,
    as_supervised=True
)

print("Train:   ", len(list(mnist_train)))
print("Validate:", len(list(mnist_validate)))
print("Test:    ", len(list(mnist_test)))

## Look at your data!

Never trust the source of your data, even if you created it yourself. Do not worry, this is not paranoia. It is just a good way to ensure the quality of your project. Always look at your data, because most of the time, when something goes wrong, the data is the cause.

In [None]:
index = 1
plt.figure(figsize=(20, 2))
for mnist_example in mnist_train.take(6):
    image, label = mnist_example

    plt.subplot(1, 6, index)
    plt.imshow(image.numpy()[:, :, 0], cmap=plt.get_cmap("gray"))
    plt.title("Label: {}".format(label.numpy()))
    index += 1
plt.show()
plt.close()

## Preparing the datasets with tf.data.

You can build very efficient input pipelines for your datasets with tf.data. Here we will use it for two things:
- converting the images to floating point numbers and normalizing the data, and
- mapping the labels to a so-called one-hot encoding.

Link: [tf.data: Build TensorFlow input pipelines](https://www.tensorflow.org/guide/data)

In [None]:
def encode(image, label):
    image_encoded = tf.image.convert_image_dtype(image, dtype=tf.float32)
    label_encoded = tf.one_hot(label, depth=10)
    return image_encoded, label_encoded

# encode already takes (image, label), so it can be passed to map directly.
mnist_train = mnist_train.map(encode)
mnist_validate = mnist_validate.map(encode)
mnist_test = mnist_test.map(encode)

## A second look at our data.

While the images just got normalized, the labels now have a so-called one-hot encoding. There is empirical evidence (but no mathematical proof) that normalizing (or standardizing) input data improves training in Deep Learning. One-hot encodings are nice. They encode our labels as a perfect probability distribution over our classes, in our case digits. Many zeros, only one one.
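The two transformations can be sketched in plain Python to make the arithmetic visible. This is only an illustration of the idea, not the TensorFlow implementation. The helper names `normalize_pixel` and `one_hot` are hypothetical, and the sketch assumes 8-bit pixel values in the range 0 to 255.

```python
# Sketch: normalization and one-hot encoding without TensorFlow.
# Assumption: raw pixels are 8-bit integers in the range 0..255.

def normalize_pixel(value):
    """Scale an 8-bit pixel value to a float in [0.0, 1.0]."""
    return value / 255.0


def one_hot(label, depth=10):
    """Return a list of `depth` floats with 1.0 at index `label` and 0.0 elsewhere."""
    return [1.0 if index == label else 0.0 for index in range(depth)]


encoded = one_hot(3)
print(encoded)       # [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
print(sum(encoded))  # 1.0, so the encoding is a valid probability distribution
```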

In [None]:
index = 1
plt.figure(figsize=(20, 2))
for mnist_example in mnist_train.take(6):
    image, label = mnist_example

    plt.subplot(1, 6, index)
    plt.imshow(image.numpy()[:, :, 0], cmap=plt.get_cmap("gray"))
    plt.title("Label:\n {}".format(label.numpy()))
    index += 1
plt.show()
plt.close()

## Create a Deep Neural Network to solve our classification problem - Multi-Layer Perceptron.

Now that the data pipeline is up and running, we can create our Neural Network. We will use a Multi-Layer Perceptron. It is a rather traditional but reliable ANN. In essence, it is a fully connected network with one or more hidden layers.

One way to create Neural Networks in TensorFlow 2 is the so-called Sequential API. Sequential models have one input, one output, and a stack of hidden layers in between. If you need more, you could also use the Functional API or Model subclassing. We will discuss this in the future.

Note below that the input and output sizes are determined by the data. The hidden layers are variable in both number and size.

In [None]:
model = models.Sequential()
model.add(layers.Flatten(input_shape=(28, 28, 1)))
model.add(layers.Dense(512, activation="relu"))
model.add(layers.Dense(10, activation="softmax"))
model.summary()

The Flatten layer reshapes the 2D images into a 1D array. This is reversible, no information is lost.

Dense layers are fully connected layers. The first Dense layer has ReLU (Rectified Linear Unit) as a so-called activation function. We will talk about activation functions later.
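Because every input of a Dense layer is connected to every unit, the number of trainable parameters is easy to work out by hand. The following sketch does the arithmetic for the architecture above, so you can compare it with the output of `model.summary()`. The helper `dense_params` is a hypothetical name for illustration.

```python
# Sketch: counting the trainable parameters of the network above by hand.

def dense_params(inputs, units):
    """A fully connected layer has one weight per input-unit pair plus one bias per unit."""
    return inputs * units + units


flattened_size = 28 * 28 * 1                        # Flatten has no parameters
hidden_params = dense_params(flattened_size, 512)   # 784 * 512 + 512
output_params = dense_params(512, 10)               # 512 * 10 + 10
total_params = hidden_params + output_params

print("Hidden layer:", hidden_params)  # 401920
print("Output layer:", output_params)  # 5130
print("Total:       ", total_params)   # 407050
```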

The final layer has a softmax activation function. This function makes sure that all 10 numbers in the output are between 0.0 and 1.0, while their sum is 1.0. This basically ensures that the output is a probability distribution. Remember: Our labels are one-hot encoded, that is, perfect probability distributions.
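Here is a minimal sketch of softmax in plain Python. It is meant to show the idea and is not how TensorFlow implements it internally. The input values are arbitrary example numbers. Subtracting the maximum before exponentiating is a common trick for numerical stability and does not change the result.

```python
import math


def softmax(values):
    """Turn a list of arbitrary real numbers into a probability distribution."""
    largest = max(values)
    # Shifting by the maximum keeps exp() from overflowing and leaves the result unchanged.
    exponentials = [math.exp(value - largest) for value in values]
    total = sum(exponentials)
    return [exponential / total for exponential in exponentials]


probabilities = softmax([2.0, 1.0, 0.1])
print(probabilities)
print(sum(probabilities))  # very close to 1.0
```

Larger inputs get larger probabilities, and the order of the inputs is preserved.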

## Attach optimizer, loss, and metrics.

In order to train an ANN, we need a couple of components:

- The optimizer is the training algorithm that changes the trainable parameters of an ANN in order to solve the given problem. We use RMSprop. RMSprop is a reasonable default to start with.
- The loss function computes the error between the outputs predicted by the ANN and the outputs we expect in supervised learning. Here we use Categorical Crossentropy. It is well suited to measuring the error between two probability distributions. The optimizer uses the loss function in order to optimize the Neural Network.
- Metrics are optional. They can be considered losses that are only used by humans. The optimizer does not consider metrics. Here we use accuracy. It tells us which fraction of the predictions is right. Works like a charm for classifiers.
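The loss and the metric from the list above can be sketched in plain Python. This is a simplified illustration and not the Keras implementation. The function names and the example vectors are made up for this sketch, and the small `epsilon` is an assumption that guards against taking the logarithm of zero.

```python
import math


def categorical_crossentropy(target, predicted, epsilon=1e-7):
    """Error between a one-hot target and a predicted probability distribution."""
    return -sum(t * math.log(max(p, epsilon)) for t, p in zip(target, predicted))


def argmax(values):
    """Index of the largest entry."""
    return max(range(len(values)), key=values.__getitem__)


def accuracy(targets, predictions):
    """Fraction of samples whose most probable class matches the label."""
    hits = sum(argmax(t) == argmax(p) for t, p in zip(targets, predictions))
    return hits / len(targets)


# Made-up example with three classes; the true class is index 1.
target = [0.0, 1.0, 0.0]
good_prediction = [0.05, 0.90, 0.05]
bad_prediction = [0.70, 0.20, 0.10]

loss_good = categorical_crossentropy(target, good_prediction)
loss_bad = categorical_crossentropy(target, bad_prediction)
print("Loss (good):", loss_good)  # small, the prediction is close to the target
print("Loss (bad): ", loss_bad)   # larger, the prediction is far from the target
print("Accuracy:   ", accuracy([target, target], [good_prediction, bad_prediction]))  # 0.5
```

Note how the loss reacts to how confident a prediction is, while accuracy only counts whether the most probable class is correct.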

In the future we will talk about optimizers and losses in more detail.

In [None]:
model.compile(
    optimizer="rmsprop",
    loss="categorical_crossentropy",
    metrics=["accuracy"]
)

## How good is our ANN before training?

Before training, we should have a look at how good our Neural Network is. Note that we do not expect our randomly initialized Neural Network to be any good.

Let us consider a single sample first to get an idea.

In [None]:
mnist_example = list(mnist_test.take(1).as_numpy_iterator())[0]

image, label = mnist_example
prediction = model.predict(np.array([image]))
plt.plot(prediction[0], label="Prediction")
plt.plot(label, label="Expectation")
plt.legend()
plt.show()
plt.close()

Here we see that our prediction (blue) is not anywhere close to what we expect (orange). Ideally both plots would be identical. The "difference" between blue and orange is our loss.

It would be interesting to see how good the Neural Network is when it comes to the entire dataset. In order to find out, we evaluate it.

In [None]:
loss, acc = model.evaluate(mnist_test.batch(32), verbose=0)
print("Loss: {}".format(loss))
print("Accuracy: {}".format(acc))

If we just focus on accuracy, we see that it is roughly 10%. It tells us that our Neural Network is right in about that percentage of all predictions on the test set. Since there are ten digit classes, random guessing would also reach about 10%. In other words, the untrained network is no better than chance.
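The random baseline can also be worked out for the loss. The sketch below assumes a model that assigns the same probability to every digit and classes that are roughly balanced. An untrained network is not exactly uniform, so expect your numbers to be in this neighborhood rather than identical.

```python
import math

# Sketch: what a "know-nothing" classifier would score on ten classes.
NUM_CLASSES = 10

# A uniform prediction gives every digit the same probability.
uniform_prediction = [1.0 / NUM_CLASSES] * NUM_CLASSES

# With balanced classes, guessing is right one time in ten.
baseline_accuracy = 1.0 / NUM_CLASSES

# Against a one-hot label, categorical crossentropy reduces to -log(p of the true class).
baseline_loss = -math.log(uniform_prediction[0])

print("Baseline accuracy:", baseline_accuracy)  # 0.1
print("Baseline loss:    ", baseline_loss)      # ln(10), roughly 2.30
```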

## ANN training.

Now we know how good (bad) our Neural Network is. We have everything ready, including the data, the network architecture, the optimizer, and the loss.

We can train.

In [None]:
history = model.fit(
    mnist_train.batch(128),
    epochs=5,
    validation_data=mnist_validate.batch(128)
)

We train for 5 epochs. This means that the optimizer works through the entire training dataset five times. In each epoch we use a batch size of 128. We do not train on a single image-label pair per step. Instead, we speed up training by processing multiple samples in parallel.

We also use the validation data to check how good our Neural Network is on data that is not in the training set.
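How many optimizer steps does that make? Here is a small sketch of the arithmetic. It assumes 48,000 training and 12,000 validation samples, which is what an 80/20 split of the 60,000 MNIST training images yields. The helper `batches_per_epoch` is a hypothetical name.

```python
import math

# Sketch: relating dataset size, batch size, and epochs to the number of steps.
# Assumption: 48,000 training and 12,000 validation samples (80/20 split of 60,000).
TRAIN_SAMPLES = 48_000
VALIDATE_SAMPLES = 12_000
BATCH_SIZE = 128
EPOCHS = 5


def batches_per_epoch(num_samples, batch_size):
    """Number of batches needed to see every sample once; the last batch may be smaller."""
    return math.ceil(num_samples / batch_size)


train_batches = batches_per_epoch(TRAIN_SAMPLES, BATCH_SIZE)
validate_batches = batches_per_epoch(VALIDATE_SAMPLES, BATCH_SIZE)
total_steps = train_batches * EPOCHS

print("Training batches per epoch:  ", train_batches)     # 375
print("Validation batches per epoch:", validate_batches)  # 94
print("Optimizer steps in total:    ", total_steps)       # 1875
```

The number of training batches per epoch is also what the progress bar counts up to during training.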

## Inspect the history.

The above log is quite nice. In your future projects, with many more epochs, it will be rather unreadable. It is best practice to visualize the training. A picture is way more expressive than the output above.

In [None]:
plt.figure(figsize=(10, 4))

plt.subplot(1, 2, 1)
plt.plot(history.history["loss"], label="loss")
plt.plot(history.history["val_loss"], label="val_loss")
plt.legend()
plt.title("Losses")

plt.subplot(1, 2, 2)
plt.plot(history.history["accuracy"], label="accuracy")
plt.plot(history.history["val_accuracy"], label="val_accuracy")
plt.legend()
plt.title("Metrics")

plt.show()
plt.close()

What do we see here? Our optimizer manages to get the loss down. It is rather natural that the loss goes down for the training set. The optimizer directly works with that data. But that the loss also goes down for the validation set is exactly what we want to achieve.

In summary:
- The training loss shows us how well the Neural Network is capable of solving the problem, how well it optimizes.
- The validation loss shows us how well the Neural Network is capable of working with data it has not been trained on, how well it generalizes.

Our accuracy shows a similar picture. It goes up for both of our datasets. For the validation set it almost reaches 98%. Sweet success!
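Besides plotting, you can also query the history programmatically. The sketch below uses a made-up dictionary with the same keys as `history.history`. The numbers are purely illustrative and are not results from this notebook. The helper names are hypothetical.

```python
# Sketch: reading a training history without a plot.
# The values below are invented for illustration only.
example_history = {
    "loss": [0.30, 0.12, 0.08, 0.06, 0.04],
    "val_loss": [0.15, 0.11, 0.09, 0.10, 0.11],
}


def best_epoch(history):
    """Zero-based index of the epoch with the lowest validation loss."""
    val_loss = history["val_loss"]
    return min(range(len(val_loss)), key=val_loss.__getitem__)


def generalization_gaps(history):
    """Validation loss minus training loss for every epoch."""
    return [v - t for t, v in zip(history["loss"], history["val_loss"])]


print("Best epoch:", best_epoch(example_history))  # 2
print("Gaps:      ", generalization_gaps(example_history))
```

With a real training run you would pass `history.history` instead of `example_history`. A gap that keeps growing while the training loss still falls is a hint that the network is starting to memorize the training data.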

How to read such and similar plots will be a future topic.

## How good is our ANN after training?

After training we will do the same exercise that we did before training.

How good is our Neural Network with one sample?

In [None]:
mnist_example = list(mnist_test.take(1).as_numpy_iterator())[0]

image, label = mnist_example
prediction = model.predict(np.array([image]))
plt.plot(prediction[0], label="Prediction")
plt.plot(label, label="Expectation")
plt.legend()
plt.show()
plt.close()

Very good! We do not see the blue plot because it is hidden behind the orange one. Note that the sample came from the test set. Thus the Neural Network never saw it during training.
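If you want a single digit instead of a plot, take the index of the largest probability. Here is a minimal sketch with a made-up prediction vector. The numbers are illustrative and not actual model output, and `predicted_digit` is a hypothetical helper name.

```python
# Sketch: turning a probability distribution into a predicted digit.

def predicted_digit(probabilities):
    """Index of the largest probability, which is the digit the network favors."""
    return max(range(len(probabilities)), key=probabilities.__getitem__)


# Invented example: the network is 90% sure that the image shows a 7.
example_prediction = [0.01, 0.01, 0.02, 0.01, 0.01, 0.01, 0.01, 0.90, 0.01, 0.01]
example_label = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0]

print("Predicted:", predicted_digit(example_prediction))  # 7
print("Expected: ", predicted_digit(example_label))       # 7
```

With the real output above, `np.argmax(prediction[0])` and `np.argmax(label)` do the same job.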

Finally, we evaluate the Neural Network using the entire test set.

In [None]:
loss, acc = model.evaluate(mnist_test.batch(32), verbose=0)
print("Loss: {}".format(loss))
print("Accuracy: {}".format(acc))

You see! Now the numbers are way better. The loss is down and the accuracy is up. Well done!

## Save the model.

After training you can save your model and deploy it. And if you like, write your customer an invoice or publish a research paper.

In [None]:
model.save("model.h5")

# Summary.

The above code is a template for all your Deep Learning projects. You will have all the components. You might have a different way of preprocessing data, different hyperparameters, and different Neural Network architectures. But you will always have the same pattern.

Thank you so much for learning. Keep an eye out for more. And if there is a topic that I did not cover, let me know!