<a href="https://colab.research.google.com/github/wiso/TutorialML-AtlasItalia2022/blob/main/notebooks/1.0-ImageClassification.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 1.0 Image Classification
The goal of this tutorial is to become familiar with simple feed-forward neural networks. The task is to classify images into one of ten possible classes.

The classification is based on a neural network. The first step is to flatten the input grayscale image (28x28) into a 1D array of 784 values, so that it can be fed to the first layer of the network.

The parameters of the neural network will be tuned to minimize the loss, in this case the cross-entropy.
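
To make the loss concrete, here is a minimal NumPy sketch of the cross-entropy for integer labels. The function name `sparse_cross_entropy` and the toy probabilities are illustrative and not part of the tutorial code.

```python
import numpy as np

def sparse_cross_entropy(probs, labels):
    """Mean cross-entropy for integer labels.

    probs:  array of shape (n_samples, n_classes), each row sums to 1
    labels: array of shape (n_samples,) with the true class index
    """
    # pick the predicted probability of the true class for each sample
    p_true = probs[np.arange(len(labels)), labels]
    return -np.mean(np.log(p_true))

# toy example with 3 classes (values chosen by hand for illustration)
toy_probs = np.array([[0.7, 0.2, 0.1],
                      [0.1, 0.8, 0.1]])
toy_labels = np.array([0, 1])
toy_loss = sparse_cross_entropy(toy_probs, toy_labels)
print(toy_loss)
```

The loss approaches zero only when the true class gets a probability close to 1. A uniform guess over 10 classes gives $\ln 10 \approx 2.30$.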

The steps will be:

   * load and preprocess the inputs, split into training and test datasets
   * define the model
   * define the loss
   * train with the training dataset
   * evaluate the performance with the test dataset

In [None]:
import numpy as np
import tensorflow as tf
from matplotlib import pyplot as plt
import seaborn as sns

## Load the dataset
The dataset is Fashion-MNIST, loaded here through `tf.keras.datasets`:

> Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms. Han Xiao, Kashif Rasul, Roland Vollgraf. arXiv:1708.07747

Images are 2D arrays of 28x28 pixels, in grayscale. Each image is associated with a label defining its class. There are 10 classes.

In [None]:
fashion_mnist = tf.keras.datasets.fashion_mnist

(train_images, train_labels), (test_images, test_labels) = fashion_mnist.load_data()
class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
               'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']
nclasses = len(class_names)

# summarize loaded dataset
print('Train: X=%s, y=%s' % (train_images.shape, train_labels.shape))
print('Test: X=%s, y=%s' % (test_images.shape, test_labels.shape))
print("unique train labels=%s" % np.unique(train_labels))
print("range values first train img = %s, %s" % (train_images[0].min(), train_images[0].max()))

### Preprocess
Normalize the images so that each pixel is a number between 0 and 1 (the raw pixel values are integers between 0 and 255).

In [None]:
train_images = train_images / 255.
test_images = test_images / 255.
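
As a quick check of what the division does, the sketch below applies it to synthetic data. It assumes that the raw images are stored as 8-bit unsigned integers; the arrays `raw` and `scaled` are made up for the illustration.

```python
import numpy as np

# synthetic stand-in for a batch of 8-bit grayscale images
# (assumption: uint8 values in [0, 255])
rng = np.random.default_rng(0)
raw = rng.integers(0, 256, size=(4, 28, 28), dtype=np.uint8)

scaled = raw / 255.   # true division promotes uint8 to float64

print(raw.dtype, raw.min(), raw.max())
print(scaled.dtype, scaled.min(), scaled.max())
```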

## Example


In [None]:
#@title
plt.imshow(train_images[0], cmap='binary')
plt.axis('off')
plt.show()

### Check the frequencies of the labels
The dataset is balanced: each class has the same number of images.

In [None]:
#@title
fig, ax = plt.subplots()
ax.hist(train_labels, bins=np.arange(nclasses + 1), label='train')
ax.hist(test_labels, bins=np.arange(nclasses + 1), label='test')
ax.set_xticks(np.arange(nclasses) + 0.5)
ax.set_xticklabels(class_names, rotation=90)
ax.legend()
plt.show()
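
The same check can be done numerically. The sketch below uses `np.bincount` on made-up labels; with the real data one would pass `train_labels` or `test_labels` instead.

```python
import numpy as np

# hypothetical labels: 10 classes with 6 entries each, in shuffled order
rng = np.random.default_rng(0)
toy_labels = rng.permutation(np.repeat(np.arange(10), 6))

# number of entries for each class index
counts = np.bincount(toy_labels, minlength=10)
is_balanced = bool(np.all(counts == counts[0]))
print(counts, is_balanced)
```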

### Display examples

In [None]:
#@title
fig, axs = plt.subplots(10, 20, figsize=(13, 7), gridspec_kw=dict(wspace=0.0, hspace=0.0,
                        top=1. - 0.5 / (10 + 1), bottom=0.5 / (10 + 1),
                        left=0.5 / (20 + 1), right=1 - 0.5 / (20 + 1)),)
for iclass, ax_row in enumerate(axs):
    imgs = train_images[train_labels == iclass][:20]
    label = class_names[iclass]
    for ax, img in zip(ax_row, imgs):
        ax.set_xticks([])
        ax.set_yticks([])
        ax.imshow(img, cmap='binary')
        ax.axis('equal')
        if ax.get_subplotspec().colspan.start == 0:
            ax.set_ylabel(label, fontsize=12)
plt.show()

## Create the model
### Using the functional API
Note that here we are just defining the computational graph representing the model. Nothing is computed here.

In [None]:
inputs = tf.keras.Input(shape=(28, 28), name='img')    # the input placeholder
x = tf.keras.layers.Flatten()(inputs)                  # flatten to 1D array
x = tf.keras.layers.Dense(128, activation='relu')(x)   # first dense layer + relu
x = tf.keras.layers.Dense(32, activation='relu')(x)
x = tf.keras.layers.Dense(nclasses)(x)                 # last dense layer, no activation (logits)
x = tf.keras.layers.Softmax()(x)                       # normalize the output

model = tf.keras.Model(inputs=inputs, outputs=x)
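
To see what the graph computes, here is a rough NumPy sketch of the same sequence of operations (flatten, two dense layers with ReLU, a last dense layer, softmax). The weights are random and the helpers `dense` and `softmax` are written only for this illustration; this is not how Keras implements the layers.

```python
import numpy as np

rng = np.random.default_rng(0)

def dense(x, n_out, activation=None):
    """Fully connected layer with random weights (illustration only)."""
    w = rng.normal(scale=0.05, size=(x.shape[-1], n_out))
    b = np.zeros(n_out)
    y = x @ w + b
    return np.maximum(y, 0) if activation == 'relu' else y

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)   # for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

batch = rng.random((5, 28, 28))             # 5 fake images
h = batch.reshape(len(batch), -1)           # flatten: (5, 784)
h = dense(h, 128, activation='relu')
h = dense(h, 32, activation='relu')
toy_logits = dense(h, 10)                   # one score per class
toy_probs = softmax(toy_logits)             # normalized: each row sums to 1
print(toy_probs.shape, toy_probs.sum(axis=1))
```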

### Inspect the model
Note that each output shape has an additional dimension with length `None`. This is the batch dimension, since we will feed the model with batches of images. This very simple model has ~105k parameters to be optimized.

In [None]:
model.summary()
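
The total reported by the summary can be cross-checked by hand: each dense layer has one weight per (input, output) pair plus one bias per output, while the flatten and softmax layers have no parameters. A short sketch:

```python
# (inputs, outputs) of each dense layer; Flatten and Softmax have no parameters
layer_sizes = [(28 * 28, 128), (128, 32), (32, 10)]

# each dense layer has n_in * n_out weights plus n_out biases
n_params = [n_in * n_out + n_out for n_in, n_out in layer_sizes]
total_params = sum(n_params)
print(n_params, total_params)
```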

In [None]:
# plot_model needs the pydot and graphviz packages
tf.keras.utils.plot_model(model, show_shapes=True)

### Apply the model to one image
Note that we need to expand the dimensions of the image, since the model expects a batch of images. The output also has one additional dimension.

In [None]:
example_image = train_images[0]
print("image shape: ", example_image.shape)
example_image = np.expand_dims(example_image, axis=0)
print("image shape: ", example_image.shape)
model(example_image)
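
For reference, there are several equivalent ways to add the batch dimension in NumPy. The sketch below uses dummy arrays of zeros; the names are illustrative.

```python
import numpy as np

img = np.zeros((28, 28))

a = np.expand_dims(img, axis=0)   # what the cell above does
b = img[np.newaxis, ...]          # equivalent indexing form
c = img.reshape(1, 28, 28)        # equivalent reshape

print(a.shape, b.shape, c.shape)

# a slice keeps the batch dimension, an integer index drops it
stack = np.zeros((100, 28, 28))
print(stack[0].shape, stack[:1].shape)
```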

### Compile the model
Add a loss and an optimizer. In TF these components are part of the same computational graph as the model.

In [None]:
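# the model already ends with a Softmax layer, so its output is a probability,
# not a logit: from_logits must be False
model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False),
              metrics=['accuracy'])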
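
The `from_logits` argument tells the loss whether the model output is a raw score (logit) or an already normalized probability. The NumPy sketch below, with hand-picked logits and illustrative helper names, shows that applying the softmax twice changes the value of the loss.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(probs, labels):
    # mean of -log(probability assigned to the true class)
    return -np.mean(np.log(probs[np.arange(len(labels)), labels]))

toy_logits = np.array([[4.0, 1.0, 0.0]])   # hand-picked scores for 3 classes
toy_labels = np.array([0])

toy_probs = softmax(toy_logits)
loss_once = cross_entropy(toy_probs, toy_labels)             # softmax applied once
loss_twice = cross_entropy(softmax(toy_probs), toy_labels)   # softmax applied twice

print(loss_once, loss_twice)
```

Since probabilities lie between 0 and 1, a second softmax pushes them toward a uniform distribution, so in this sketch the doubly normalized loss stays well above the correct one.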

## Train
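Train for up to 20 epochs (an epoch is one full pass over the training dataset). Use about 1/3 of the training sample as a validation sample to monitor the loss and the metrics during the training. In particular, the validation loss is used for early stopping: the training stops if it does not improve for 3 consecutive epochs.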

In [None]:
history = model.fit(train_images, train_labels,
                    batch_size=256,
                    epochs=20, validation_split=0.33,
                    callbacks=[tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=3)])
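
The early-stopping rule can be illustrated in plain Python. This is a simplified sketch of the logic with `patience=3`, not the Keras implementation; the function `early_stopping_epoch` and the validation losses are made up.

```python
def early_stopping_epoch(val_losses, patience=3):
    """Return the index of the epoch at which training would stop,
    or None if the stopping condition is never met."""
    best = float('inf')
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:      # improvement: remember it and reset the counter
            best = loss
            wait = 0
        else:                # no improvement in this epoch
            wait += 1
            if wait >= patience:
                return epoch
    return None

# made-up validation losses: best value at epoch 3, then three worse epochs
toy_val_losses = [0.60, 0.50, 0.45, 0.44, 0.46, 0.45, 0.47, 0.43]
stop_epoch = early_stopping_epoch(toy_val_losses)
print(stop_epoch)
```

In this toy sequence the last value (0.43) would have been a new minimum, but it is never reached: a larger `patience` trades training time for robustness against fluctuations of the validation loss.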

### Display the metric and the loss

In [None]:
#@title
fig, axs = plt.subplots(1, 2, figsize=(16, 5), sharex=True)
for ax, quantity in zip(axs, ('accuracy', 'loss')):
    ax.plot(history.history[quantity], label='train')
    ax.plot(history.history[f'val_{quantity}'], label='validation')
    ax.legend()
    ax.set_xlabel('epoch', fontsize=15)
    ax.set_ylabel(quantity, fontsize=15)

## Run on the test sample

In [None]:
predictions = model.predict(test_images)
predictions[0]
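
Each row of the prediction array holds one probability per class. A sketch on made-up numbers (4 images, 3 classes) shows how to turn it into a predicted class, a confidence and an accuracy; the `toy_` arrays are illustrative.

```python
import numpy as np

# made-up output for 4 images and 3 classes (each row sums to 1)
toy_predictions = np.array([[0.8, 0.1, 0.1],
                            [0.2, 0.5, 0.3],
                            [0.3, 0.3, 0.4],
                            [0.6, 0.3, 0.1]])
toy_truth = np.array([0, 1, 2, 1])

toy_classes = np.argmax(toy_predictions, axis=1)   # most probable class per image
toy_confidence = np.max(toy_predictions, axis=1)   # its probability
toy_accuracy = np.mean(toy_classes == toy_truth)   # fraction of correct predictions
print(toy_classes, toy_confidence, toy_accuracy)
```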

In [None]:
#@title
fig, axs = plt.subplots(5, 5, figsize=(7,7))
for ax, img, prediction, label in zip(axs.flat, test_images, predictions, test_labels):
    ax.set_xticks([])
    ax.set_yticks([])
    ax.imshow(img, cmap='binary')
    predicted_class_id = np.argmax(prediction)
    ax.set_title("%s (%.0f%%)\n truth:%s" % (class_names[predicted_class_id], prediction[predicted_class_id] * 100, class_names[label]),
                 color='black' if predicted_class_id == label else "red")
    
plt.tight_layout()
plt.show()

## Analyze the performances
As a first step, compute the confusion matrix, defined as the number of entries for each pair of true label and predicted label:

Confusion matrix = $N[\text{true-label}, \text{reco-label}]$

In [None]:
#@title
confusion_matrix = tf.math.confusion_matrix(
    test_labels,
    np.argmax(predictions, axis=1),
    num_classes=nclasses,
).numpy()

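# the seaborn styles were renamed in matplotlib 3.6
talk_style = 'seaborn-v0_8-talk' if 'seaborn-v0_8-talk' in plt.style.available else 'seaborn-talk'
with plt.style.context(talk_style):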
    fig, ax = plt.subplots(figsize=(10, 10))
    sns.heatmap(confusion_matrix, xticklabels=class_names, annot=True, yticklabels=class_names, ax=ax, square=True, linewidths=0.1, fmt='d', cbar=False)
    ax.set_xlabel('Prediction', fontsize=15)
    ax.set_ylabel('Truth', fontsize=15)
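
The definition $N[\text{true-label}, \text{reco-label}]$ can also be implemented by hand. The following NumPy sketch uses a hypothetical helper `confusion_matrix_np` and six made-up labels.

```python
import numpy as np

def confusion_matrix_np(true_labels, predicted_labels, n_classes):
    """N[true, predicted]: rows are the truth, columns the prediction."""
    matrix = np.zeros((n_classes, n_classes), dtype=int)
    # np.add.at accumulates correctly also for repeated index pairs
    np.add.at(matrix, (true_labels, predicted_labels), 1)
    return matrix

toy_truth = np.array([0, 0, 0, 1, 1, 2])
toy_pred = np.array([0, 0, 1, 1, 2, 2])
toy_cm = confusion_matrix_np(toy_truth, toy_pred, 3)
print(toy_cm)
```

The diagonal counts the correctly classified entries; every off-diagonal element is a specific kind of misclassification.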

Compute the purity and the efficiency:

efficiency = $P[\text{prediction}|\text{truth}]=N[\text{truth},\text{prediction}] / N[\text{truth}]$

purity = $P[\text{truth}|\text{prediction}]=N[\text{truth},\text{prediction}] / N[\text{prediction}]$

In [None]:
#@title
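# confusion_matrix[truth, prediction]: rows are the truth, columns the prediction
efficiency = confusion_matrix / np.sum(confusion_matrix, axis=1, keepdims=True)  # divide each row by N[truth]
purity = confusion_matrix / np.sum(confusion_matrix, axis=0, keepdims=True)  # divide each column by N[prediction]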

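# the seaborn styles were renamed in matplotlib 3.6
talk_style = 'seaborn-v0_8-talk' if 'seaborn-v0_8-talk' in plt.style.available else 'seaborn-talk'
with plt.style.context(talk_style):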
    fig, axs = plt.subplots(1, 2, figsize=(15, 7))
    sns.heatmap(efficiency * 100, xticklabels=class_names, yticklabels=class_names, ax=axs[0], square=True, linewidths=0.1, annot=True, cmap='Blues', cbar=False)
    sns.heatmap(purity * 100, xticklabels=class_names, yticklabels=class_names, ax=axs[1], square=True, linewidths=0.1, annot=True, cmap='Reds', cbar=False)

    for ax in axs:
        ax.set_xlabel('Prediction')
        ax.set_ylabel('Truth')
    axs[0].set_title('efficiency = P[prediction|truth]', fontsize=15)
    axs[1].set_title('purity = P[truth|prediction]', fontsize=15)
    fig.subplots_adjust(wspace=0.5)
    plt.show()
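
With the matrix stored as `N[truth, prediction]`, the efficiency normalizes each row and the purity each column. A sketch on a hand-made, deliberately unbalanced matrix makes the two normalizations easy to verify; `keepdims=True` keeps the sums broadcastable along the intended axis.

```python
import numpy as np

# hand-made confusion matrix N[truth, prediction] for 2 unbalanced classes
toy_cm = np.array([[8, 2],
                   [1, 4]])

# efficiency: divide each row by the number of true entries of that class
toy_eff = toy_cm / toy_cm.sum(axis=1, keepdims=True)
# purity: divide each column by the number of predictions of that class
toy_pur = toy_cm / toy_cm.sum(axis=0, keepdims=True)

print(toy_eff)   # each row sums to 1
print(toy_pur)   # each column sums to 1
```

With unbalanced classes, a wrong broadcasting axis gives visibly different numbers, so a small example like this is a useful sanity check.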

In machine learning you may encounter several new terms, but they are basically the purity and the efficiency: for each class, the precision is the purity and the recall is the efficiency.

In [None]:
from sklearn.metrics import classification_report

# get the predicted class for each test image
predicted_classes = np.argmax(predictions, axis=1)
# indices of the correctly and incorrectly classified test images
correct = np.nonzero(predicted_classes == test_labels)[0]
incorrect = np.nonzero(predicted_classes != test_labels)[0]
print(classification_report(test_labels, predicted_classes, target_names=class_names))
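
The per-class quantities in such a report can be derived from the confusion matrix alone. The sketch below does so with NumPy on a hand-made matrix, as an illustration of the definitions rather than of the scikit-learn internals.

```python
import numpy as np

# hand-made confusion matrix N[truth, prediction]
toy_cm = np.array([[8, 2],
                   [1, 4]])

tp = np.diag(toy_cm)                    # correctly classified entries per class
recall = tp / toy_cm.sum(axis=1)        # efficiency: diagonal of P[prediction|truth]
precision = tp / toy_cm.sum(axis=0)     # purity: diagonal of P[truth|prediction]
f1 = 2 * precision * recall / (precision + recall)   # harmonic mean of the two
accuracy = tp.sum() / toy_cm.sum()      # overall fraction of correct predictions

print(precision, recall, f1, accuracy)
```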