[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/xiptos/is_notes/blob/main/nn_fmnist.ipynb)

# Introduction

This notebook presents a simple guide to creating an artificial neural network with PyTorch. The network classifies the fashion images from [Zalando's article images](https://github.com/zalandoresearch/fashion-mnist) (Fashion-MNIST).

The guide contains the most elementary PyTorch elements to create and evaluate a network.
 

In [None]:
import torch
from torch import nn
from torch.nn import functional as F

from torch.utils.data import DataLoader # loads data in batches
from torchvision import datasets # loads Fashion-MNIST
import torchvision.transforms as T # transforms for computer vision

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from tqdm import tqdm # progress bar

# Torchvision datasets

We obtain the [Fashion-MNIST dataset](http://pytorch.org/vision/main/generated/torchvision.datasets.FashionMNIST.html) via torchvision. The dataset contains a training set of 60,000 examples and a test set of 10,000 examples. Each example is a 28x28 grayscale image, associated with a label from 10 classes. Note that `datasets` is a module imported from torchvision, not to be confused with the `Dataset` class (imported with `from torch.utils.data import Dataset`).

```from torchvision import datasets```

When called for the first time, the datasets will be downloaded to the path specified in the `root` argument. After that, Torchvision will look first for a local copy before attempting another download.

> **torchvision.transforms**. A transform operates on the data. Using the `transform` argument, we can apply one or more transformations (reshape, convert to tensor, normalize, etc.) to the data obtained.

In [None]:
mytransform = T.ToTensor() # image (3D array) to Tensor

train_data = datasets.FashionMNIST(root = './', download=True, train = True, transform = mytransform)
test_data = datasets.FashionMNIST(root = './', download=True, train = False, transform = mytransform)

Note that the first image in the dataset is a 3D tensor (C, H, W) for the number of channels (C), Height (H), and Width (W).

In [None]:
img, label = train_data[0]
img.shape # torch.Size([1, 28, 28])

We can plot the first image after reshaping it into a 2D array (H x W).

In [None]:
# imshow expects a 2D (H x W) array here, so we drop the channel dimension
plt.imshow(img.reshape(28,28), cmap = 'gist_yarg'); # gist_yarg is a reversed grayscale colormap (0 = white, 1 = black)
plt.axis('off');

# DataLoader

The PyTorch DataLoader object serves the dataset in batches of a chosen size and, if requested, shuffles the samples before they are fed to the training loop.

```from torch.utils.data import DataLoader```

> Note that the DataLoader object does not shuffle the data by default (`shuffle=False`); we pass `shuffle=True` explicitly for the training set.
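
The effect of `batch_size` and `shuffle` is easiest to see on a tiny dataset. The following sketch uses a hypothetical ten-sample `TensorDataset` (not part of this notebook) whose labels equal their index, so the order of the samples in each batch can be read directly.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Ten toy samples whose label equals their index, so the order is easy to read
features = torch.arange(10).float().unsqueeze(1)   # shape (10, 1)
labels = torch.arange(10)
toy_data = TensorDataset(features, labels)

ordered = DataLoader(toy_data, batch_size=4, shuffle=False)
torch.manual_seed(101)
shuffled = DataLoader(toy_data, batch_size=4, shuffle=True)

ordered_batches = [y.tolist() for _, y in ordered]
shuffled_batches = [y.tolist() for _, y in shuffled]
print(ordered_batches)   # batches follow dataset order; the last batch is smaller
print(shuffled_batches)  # same samples, drawn in a permuted order
```

With `shuffle=True` every sample is still seen exactly once per pass over the loader; only the order changes.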

In [None]:
torch.manual_seed(101)

train_loader = DataLoader(train_data, batch_size = 100, shuffle=True)
# the test loader can be bigger and doesn't need to be shuffled
test_loader =  DataLoader(test_data,  batch_size = 500, shuffle=False) 

If we run one iteration now, we will have one batch of the training dataset (100 images and labels).

In [None]:
# Get the first batch
for img, label in train_loader:
    break # we run only one iteration, after that we break
img.shape # batch size, channels, height, width

Let's select the first 50 images of the batch to plot them.

In [None]:
myimages = img[:50].numpy() # we now obtain NumPy arrays
myimages.shape

We need to transpose each NumPy array to plot it with matplotlib, which expects the channel dimension last (height x width x channel).

In [None]:
myimages[0].shape # channel, height, width

In [None]:
myimages[0].transpose(1,2,0).shape # height, width, channel

In [None]:
fig, ax = plt.subplots(nrows = 5, ncols = 10, figsize=(8,4), subplot_kw={'xticks': [], 'yticks': []})
for row in range(0,5):
    for col in range(0,10):
        myid = (10*row) + col # flat index = (ncols*row) + col
        
        ax[row,col].imshow( myimages[myid].transpose(1,2,0), cmap = 'gist_yarg' ) # H,W,C
        ax[row,col].axis('off')

# Create the network

The training set contains 60,000 records with 784 incoming features each, so the input layer has 784 neurons. After that, we create two fully connected layers of 120 and 84 neurons, respectively. The activation function we use is the Rectified Linear Unit (ReLU), a piecewise linear function that sets negative inputs to zero and passes positive inputs through unchanged.

Finally, the output layer contains ten neurons. Every neuron gives the (log-)probability of the corresponding label (from 0 to 9), with the condition that the probabilities sum to one (log softmax).
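
One possible way to write the architecture described above is sketched below. The attribute names `fc1`, `fc2`, `fc3` and the `hidden` parameter are assumptions made for illustration, and the random batch only stands in for 100 flattened images.

```python
import torch
from torch import nn
from torch.nn import functional as F


class MultilayerPerceptron(nn.Module):
    """Fully connected network: 784 -> 120 -> 84 -> 10 (a sketch)."""

    def __init__(self, in_features=784, out_features=10, hidden=(120, 84)):
        super().__init__()
        self.fc1 = nn.Linear(in_features, hidden[0])
        self.fc2 = nn.Linear(hidden[0], hidden[1])
        self.fc3 = nn.Linear(hidden[1], out_features)

    def forward(self, x):
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        return F.log_softmax(self.fc3(x), dim=1)  # log-probabilities per class


torch.manual_seed(101)
sketch_model = MultilayerPerceptron()
batch = torch.rand(100, 784)            # stand-in for 100 flattened images
log_probs = sketch_model(batch)
print(log_probs.shape)
print(log_probs.exp().sum(dim=1)[:3])   # each row of probabilities sums to ~1
```

Exponentiating the output recovers probabilities, which is why each row sums to one.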

In [None]:
class MultilayerPerceptron(nn.Module):


In [None]:
torch.manual_seed(101)

mymodel = MultilayerPerceptron() # default params are in_features = 784, out_features=10
mymodel # check topology

* We select the [cross-entropy](https://pytorch.org/docs/stable/generated/torch.nn.functional.cross_entropy.html?highlight=entropy) as the cost function. Cross-entropy plays the same role as a quadratic cost, but it compares the predicted probability distribution over the classes with the true label.

* We define the optimization method. We use [Adam](https://pytorch.org/docs/stable/generated/torch.optim.Adam.html), an adaptive variant of stochastic gradient descent.
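
The relation between a log-softmax output and the cross-entropy cost can be checked numerically. The sketch below assumes `nn.CrossEntropyLoss` as the criterion and uses random scores and made-up targets; it is an illustration, not part of the training code.

```python
import torch
from torch import nn
from torch.nn import functional as F

torch.manual_seed(0)
logits = torch.randn(4, 10)             # raw scores for 4 samples, 10 classes
targets = torch.tensor([3, 0, 9, 1])    # made-up class indices

criterion = nn.CrossEntropyLoss()
loss_from_logits = criterion(logits, targets)

# Cross-entropy is log softmax followed by the negative log-likelihood
loss_two_steps = F.nll_loss(F.log_softmax(logits, dim=1), targets)

# Feeding log-probabilities gives the same value, because applying
# log softmax a second time leaves them unchanged
loss_from_log_probs = criterion(F.log_softmax(logits, dim=1), targets)

print(loss_from_logits.item(), loss_two_steps.item(), loss_from_log_probs.item())
```

This suggests that a model ending in log softmax can be trained with the cross-entropy criterion without changing the value of the loss.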

In [None]:
learning_rate = 1e-3
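criterion = nn.CrossEntropyLoss() # cross-entropy cost function

optimizer = torch.optim.Adam(mymodel.parameters(), lr=learning_rate) # Adam optimizer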

How many parameters do we need to evaluate?

* Number of weights = (784 x 120) + (120 x 84) + (84 x 10) = 105,000
* Number of biases = 120 + 84 + 10 = 214

Total = 105,214
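
The same arithmetic can be reproduced in plain Python from the layer sizes alone; `layer_sizes` is a name introduced here for illustration.

```python
layer_sizes = [784, 120, 84, 10]

# Each fully connected layer has (inputs x outputs) weights and one bias per output
n_weights = sum(n_in * n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))
n_biases = sum(layer_sizes[1:])
print(n_weights, n_biases, n_weights + n_biases)  # 105000 214 105214
```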

In [None]:
params = [p.numel() for p in mymodel.parameters() if p.requires_grad]
np.sum(params)

# Training and evaluation 

1. Before starting, we must consider that the DataLoader returns a tensor of size [100,1,28,28], but our model accepts 1D vectors of 784 pixels (28x28). Therefore, we must flatten the tensor to accommodate the model's input.

In [None]:
# Get one batch from the training loader
myiter = iter(train_loader)
img, label = next(myiter) # only one iteration
img.shape # batch_size, channel, Height, Width

2. We will flatten the dimensions of each image in the batch (1,28,28), which correspond to channel, height, and width. That's a common preprocessing step when feeding images to a fully connected network, whose input is a 1D vector (in our case, 28 x 28 = 784 values).

In [None]:
img.view(100,-1).shape # 100 images of 784 pixels each

Let's evaluate that batch without training the model. The prediction is a 100 x 10 tensor: for each of the 100 images in the batch we obtain ten values, one per class (shape is [100,10]).

In [None]:
y_pred = mymodel( img.view(100,-1) )
y_pred.shape # 100 x 10: for every image in the batch (100) we obtain 10 class predictions

If we take the index of the highest value for every image, we obtain the predicted label:

In [None]:
val, idx = torch.max(y_pred, dim=1) # dim 1 is for the output
idx # indices == predictions

3. We will calculate the model's accuracy in every epoch (number of correct predictions / number of samples) for both the train and the test dataset.

In [None]:
# tracking variables

class Loss:
    """ Class to monitor train and test loss"""
    train: list = []
    test: list = []
    

class Accuracy:
    """ Class to monitor train and test accuracy"""
    train: list = []
    test: list = []

In [None]:
%%time
# Train for 10 epochs
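
A training and evaluation loop could look like the sketch below. To keep it runnable without a download, it assumes random tensors in place of Fashion-MNIST, an `nn.Sequential` stand-in for the model, and a hypothetical `history` dictionary instead of the tracking classes; the loss and accuracy values it produces are therefore meaningless and only the mechanics matter.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(101)


def fake_split(n):
    """Random images and labels standing in for Fashion-MNIST (assumption)."""
    return TensorDataset(torch.rand(n, 1, 28, 28), torch.randint(0, 10, (n,)))


train_loader = DataLoader(fake_split(200), batch_size=100, shuffle=True)
test_loader = DataLoader(fake_split(100), batch_size=100, shuffle=False)

# Stand-in model with the 784 -> 120 -> 84 -> 10 layout
model = nn.Sequential(nn.Linear(784, 120), nn.ReLU(),
                      nn.Linear(120, 84), nn.ReLU(),
                      nn.Linear(84, 10))
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

history = {"train_loss": [], "test_loss": [], "train_acc": [], "test_acc": []}


def run_epoch(loader, train):
    """One pass over `loader`; returns (mean loss, accuracy in %)."""
    model.train(train)
    total_loss, correct, seen = 0.0, 0, 0
    with torch.set_grad_enabled(train):
        for X, y in loader:
            out = model(X.view(X.shape[0], -1))   # flatten to (batch, 784)
            loss = criterion(out, y)
            if train:
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
            total_loss += loss.item() * y.shape[0]
            correct += (out.argmax(dim=1) == y).sum().item()
            seen += y.shape[0]
    return total_loss / seen, 100 * correct / seen


epochs = 10
for epoch in range(epochs):
    train_loss, train_acc = run_epoch(train_loader, train=True)
    test_loss, test_acc = run_epoch(test_loader, train=False)
    history["train_loss"].append(train_loss)
    history["test_loss"].append(test_loss)
    history["train_acc"].append(train_acc)
    history["test_acc"].append(test_acc)

print(history["train_loss"][0], history["train_loss"][-1])
```

In the notebook itself, the same per-epoch values would be appended to `Loss.train`, `Loss.test`, `Accuracy.train`, and `Accuracy.test`, and the loaders and model would be the ones defined earlier.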


# Visualization

We plot the train and test losses, together with their accuracies per epoch. Note that the training data reach lower losses and an accuracy of almost 100%. The test data, on the other hand, plateau at > 95% accuracy, so we could consider training for only about two epochs, because this is where the training accuracy crosses the test accuracy.

In [None]:
fig, ax = plt.subplots(nrows=1, ncols=2, figsize=(12,4))
ax[0].plot(Loss.train, label = 'Training')
ax[0].plot(Loss.test, label='test/validation')
ax[0].set_ylabel('Loss', fontsize=16)


ax[1].plot(Accuracy.train, label = 'Training')
ax[1].plot(Accuracy.test, label='test/validation')
ax[1].set_yticks(range(85,110,5))
ax[1].axvline(x=2, color='gray', linestyle=':')
ax[1].axhline(y=100, color='gray', linestyle=':')
ax[1].set_ylabel('Accuracy (%)', fontsize=16)

for myax in ax:
    myax.set_xlabel('Epoch', fontsize=16)
    myax.set_xticks(range(epochs))
    myax.legend(frameon=False)



Finally, we evaluate all the test data at once and visualize, for every true label, how often each class is predicted (i.e., the confusion matrix).

In [None]:
test_loader =  DataLoader(test_data,  batch_size = 10_000, shuffle=False) # the whole test is 10,000 images
myiter = iter(test_loader)
img, label = next(myiter)
img.shape

In [None]:
correct = 0

with torch.no_grad(): # no gradients needed for evaluation
    for X, y_label in test_loader:
        y_val = mymodel( X.view(X.shape[0],-1) ) # flatten
        _, predicted = torch.max( y_val, dim = 1)
        correct += (predicted == y_label).sum().item()

print(f'Test accuracy: {correct*100/len(test_data):.4f} %')

In [None]:
# Show the confusion matrix

In [None]:
# Show the heatmap corresponding to the confusion matrix
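
A confusion matrix can be built with NumPy alone, as in the following sketch. The helper `confusion_matrix` and the short label arrays are made up for illustration; they are not the notebook's results.

```python
import numpy as np


def confusion_matrix(y_true, y_pred, n_classes=10):
    """Rows are true labels, columns are predicted labels."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (y_true, y_pred), 1)   # count each (true, predicted) pair
    return cm


# Made-up labels and predictions, for illustration only
y_true = np.array([0, 0, 1, 2, 2, 2, 9])
y_pred = np.array([0, 1, 1, 2, 2, 0, 9])

cm = confusion_matrix(y_true, y_pred)
# Per-class accuracy: diagonal over row totals (guarding against empty rows)
per_class_acc = cm.diagonal() / np.maximum(cm.sum(axis=1), 1)
print(cm[:3, :3])
print(per_class_acc[:3])
```

With the tensors from the evaluation cell, `label.numpy()` and `predicted.numpy()` could be passed as `y_true` and `y_pred`, and the resulting matrix could be drawn as a heatmap with `plt.imshow(cm)` followed by `plt.colorbar()`.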