# Simple Feed-Forward Neural Network in PyTorch

In [None]:
import os 
import pickle

import numpy as np
import torch
import torch.nn as nn

from torch.utils.data import Dataset, DataLoader

We start by importing the pickled data, so we do not have to repeat the preprocessing steps.

In [None]:
pickledatapath = os.path.join('data', 'iris_data.pkl')

with open(pickledatapath, 'rb') as f:
    x_train, x_test, y_train, y_test = pickle.load(f)

Verify that all the data have been correctly imported.

In [None]:
print("x_train shape:", x_train.shape)
print("y_train shape:", y_train.shape)
print("x_test shape:", x_test.shape)
print("y_test shape:", y_test.shape)

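By default, torch initializes floating-point tensors as 32-bit values.

However, NumPy arrays default to 64-bit floats, and our dataset class stores the features as `torch.float64`. Setting the default type to 64-bit floating point makes the model's parameters match the data and avoids dtype-mismatch errors.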

In [None]:
torch.set_default_dtype(torch.float64)
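
As a quick check (a minimal sketch, not part of the original notebook), the snippet below shows how the default dtype affects newly created layers. The names `layer32`, `layer64`, and `x_demo` are made up for illustration.

```python
import numpy as np
import torch
import torch.nn as nn

torch.set_default_dtype(torch.float32)   # PyTorch's out-of-the-box default
layer32 = nn.Linear(4, 3)
print(layer32.weight.dtype)              # torch.float32

x_demo = torch.from_numpy(np.random.rand(2, 4))  # NumPy arrays are float64 by default
print(x_demo.dtype)                      # torch.float64

try:
    layer32(x_demo)                      # float64 input, float32 parameters
except RuntimeError as err:
    print("dtype mismatch:", err)

torch.set_default_dtype(torch.float64)   # same setting as in this notebook
layer64 = nn.Linear(4, 3)
out_demo = layer64(x_demo)               # input and parameters are both float64
print(out_demo.dtype)                    # torch.float64
```

The snippet leaves the default dtype at `torch.float64`, so it does not change the setting used by the rest of the notebook.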

Let's set the PRNG seeds for reproducibility.

In [None]:
np.random.seed(42)
torch.manual_seed(42)

And copy-paste our dataset class from the previous notebook.

In [None]:
class MyIrisDataset(Dataset): 

    def __init__(self, x_data, y_data):
        super().__init__()
        assert len(x_data) == len(y_data)
        self.x_data = torch.tensor(x_data, dtype=torch.float64)
        self.y_data = torch.tensor(y_data, dtype=torch.int64)

    def __len__(self):
        return len(self.x_data)

    def __getitem__(self, index):
        return self.x_data[index], self.y_data[index]

## Neural Networks in PyTorch

Finally, it's time to code an actual neural network in PyTorch!

A neural network model in PyTorch is a subclass of `torch.nn.Module`. To define this class, we need to implement two methods:

- `__init__`: In this method, you should declare instance attributes like the number of features and classes. And, most importantly, you should declare the building blocks of your model. Typically, these will be one or more `Sequential` blocks.
- `forward`: Here you declare what your model does during the forward pass. This gives you a lot of flexibility in terms of what your model can do, but also leaves plenty of room for errors!





In [None]:
class MyMultiLayerPerceptron(nn.Module):

    def __init__(self, n_features, n_classes):
        super().__init__()
        self.n_features = n_features
        self.n_classes = n_classes

        self.model = nn.Sequential(
            nn.Linear(self.n_features, 5),
            nn.ReLU(),
            nn.Linear(5, self.n_classes)
        )


    def forward(self, x):
        y = self.model(x)
        return y
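
To illustrate the flexibility of `forward`, here is a sketch of the same architecture written without `nn.Sequential`, with each layer applied explicitly. The class name `ExplicitMLP` and the `n_hidden` argument are hypothetical, introduced only for this example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ExplicitMLP(nn.Module):  # hypothetical name, for illustration only

    def __init__(self, n_features, n_classes, n_hidden=5):
        super().__init__()
        # Layers stored as attributes are registered as sub-modules,
        # so their weights show up in .parameters()
        self.hidden = nn.Linear(n_features, n_hidden)
        self.output = nn.Linear(n_hidden, n_classes)

    def forward(self, x):
        h = F.relu(self.hidden(x))   # hidden layer followed by ReLU
        return self.output(h)        # raw class scores (no softmax)


explicit_model = ExplicitMLP(n_features=4, n_classes=3)
dummy_batch = torch.randn(8, 4)      # 8 fake samples with 4 features each
dummy_scores = explicit_model(dummy_batch)
print(dummy_scores.shape)            # torch.Size([8, 3])
```

Writing `forward` by hand like this becomes useful once the model needs something a plain chain of layers cannot express, such as skip connections or multiple inputs.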

To initialize your first model, you just need to create an instance of the model's class.

In [None]:
model = MyMultiLayerPerceptron(n_features=4, n_classes=3)

You can print a basic summary of your model's architecture simply by using `print(model)`.

In [None]:
print(model)

Unfortunately, this simple print shows the layers of your model, but not the number of trainable parameters.

There is a Python package called `torch-summary` that provides a more extensive model summary and displays the number of parameters in each layer. You can check it out at this link: https://pypi.org/project/torch-summary/

However, I am not a fan of bloating my environments with unnecessary libraries, so I will show you how to manually look for your model's trainable parameters.

This is done by calling the `model.parameters()` method.

In [None]:
model.parameters()

You may have noticed that this returns a generator. A generator is already an iterator, so we can pull the parameter tensors out of it one at a time by calling `next()` (wrapping it in `iter()` first is harmless and simply returns the same object).

In [None]:
param_iter = iter(model.parameters())
some_params = next(param_iter)
print(some_params)

In [None]:
some_more_params = next(param_iter)
print(some_more_params)

In [None]:
print(some_params.shape)

The first extracted tensor has shape (5, 4), so it holds 20 parameters. But that's just one set of parameters (likely the weights of the first linear layer).

In order to get the total number of parameters, we can multiply together the dimensions of each parameter tensor's shape and sum the results.

In [None]:
n_params = np.sum([np.prod(param.shape) for param in model.parameters()])
print("Total number of parameters:", n_params)
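
An alternative sketch that stays within PyTorch: every tensor has a `numel()` method that returns its number of elements. The helper name `count_trainable_parameters` and the stand-in network `demo_net` are hypothetical.

```python
import torch.nn as nn


def count_trainable_parameters(module):  # hypothetical helper name
    # Only count tensors that the optimizer would actually update
    return sum(p.numel() for p in module.parameters() if p.requires_grad)


# Stand-in with the same layer sizes as the model above
demo_net = nn.Sequential(nn.Linear(4, 5), nn.ReLU(), nn.Linear(5, 3))
n_demo_params = count_trainable_parameters(demo_net)
print(n_demo_params)  # 4*5 + 5 + 5*3 + 3 = 43
```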

However, we don't really know what each set of parameters represents. Of course, we can guess that the first one that we extracted contained the weights of the first linear layer, while the second stored the bias terms.

But if we want to be sure, we can use the `model.named_parameters()` method.

In [None]:
param_dict = {name: params.detach().numpy() for name, params in model.named_parameters()}
with np.printoptions(precision=2, suppress=True):
    print(param_dict)

We can use the key-value pairs in `param_dict` (built from `model.named_parameters()`) to see how many parameters each layer has.

In [None]:
for name, params in param_dict.items():
    # Names look like "model.0.weight": attribute name, layer index, parameter kind
    _, layer, weight_type = name.split(".")
    layer = int(layer)
    print(f"Layer {layer} ({weight_type}): \t{np.prod(params.shape):>2} parameters")

### Some final touches before training

Okay, we have played enough with the parameters of our model. It's time to start some actual training. Almost.

There are some last steps that we need to follow before beginning the training process.

1) Prepare your training dataset and dataloader

In [None]:
train_dataset = MyIrisDataset(x_train, y_train)
train_loader = DataLoader(train_dataset, batch_size=16, shuffle=True)

2) Check that the data produced by the dataloader has the right shape

In [None]:
x_batch, y_batch = next(iter(train_loader))
print("x_batch shape:", x_batch.shape)
print("y_batch shape:", y_batch.shape)

3) Verify that the model's output looks as expected (it should be triplets of scores, since we have 3 classes)

In [None]:
pred_batch = model(x_batch)
print("Example of model's output (before training):")
print(pred_batch)

4) Define the loss function (cross-entropy) and the optimizer (Adam will be fine).

In [None]:
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
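
To make the loss less of a black box, here is a small sketch showing that `nn.CrossEntropyLoss` takes raw scores (logits) and integer class labels, and that with its default settings it matches a log-softmax followed by the mean negative log-probability of the true class. The tensors `demo_logits` and `demo_labels` are made-up values.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

demo_logits = torch.tensor([[2.0, 0.5, -1.0],
                            [0.1, 0.2, 3.0]])  # 2 samples, 3 classes
demo_labels = torch.tensor([0, 2])              # true class index of each sample

# Built-in loss, default reduction="mean"
ce_builtin = nn.CrossEntropyLoss()(demo_logits, demo_labels)

# Manual version: log-softmax, then pick the log-probability of the true class
log_probs = F.log_softmax(demo_logits, dim=1)
ce_manual = -log_probs[torch.arange(len(demo_labels)), demo_labels].mean()

print(ce_builtin.item(), ce_manual.item())      # the two values agree
```

This is also why the model above ends with a plain `nn.Linear` layer rather than a softmax: the loss applies the log-softmax internally.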

5) Make a function to evaluate your model's accuracy. In classification problems, accuracy is defined as

\begin{equation}
\text{Accuracy} = \frac{\text{correct predictions}}{\text{total predictions}}=\frac{1}{n} \sum_{i=1}^{n} \chi\{\hat{y}^{(i)} = y^{(i)} \}
\end{equation}

I normally use $\hat{y}$ for the predicted labels, $y$ for the true labels, and $\chi$ for the indicator function, but let's not get stuck on notation. The point is that accuracy is the fraction of samples that are correctly predicted by our model.

This can be computed by using `np.mean(predicted_labels == true_labels)`

In [None]:
def model_accuracy(x, y):
    x = torch.tensor(x, dtype=torch.float64) # cast to tensor
    out_tensor = model(x) # model's output scores
    out = out_tensor.detach().numpy()
    pred = np.argmax(out, axis=1)
    return np.mean(pred == y)

print("Train accuracy before training:", model_accuracy(x_train, y_train))
print("Test accuracy before training:", model_accuracy(x_test, y_test))
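
For comparison, the same computation can be sketched purely in PyTorch, taking the logits directly instead of relying on a global `model`. The helper name `accuracy_from_logits` and the toy tensors are hypothetical.

```python
import torch


def accuracy_from_logits(logits, labels):  # hypothetical helper name
    predicted = logits.argmax(dim=1)        # index of the highest score per row
    return (predicted == labels).double().mean().item()


toy_logits = torch.tensor([[2.0, 0.1, 0.3],   # predicted class 0
                           [0.2, 1.5, 0.1],   # predicted class 1
                           [0.9, 0.8, 0.1],   # predicted class 0 (wrong)
                           [0.1, 0.2, 3.0]])  # predicted class 2
toy_labels = torch.tensor([0, 1, 1, 2])
toy_accuracy = accuracy_from_logits(toy_logits, toy_labels)
print(toy_accuracy)  # 0.75
```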

### Model training

Finally, time to train our model. We do that by iterating over the `train_loader` for multiple "epochs", where one epoch is a full pass over the training data.

Here's a brief summary of the steps you need to follow for each batch:

1) Reset the gradients with `optimizer.zero_grad()`. By default, PyTorch accumulates (sums) gradients in each parameter's `.grad` attribute across successive `backward()` calls, so at each new iteration you normally want to clear them.

2) Feed the batch to the model, and obtain the model's output scores (a.k.a. logits), which should be a tensor of size `(batch_size, n_classes)`

3) Calculate the cross-entropy loss between the logits and the true labels (check the docs to see how the cross-entropy loss works: https://pytorch.org/docs/stable/generated/torch.nn.CrossEntropyLoss.html)

4) Call `loss_batch.backward()` to perform backpropagation and compute the gradients, which are accumulated in the `.grad` attribute of each parameter

5) Use `optimizer.step()` to update the parameters based on the current gradient. In the standard SGD, this is simply
\begin{equation}
w \gets w - \eta \cdot \nabla \ell
\end{equation}
where $\nabla \ell$ is the gradient of the loss, $w$ are the model parameters, and $\eta$ is the learning rate. But in our case we are using Adam, so the update also uses running averages of the gradient and of its square to adapt the step size for each parameter.
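
Two of the points above can be checked on a tiny example: gradients add up across `backward()` calls unless they are cleared, and a plain SGD step applies exactly the update rule shown. This is a minimal sketch with made-up values; it uses `torch.optim.SGD` rather than Adam so the result can be compared against the formula.

```python
import torch

w = torch.tensor([1.0, -2.0], requires_grad=True)
x = torch.tensor([3.0, 4.0])

# The loss is w . x, so its gradient with respect to w is simply x
(w * x).sum().backward()
grad_once = w.grad.clone()     # tensor([3., 4.])
(w * x).sum().backward()       # no zero_grad() in between
grad_twice = w.grad.clone()    # tensor([6., 8.]): the gradients were summed

sgd = torch.optim.SGD([w], lr=0.1)
sgd.zero_grad()                # clear the accumulated gradient
(w * x).sum().backward()
expected = (w - 0.1 * w.grad).detach().clone()  # w - eta * gradient, by hand
sgd.step()

print(grad_once, grad_twice)
print(torch.allclose(w.detach(), expected))     # True
```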

Now, let the training begin!

In [None]:
n_epochs = 1000
for _ in range(n_epochs):
    for x_batch, y_batch in train_loader:

        optimizer.zero_grad()
        scores_batch = model(x_batch)
        loss_batch = loss_fn(scores_batch, y_batch)
        loss_batch.backward()
        optimizer.step()
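
The loop above trains silently. As a sketch of one way to monitor progress, the variant below records the mean loss per epoch. It runs on a synthetic stand-in for the Iris data (random features and labels), and all `toy_*` names are hypothetical, so it does not touch the model trained in this notebook.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-in: 60 samples, 4 features, 3 classes (random labels)
toy_x = torch.randn(60, 4)
toy_y = torch.randint(0, 3, (60,))
toy_loader = DataLoader(TensorDataset(toy_x, toy_y), batch_size=16, shuffle=True)

toy_model = nn.Sequential(nn.Linear(4, 5), nn.ReLU(), nn.Linear(5, 3))
toy_loss_fn = nn.CrossEntropyLoss()
toy_optimizer = torch.optim.Adam(toy_model.parameters(), lr=1e-3)

epoch_losses = []
for epoch in range(20):
    running_loss = 0.0
    for xb, yb in toy_loader:
        toy_optimizer.zero_grad()
        loss = toy_loss_fn(toy_model(xb), yb)
        loss.backward()
        toy_optimizer.step()
        # Weight by batch size because the last batch may be smaller
        running_loss += loss.item() * len(xb)
    epoch_losses.append(running_loss / len(toy_loader.dataset))

print("first epoch loss:", epoch_losses[0])
print("last epoch loss:", epoch_losses[-1])
```

Plotting `epoch_losses` (or printing it every few epochs) gives a rough picture of whether training is still making progress.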
    

If everything went smoothly, your model should now have a decent accuracy.

IMPORTANT: We are interested in the *test accuracy*. We compute the training accuracy only to see how large the gap between the two is. That can help to determine if the model is *overfitting* the training data.

In [None]:
print("Train accuracy after training:", model_accuracy(x_train, y_train))
print("Test accuracy after training:", model_accuracy(x_test, y_test))

In [None]:
param_dict = {name: params.detach().numpy() for name, params in model.named_parameters()}
with np.printoptions(precision=2, suppress=True):
    print(param_dict["model.0.weight"])

## Homework

Neural networks have various settings (or <i>hyperparameters</i>) that you can change. You may want to get acquainted with them and get an idea of what role they play in the training process.

You can start by toying with the hyperparameters of this model and seeing what happens when you change:
- Batch size
- Learning rate
- Architecture (try more layers, fewer layers, bigger layers, etc.)