# Homework

1. Build the MLP in the image above using PyTorch built-ins.
2. Provide the calculation of the exact number of parameters of the MLP.
   - Do it first supposing that the layers have no bias term, then supposing that the bias is present wherever possible.
3. Calculate the $L_1$ (vector) norm and the Frobenius norm of the parameters of each layer.
4. Given 10 random datapoints, feed them into the network. This operation must be done in a single command and must **not** make use of loops.
   - Given the output of the network, use PyTorch code to find the class assigned to each datapoint. This must also be done in a single PyTorch command without using loops.
   - Draft a vector of ground truths (whichever labels you like) and provide code to calculate the accuracy.
     - Tip: first get the number of correct assignments, then...

## Recall
Let us suppose we wish to build a larger model from the graph below.

![](imgs/01/mlp_graph_larger.jpg)

We suppose that

1. The layers have no bias units
2. The activation function for hidden layers is `ReLU`

    $ \text{ReLU}(x) = \max(0, x)$

Moreover, we suppose that this is a classification problem.

As you might recall, when the number of classes is greater than 2, we give the output layer as many neurons as there are classes, which establishes a one-to-one correspondence between output units and classes. The value of the $j$-th neuron represents the **confidence** of the network in assigning a given data instance to the $j$-th class.

Classically, when the network is encoded in such a way, the activation function for the final layer is the **softmax** function.
If $C$ is the total number of classes,

$\text{softmax}(z)_j = \frac{\exp(z_j)}{\sum_{k=1}^C \exp(z_k)}$

where $j\in \{1,\cdots,C\}$ is one of the classes.

If we repeat this calculation for every $j$, we end up with $C$ normalized values (i.e., each between 0 and 1, and summing to 1), which can be interpreted as the confidence the network has in assigning the instance to each class.
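
As a minimal sketch of the formula above, the snippet below computes the softmax of a single score vector by hand and compares it with `torch.softmax`. The vector `z` is an arbitrary example chosen for illustration, not an output of the network.

```python
import torch

# Arbitrary example scores for C = 4 classes (an assumption for illustration)
z = torch.tensor([2.0, 1.0, 0.1, -1.0])

# Direct translation of the formula: exp(z_j) / sum_k exp(z_k)
manual = torch.exp(z) / torch.exp(z).sum()

# PyTorch built-in; dim=0 because z is a 1-D tensor
builtin = torch.softmax(z, dim=0)

print(manual)
print(manual.sum())                     # the C values sum to 1 (up to rounding)
print(torch.allclose(manual, builtin))  # both computations agree
```

The largest score receives the largest confidence, and the ordering of the scores is preserved.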

## Exercise 1

In [3]:
import torch
import numpy as np

In [4]:
class MLP(torch.nn.Module):
    def __init__(self, my_bias=False):
        super().__init__()
        self.layers = torch.nn.Sequential(
            torch.nn.Linear(5, 11, bias=my_bias),
            torch.nn.ReLU(),
            torch.nn.Linear(11, 16, bias=my_bias),
            torch.nn.ReLU(),
            torch.nn.Linear(16, 13, bias=my_bias),
            torch.nn.ReLU(),
            torch.nn.Linear(13, 8, bias=my_bias),
            torch.nn.ReLU(),
            torch.nn.Linear(8, 4, bias=my_bias),
            torch.nn.Softmax(dim=1)
        )

    def forward(self, X):
        return self.layers(X)

**Note.** From the PyTorch documentation:

`CLASS torch.nn.Softmax(dim=None)`

Applies the Softmax function to an n-dimensional input Tensor, rescaling it so that the elements of the n-dimensional output Tensor lie in the range [0,1] and sum to 1.
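
To see what `dim=1` does in the model above, here is a small sketch on a hypothetical batch of scores. With a 2-D input of shape `(batch, classes)`, `dim=1` normalizes each row independently, so every datapoint gets its own vector of confidences.

```python
import torch

# Hypothetical batch of 3 datapoints with 4 class scores each
scores = torch.tensor([[1.0, 2.0, 3.0, 4.0],
                       [0.0, 0.0, 0.0, 0.0],
                       [5.0, -5.0, 0.0, 1.0]])

softmax = torch.nn.Softmax(dim=1)  # normalize across the class dimension
probs = softmax(scores)

print(probs.shape)       # same shape as the input: (3, 4)
print(probs.sum(dim=1))  # each row sums to 1
print(probs[1])          # equal scores give equal confidences (0.25 each)
```

This is also why the model expects a batched input: with `dim=1`, a single datapoint has to be passed with shape `(1, 5)` rather than `(5,)`.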



Now test whether the MLP class works by instantiating it:

In [5]:
model = MLP()
model # print

MLP(
  (layers): Sequential(
    (0): Linear(in_features=5, out_features=11, bias=False)
    (1): ReLU()
    (2): Linear(in_features=11, out_features=16, bias=False)
    (3): ReLU()
    (4): Linear(in_features=16, out_features=13, bias=False)
    (5): ReLU()
    (6): Linear(in_features=13, out_features=8, bias=False)
    (7): ReLU()
    (8): Linear(in_features=8, out_features=4, bias=False)
    (9): Softmax(dim=1)
  )
)

## Exercise 2

### Part 1

In a simple neural network model with no bias, with $i$ input variables, $o$ output variables and one hidden layer with $h$ neurons, the following holds:

\begin{equation}
\# parameters = i * h + h * o
\end{equation}

Hence, in our model with 4 hidden layers and with $i=5$, $o=4$, $h_1=11$, $h_2=16$, $h_3=13$ and $h_4=8$ we have:

\begin{equation}
\# parameters = i * h_1 + h_1 * h_2 + h_2 * h_3 + h_3 * h_4 + h_4 * o = 575
\end{equation}

We can check it by means of torchinfo:

In [6]:
import torch
from torchinfo import summary

In [7]:
summary(model)

Layer (type:depth-idx)                   Param #
MLP                                      --
├─Sequential: 1-1                        --
│    └─Linear: 2-1                       55
│    └─ReLU: 2-2                         --
│    └─Linear: 2-3                       176
│    └─ReLU: 2-4                         --
│    └─Linear: 2-5                       208
│    └─ReLU: 2-6                         --
│    └─Linear: 2-7                       104
│    └─ReLU: 2-8                         --
│    └─Linear: 2-9                       32
│    └─Softmax: 2-10                     --
Total params: 575
Trainable params: 575
Non-trainable params: 0
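
The per-layer counts reported above follow from plain arithmetic. The sketch below walks over consecutive pairs of layer widths; the list `sizes` is simply the widths from the graph written in order.

```python
# Layer widths in order: input, four hidden layers, output
sizes = [5, 11, 16, 13, 8, 4]

# One weight matrix per pair of consecutive layers: n_in * n_out entries
per_layer = [n_in * n_out for n_in, n_out in zip(sizes[:-1], sizes[1:])]

print(per_layer)       # [55, 176, 208, 104, 32]
print(sum(per_layer))  # 575
```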

### Part 2

In a simple neural network model **with bias** (in the output and hidden layers), with $i$ input variables, $o$ output variables and one hidden layer with $h$ neurons, the following holds:

\begin{equation}
\# parameters = i * h + h + h * o + o
\end{equation}

Hence, in our model with 4 hidden layers and with $i=5$, $o=4$, $h_1=11$, $h_2=16$, $h_3=13$ and $h_4=8$ we have:

\begin{equation}
\# parameters = i * h_1 + h_1 * h_2 + h_2 * h_3 + h_3 * h_4 + h_4 * o + h_1 + h_2 + h_3 + h_4 + o = 627
\end{equation}

We can check it by means of torchinfo:

In [8]:
model_with_bias = MLP(True)
model_with_bias # print

MLP(
  (layers): Sequential(
    (0): Linear(in_features=5, out_features=11, bias=True)
    (1): ReLU()
    (2): Linear(in_features=11, out_features=16, bias=True)
    (3): ReLU()
    (4): Linear(in_features=16, out_features=13, bias=True)
    (5): ReLU()
    (6): Linear(in_features=13, out_features=8, bias=True)
    (7): ReLU()
    (8): Linear(in_features=8, out_features=4, bias=True)
    (9): Softmax(dim=1)
  )
)

In [9]:
summary(model_with_bias)

Layer (type:depth-idx)                   Param #
MLP                                      --
├─Sequential: 1-1                        --
│    └─Linear: 2-1                       66
│    └─ReLU: 2-2                         --
│    └─Linear: 2-3                       192
│    └─ReLU: 2-4                         --
│    └─Linear: 2-5                       221
│    └─ReLU: 2-6                         --
│    └─Linear: 2-7                       112
│    └─ReLU: 2-8                         --
│    └─Linear: 2-9                       36
│    └─Softmax: 2-10                     --
Total params: 627
Trainable params: 627
Non-trainable params: 0
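
The same total can be confirmed without torchinfo by summing `numel()` over the parameters. The `Sequential` below is a stand-in that rebuilds only the five `Linear` layers with `bias=True`, since the activations contribute no parameters; the names `linears`, `n_weights` and `n_biases` are introduced here for illustration.

```python
import torch

sizes = [5, 11, 16, 13, 8, 4]

# Only the Linear layers matter for counting (ReLU and Softmax have no parameters)
linears = torch.nn.Sequential(
    *[torch.nn.Linear(n_in, n_out, bias=True)
      for n_in, n_out in zip(sizes[:-1], sizes[1:])]
)

# Parameter names look like "0.weight" and "0.bias"
n_weights = sum(p.numel() for name, p in linears.named_parameters()
                if name.endswith("weight"))
n_biases = sum(p.numel() for name, p in linears.named_parameters()
               if name.endswith("bias"))

print(n_weights, n_biases, n_weights + n_biases)  # 575 52 627
```

The bias adds exactly one parameter per hidden and output neuron, i.e. 11 + 16 + 13 + 8 + 4 = 52.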

## Exercise 3

The following code prints the $L_1$ norm and the Frobenius norm of the weights of each layer:

In [15]:
for param_name, param in model.state_dict().items():
    # With no `dim` argument the tensor is flattened, so p=1 gives the vector L1 norm
    print("L1 norm of", param_name, ": ", param.norm(p=1).item())
    # or equivalently:
    # print("L1 norm of", param_name, ": ", torch.norm(param, p=1).item())
    print("Frobenius norm of", param_name, ": ", torch.norm(param, p='fro').item())
    print("\n")

L1 norm of layers.0.weight :  12.258026123046875
Frobenius norm of layers.0.weight :  1.8329733610153198


L1 norm of layers.2.weight :  25.708946228027344
Frobenius norm of layers.2.weight :  2.22601056098938


L1 norm of layers.4.weight :  27.272235870361328
Frobenius norm of layers.4.weight :  2.1478898525238037


L1 norm of layers.6.weight :  13.839408874511719
Frobenius norm of layers.6.weight :  1.5872621536254883


L1 norm of layers.8.weight :  6.279640197753906
Frobenius norm of layers.8.weight :  1.2583156824111938
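
As a sanity check on what these two norms mean, the sketch below recomputes them from their definitions on a hypothetical weight matrix `W`: the vector $L_1$ norm is the sum of the absolute values of all entries, and the Frobenius norm is the square root of the sum of the squared entries. It also uses `torch.linalg.vector_norm` and `torch.linalg.matrix_norm`, which the PyTorch documentation recommends over the deprecated `torch.norm`.

```python
import torch

# Hypothetical 2x2 weight matrix chosen so the norms are easy to verify by hand
W = torch.tensor([[3.0, -4.0],
                  [0.0, 12.0]])

l1_manual = W.abs().sum()           # |3| + |-4| + |0| + |12| = 19
fro_manual = W.pow(2).sum().sqrt()  # sqrt(9 + 16 + 0 + 144) = sqrt(169) = 13

l1_linalg = torch.linalg.vector_norm(W, ord=1)       # flattens W, then takes the 1-norm
fro_linalg = torch.linalg.matrix_norm(W, ord='fro')  # Frobenius norm of the matrix

print(l1_manual.item(), l1_linalg.item())
print(fro_manual.item(), fro_linalg.item())
```

Note that the vector $L_1$ norm differs from the matrix 1-norm (the maximum absolute column sum), which is what `torch.linalg.matrix_norm(W, ord=1)` would return.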




## Exercise 4

### Feed 10 datapoints to the network

In [None]:
X = torch.randn(10, 5)  # 10 datapoints with 5 features each, drawn from a normal distribution with mean 0 and variance 1
X

In [None]:
conf_y_hat = model(X)
conf_y_hat

### Part 1

In [None]:
y_hat = torch.argmax(conf_y_hat, dim=1)
y_hat
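
To make the effect of `dim=1` concrete, here is a small sketch on a hand-written confidence matrix (hypothetical values, not model output): `argmax` along `dim=1` returns, for each row, the index of the largest confidence.

```python
import torch

# Hypothetical confidences for 3 datapoints over 4 classes (each row sums to 1)
conf = torch.tensor([[0.10, 0.70, 0.10, 0.10],
                     [0.25, 0.25, 0.40, 0.10],
                     [0.60, 0.20, 0.10, 0.10]])

pred = torch.argmax(conf, dim=1)  # one class index per row
print(pred)                       # tensor([1, 2, 0])
```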

### Part 2

Build the ground-truth vector:

In [None]:
y = torch.randint(4, (10, ))
#y = torch.Tensor([2,2,2,2,2,2,2,2,2,2])
y

Evaluate the accuracy:

In [None]:
def accuracy(y, y_hat):
    # Number of datapoints whose predicted class matches the ground truth
    correctly_classified = torch.sum(y_hat == y).item()
    # Fraction of correct assignments over all datapoints
    return correctly_classified / y.shape[0]

In [None]:
accuracy(y, y_hat)
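
A worked example with hand-picked labels (hypothetical, chosen so the result is easy to verify) shows the two steps from the tip: count the correct assignments, then divide by the number of datapoints. An equivalent one-liner takes the mean of the boolean comparison cast to float.

```python
import torch

y_true = torch.tensor([0, 1, 2, 3, 0])  # hypothetical ground truths
y_pred = torch.tensor([0, 1, 1, 3, 2])  # hypothetical predictions

n_correct = (y_pred == y_true).sum().item()  # 3 matches (positions 0, 1 and 3)
acc = n_correct / y_true.shape[0]            # 3 / 5

# Same quantity as the mean of the 0/1 comparison, computed in float32
acc_mean = (y_pred == y_true).float().mean().item()

print(n_correct, acc)  # 3 0.6
print(acc_mean)
```

With random weights and random labels over 4 classes, as in the cells above, an accuracy around 0.25 is what one would expect on average.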