# Homework

1. Build the MLP in the image above using PyTorch built-ins.
2. Provide the calculation of the exact number of parameters of the MLP.
   - Do it first assuming that the layers don't have a bias term, then assuming that a bias is present wherever possible.
3. Calculate the $L_1$ (vectorial) norm and the Frobenius norm of the parameters of each layer.
4. Given 10 random datapoints, feed them into the network. This operation must be done in one single command and must **not** make use of loops.
   - Given the output of the network, use PyTorch code to find the class assigned to each datapoint. This must also be done in a single PyTorch command without loops.
   - Draft a vector of ground-truth labels (whichever labels you like) and provide code for the calculation of the accuracy.
     - Tip: first get the number of correct assignments, then...

## Recall
Let us suppose we wish to build a larger model from the graph below.

![](imgs/01/mlp_graph_larger.jpg)

We suppose that

1. The layers have no bias units
2. The activation function for hidden layers is `ReLU`

   $\text{ReLU}(x) = \max(0, x)$
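
As a quick sanity check of the definition, the sketch below compares PyTorch's built-in `torch.relu` with a literal element-wise $\max(0, x)$. The input values are arbitrary and chosen only to cover negative, zero and positive entries.

```python
import torch

# Arbitrary inputs covering negative, zero and positive values
x = torch.tensor([-2.0, -0.5, 0.0, 1.5, 3.0])

relu_builtin = torch.relu(x)                       # PyTorch built-in ReLU
relu_max = torch.maximum(x, torch.zeros_like(x))   # literal element-wise max(0, x)

print(relu_builtin.tolist())                # [0.0, 0.0, 0.0, 1.5, 3.0]
print(torch.equal(relu_builtin, relu_max))  # True
```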

Moreover, we suppose that this is a classification problem.

As you might recall, when the number of classes is greater than 2, we encode the problem so that the output layer has as many neurons as there are classes. This establishes a one-to-one correspondence between output units and classes: the value of the $j$-th neuron represents the **confidence** of the network in assigning a given data instance to the $j$-th class.

Classically, when the network is encoded this way, the activation function of the final layer is the **softmax** function.
If $C$ is the total number of classes,

$\text{softmax}(z_j) = \frac{\exp(z_j)}{\sum_{k=1}^C \exp(z_k)}$

where $j\in \{1,\dots,C\}$ is one of the classes and $z_j$ is the value of the $j$-th output unit before the activation.

Repeating this calculation for every $j$ yields $C$ normalized values (each between 0 and 1, and summing to 1), which can be interpreted as the confidence the network has in assigning the instance to the corresponding class.
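
The formula can be sketched directly in PyTorch and compared with the built-in `torch.softmax`. The scores `z` are hypothetical, and subtracting `z.max()` is a common numerical-stability trick (it cancels in the ratio, so the result is unchanged), not something required by the formula itself.

```python
import torch

def softmax(z: torch.Tensor) -> torch.Tensor:
    """Softmax of a 1-D vector of C scores, following the formula above."""
    # Shifting by max(z) leaves the ratio unchanged but avoids overflow in exp
    exp_z = torch.exp(z - z.max())
    return exp_z / exp_z.sum()

# Hypothetical scores for C = 4 classes
z = torch.tensor([2.0, 1.0, 0.1, -1.0])
p = softmax(z)

print(p.sum().item())                              # ~1.0: the values are normalized
print(torch.argmax(p).item())                      # 0: the largest score gets the highest confidence
print(torch.allclose(p, torch.softmax(z, dim=0)))  # True: matches the built-in
```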

## Exercise 1

In [8]:
import torch
import numpy as np

In [9]:
class MLP(torch.nn.Module):
    def __init__(self, my_bias=False):
        super().__init__()
        self.layers = torch.nn.Sequential(
            torch.nn.Linear(5, 11, bias=my_bias),
            torch.nn.ReLU(),
            torch.nn.Linear(11, 16, bias=my_bias),
            torch.nn.ReLU(),
            torch.nn.Linear(16, 13, bias=my_bias),
            torch.nn.ReLU(),
            torch.nn.Linear(13, 8, bias=my_bias),
            torch.nn.ReLU(),
            torch.nn.Linear(8, 4, bias=my_bias),
            torch.nn.Softmax(dim=1)
        )

    def forward(self, X):
        return self.layers(X)

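**Note.**
`torch.nn.Softmax(dim=None)`:

Applies the Softmax function to an n-dimensional input Tensor, rescaling it so that the elements of the n-dimensional output Tensor lie in the range [0,1] and sum to 1.

Here `dim=1`, because the network receives a batch of shape `(batch_size, 5)` and outputs a tensor of shape `(batch_size, 4)`: the softmax is computed over the 4 output units of each datapoint, so each row of the output sums to 1.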

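
To see why the choice of `dim` matters, the sketch below applies `torch.nn.Softmax` with `dim=1` and `dim=0` to a hypothetical batch of scores (3 datapoints by 4 classes). Only `dim=1` normalizes the confidences of each datapoint.

```python
import torch

softmax_rows = torch.nn.Softmax(dim=1)  # normalize within each row (one datapoint)
softmax_cols = torch.nn.Softmax(dim=0)  # normalize within each column (one class)

# Hypothetical scores: 3 datapoints (rows) x 4 classes (columns)
scores = torch.tensor([[1.0, 2.0, 3.0, 4.0],
                       [0.0, 0.0, 0.0, 0.0],
                       [5.0, 1.0, 1.0, 1.0]])

print(softmax_rows(scores).sum(dim=1))  # each datapoint's confidences sum to 1
print(softmax_rows(scores)[1])          # equal scores give equal confidences (0.25 each)
print(softmax_cols(scores).sum(dim=0))  # with dim=0 each class column sums to 1 instead
```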
Now let us check that the `MLP` class works by instantiating it:

In [10]:
model = MLP()
model # print

MLP(
  (layers): Sequential(
    (0): Linear(in_features=5, out_features=11, bias=False)
    (1): ReLU()
    (2): Linear(in_features=11, out_features=16, bias=False)
    (3): ReLU()
    (4): Linear(in_features=16, out_features=13, bias=False)
    (5): ReLU()
    (6): Linear(in_features=13, out_features=8, bias=False)
    (7): ReLU()
    (8): Linear(in_features=8, out_features=4, bias=False)
    (9): Softmax(dim=1)
  )
)

## Exercise 2

### Part 1

In a simple neural network without bias terms, with $i$ input variables, $o$ output variables and a single hidden layer of $h$ neurons, the number of parameters is:

\begin{equation}
\#\text{parameters} = i \cdot h + h \cdot o
\end{equation}

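Each pair of consecutive layers contributes the product of their sizes. Hence, in our model with 4 hidden layers, $i=5$, $o=4$, $h_1=11$, $h_2=16$, $h_3=13$ and $h_4=8$, we have:

\begin{equation}
\#\text{parameters} = i \cdot h_1 + h_1 \cdot h_2 + h_2 \cdot h_3 + h_3 \cdot h_4 + h_4 \cdot o = 55 + 176 + 208 + 104 + 32 = 575
\end{equation}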
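
The same count can be sketched as a small helper that walks over consecutive layer widths. The function name `count_linear_params` is hypothetical; it simply applies the formulas of this exercise (weights only, plus one bias per non-input neuron when `bias=True`).

```python
from typing import Sequence

def count_linear_params(sizes: Sequence[int], bias: bool = False) -> int:
    """Count the parameters of a stack of fully connected layers.

    `sizes` lists the layer widths [i, h_1, ..., h_n, o]; each consecutive
    pair (n_in, n_out) is one Linear layer with n_in * n_out weights,
    plus n_out biases when `bias` is True.
    """
    return sum(
        n_in * n_out + (n_out if bias else 0)
        for n_in, n_out in zip(sizes[:-1], sizes[1:])
    )

sizes = [5, 11, 16, 13, 8, 4]  # i, h_1, h_2, h_3, h_4, o
print(count_linear_params(sizes))             # 575
print(count_linear_params(sizes, bias=True))  # 627
```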

We can check it with `torchinfo` (a third-party package, installed with `pip install torchinfo`):

In [11]:
from torchinfo import summary

In [12]:
summary(model)
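
If `torchinfo` is not available, a minimal alternative is to sum `numel()` over `model.parameters()` with plain PyTorch. The snippet rebuilds the same bias-free architecture as a bare `torch.nn.Sequential` so that it runs on its own.

```python
import torch

# Same architecture as the MLP class above (no bias), rebuilt so the snippet is self-contained
model = torch.nn.Sequential(
    torch.nn.Linear(5, 11, bias=False), torch.nn.ReLU(),
    torch.nn.Linear(11, 16, bias=False), torch.nn.ReLU(),
    torch.nn.Linear(16, 13, bias=False), torch.nn.ReLU(),
    torch.nn.Linear(13, 8, bias=False), torch.nn.ReLU(),
    torch.nn.Linear(8, 4, bias=False),
    torch.nn.Softmax(dim=1),
)

# numel() is the number of entries of each parameter tensor
n_params = sum(p.numel() for p in model.parameters())
print(n_params)  # 575
```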

### Part 2

In a simple neural network **with bias** (in the hidden and output layers), with $i$ input variables, $o$ output variables and a single hidden layer of $h$ neurons, each hidden and output neuron adds one bias parameter, so:

\begin{equation}
\#\text{parameters} = i \cdot h + h + h \cdot o + o
\end{equation}

Hence, in our model with 4 hidden layers, $i=5$, $o=4$, $h_1=11$, $h_2=16$, $h_3=13$ and $h_4=8$, we have:

\begin{equation}
\#\text{parameters} = i \cdot h_1 + h_1 \cdot h_2 + h_2 \cdot h_3 + h_3 \cdot h_4 + h_4 \cdot o + h_1 + h_2 + h_3 + h_4 + o = 575 + 52 = 627
\end{equation}

We can check it with `torchinfo` again:

In [13]:
model_with_bias = MLP(True)
model_with_bias # print

MLP(
  (layers): Sequential(
    (0): Linear(in_features=5, out_features=11, bias=True)
    (1): ReLU()
    (2): Linear(in_features=11, out_features=16, bias=True)
    (3): ReLU()
    (4): Linear(in_features=16, out_features=13, bias=True)
    (5): ReLU()
    (6): Linear(in_features=13, out_features=8, bias=True)
    (7): ReLU()
    (8): Linear(in_features=8, out_features=4, bias=True)
    (9): Softmax(dim=1)
  )
)

In [14]:
summary(model_with_bias)
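
As a plain-PyTorch cross-check, the sketch below rebuilds the model with bias and prints each parameter tensor with its shape and size, which shows where the 52 extra bias parameters come from. The loop-based construction is only a compact way to write the same layers as the `MLP` class.

```python
import torch

sizes = [5, 11, 16, 13, 8, 4]  # i, h_1, h_2, h_3, h_4, o
modules = []
for n_in, n_out in zip(sizes[:-1], sizes[1:]):
    modules += [torch.nn.Linear(n_in, n_out, bias=True), torch.nn.ReLU()]
modules[-1] = torch.nn.Softmax(dim=1)  # the last activation is softmax, not ReLU
model_with_bias = torch.nn.Sequential(*modules)

# One line per weight matrix and bias vector
for name, p in model_with_bias.named_parameters():
    print(name, tuple(p.shape), p.numel())

total = sum(p.numel() for p in model_with_bias.parameters())
print("total:", total)  # total: 627
```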

## Exercise 3

In [15]:
for param_name, param in model.state_dict().items():
    # L1 norm of the flattened weights: equivalent to param.abs().sum()
    print("L1 norm of", param_name, ": ", torch.norm(param, p=1))
    # Frobenius norm: square root of the sum of the squared entries
    print("Frobenius norm of", param_name, ": ", torch.norm(param, p='fro').item())

L1 norm of layers.0.weight :  tensor(12.9202)
Frobenius norm of layers.0.weight :  2.032184362411499
L1 norm of layers.2.weight :  tensor(25.3650)
Frobenius norm of layers.2.weight :  2.2356810569763184
L1 norm of layers.4.weight :  tensor(24.1685)
Frobenius norm of layers.4.weight :  1.995491862297058
L1 norm of layers.6.weight :  tensor(13.3087)
Frobenius norm of layers.6.weight :  1.5592228174209595
L1 norm of layers.8.weight :  tensor(5.7601)
Frobenius norm of layers.8.weight :  1.1633930206298828
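
For a weight matrix $W$, the $L_1$ (vectorial) norm is $\sum_{i,j} |w_{ij}|$ and the Frobenius norm is $\sqrt{\sum_{i,j} w_{ij}^2}$. Recent PyTorch documentation recommends `torch.linalg.vector_norm` and `torch.linalg.matrix_norm` over `torch.norm`; the sketch below checks both against the explicit formulas on a hypothetical random matrix with the shape of `layers.0.weight` (`nn.Linear` stores weights as `(out_features, in_features)`).

```python
import torch

torch.manual_seed(0)
W = torch.randn(11, 5)  # hypothetical weights, same shape as layers.0.weight

# L1 (vectorial) norm: treat W as one long vector and sum |w_ij|
l1_manual = W.abs().sum()
l1_linalg = torch.linalg.vector_norm(W, ord=1)  # dim=None flattens W

# Frobenius norm: square root of the sum of squared entries
fro_manual = W.pow(2).sum().sqrt()
fro_linalg = torch.linalg.matrix_norm(W, ord="fro")

print(torch.allclose(l1_manual, l1_linalg))    # True
print(torch.allclose(fro_manual, fro_linalg))  # True
# The Frobenius norm is also the L2 norm of the flattened matrix
print(torch.allclose(fro_linalg, torch.linalg.vector_norm(W)))  # True
```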


## Exercise 4

### Feed 10 datapoints to the network

In [24]:
# Base values 0, 1, ..., 9 as a column, repeated for the 5 input features
X = torch.arange(0, 10).float().unsqueeze(-1).repeat(1, 5)
# Add Gaussian noise (mean 0, standard deviation 0.8)
eps = torch.normal(0, .8, (10, 5))
X += eps
X

tensor([[ 0.0966, -1.2707,  1.3402, -0.2477,  1.4666],
        [ 1.3619,  0.0706,  0.3836,  0.5542,  0.1831],
        [ 1.5612,  2.6686,  2.1895,  2.0453,  2.4906],
        [ 2.4586,  2.7203,  1.7721,  3.0752,  3.0214],
        [ 3.3467,  3.8911,  4.6990,  4.1207,  5.7116],
        [ 4.7521,  6.3401,  4.4006,  5.0135,  5.6509],
        [ 5.0632,  5.3284,  5.7606,  4.5770,  6.4677],
        [ 7.5261,  8.3475,  6.5741,  7.0730,  6.9946],
        [ 8.8613,  7.3694,  7.9885,  7.3336,  7.2765],
        [ 8.3077,  9.5862,  8.1710,  8.4525,  9.5406]])

In [25]:
conf_y_hat = model(X)
conf_y_hat

tensor([[0.2523, 0.2484, 0.2470, 0.2523],
        [0.2517, 0.2490, 0.2481, 0.2511],
        [0.2549, 0.2473, 0.2469, 0.2509],
        [0.2567, 0.2458, 0.2463, 0.2513],
        [0.2569, 0.2452, 0.2453, 0.2526],
        [0.2639, 0.2424, 0.2421, 0.2516],
        [0.2611, 0.2438, 0.2429, 0.2522],
        [0.2706, 0.2379, 0.2378, 0.2537],
        [0.2712, 0.2381, 0.2378, 0.2528],
        [0.2744, 0.2379, 0.2357, 0.2521]], grad_fn=<SoftmaxBackward>)
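
A single call suffices because `nn.Linear` acts on the last dimension of its input: a `(10, 5)` batch is processed as 10 rows in one matrix multiplication, with no explicit loop. The sketch below assumes a bias-free copy of the architecture and random inputs, and wraps the call in `torch.no_grad()` since no gradients are needed for inference.

```python
import torch

# Bias-free copy of the MLP architecture, rebuilt so the snippet is self-contained
model = torch.nn.Sequential(
    torch.nn.Linear(5, 11, bias=False), torch.nn.ReLU(),
    torch.nn.Linear(11, 16, bias=False), torch.nn.ReLU(),
    torch.nn.Linear(16, 13, bias=False), torch.nn.ReLU(),
    torch.nn.Linear(13, 8, bias=False), torch.nn.ReLU(),
    torch.nn.Linear(8, 4, bias=False),
    torch.nn.Softmax(dim=1),
)

X = torch.randn(10, 5)  # hypothetical batch: 10 datapoints, 5 features each

with torch.no_grad():         # inference only: no autograd graph is recorded
    conf_y_hat = model(X)     # one call processes the whole batch

print(conf_y_hat.shape)                                       # torch.Size([10, 4])
print(torch.allclose(conf_y_hat.sum(dim=1), torch.ones(10)))  # True
```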

### Part 1

In [26]:
y_hat = torch.argmax(conf_y_hat, dim=1)
y_hat

tensor([3, 0, 0, 0, 0, 0, 0, 0, 0, 0])
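
Since softmax is strictly increasing within each row, taking `argmax` over the confidences gives the same classes as taking it over the pre-softmax scores; `torch.max(..., dim=1)` returns both the winning confidence and its index. The logits below are hypothetical random values.

```python
import torch

torch.manual_seed(0)
logits = torch.randn(10, 4)          # hypothetical pre-softmax scores
conf = torch.softmax(logits, dim=1)  # confidences, as returned by the MLP

y_hat_from_conf = torch.argmax(conf, dim=1)
y_hat_from_logits = torch.argmax(logits, dim=1)

# Highest confidence per datapoint and the corresponding class index
values, indices = torch.max(conf, dim=1)

print(torch.equal(y_hat_from_conf, y_hat_from_logits))  # True
print(torch.equal(indices, y_hat_from_conf))            # True
```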

### Part 2

Draft a vector of ground-truth labels:

In [27]:
y = torch.randint(4, (10, ))
#y = torch.Tensor([2,2,2,2,2,2,2,2,2,2])
y

tensor([2, 0, 1, 3, 2, 0, 1, 0, 2, 1])

Evaluate the accuracy:

In [28]:
def accuracy(y, y_hat):
    # Number of datapoints whose predicted class matches the ground truth
    correctly_classified = torch.sum(y_hat == y).item()
    # Accuracy = correct assignments / total number of datapoints
    return correctly_classified / y.shape[0]

In [29]:
accuracy(y, y_hat)

0.3
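
The same computation can be written as a single expression: the boolean tensor `y_hat == y`, cast to float, has a mean equal to the fraction of correct assignments. The tensors below are copied from the outputs above.

```python
import torch

# Ground truths and predictions copied from the outputs above
y = torch.tensor([2, 0, 1, 3, 2, 0, 1, 0, 2, 1])
y_hat = torch.tensor([3, 0, 0, 0, 0, 0, 0, 0, 0, 0])

# 1.0 where the prediction is correct, 0.0 otherwise; the mean is the accuracy
acc = (y_hat == y).float().mean().item()
print(round(acc, 4))  # 0.3
```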