# Homework

1. Build the MLP in the image above using PyTorch built-ins
2. Provide the calculation of the exact number of parameters of the MLP
   - Do it first supposing that the layers don't have a bias term, then supposing that the bias is present wherever it's possible
3. Calculate the $L_1$ (vector) norm and the Frobenius norm of the parameters of each layer
4. Given 10 random datapoints, feed them into the network. This operation must be done all in one single command and must **not** make use of loops.
   - Given the output of the network, using PyTorch code, find the class of assignment of each datapoint. This also must be done in a single PyTorch command without using loops.
   - Draft a vector of ground truths (whichever labels you like) and provide code for the calculation of the accuracy
     - Tip: first get the number of correct assignments, then...

## Recall
Let us suppose we wish to build a larger model from the graph below.

![](imgs/01/mlp_graph_larger.jpg)

We suppose that

1. The layers have no bias units
2. The activation function for hidden layers is `ReLU`

    $\text{ReLU}(x) = \max(0, x)$

Moreover, we suppose that this is a classification problem.

As you might recall, when the number of classes is > 2, we encode the problem in such a way that the output layer has as many neurons as there are classes. Doing so, we establish a one-to-one correspondence between output units and classes. The value of the $j$-th neuron represents the **confidence** of the network in assigning a given data instance to the $j$-th class.

Classically, when the network is encoded in such a way, the activation function for the final layer is the **softmax** function.
If $C$ is the total number of classes,

$\text{softmax}(z_j) = \frac{\exp(z_j)}{\sum_{k=1}^C \exp(z_k)}$

where $j\in \{1,\cdots,C\}$ is one of the classes.

If we repeat this calculation for every $j$, we end up with $C$ normalized values (i.e., each between 0 and 1, and summing to 1) which can be interpreted as the confidence that the network has in assigning the instance to the corresponding class.
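
As a minimal sketch of the formula above, the softmax can be written out by hand and compared with PyTorch's built-in. The helper name `softmax_rows` and the toy tensor `z` are illustrative assumptions, and subtracting the row maximum before exponentiating is a common numerical-stability trick that is not part of the formula (it leaves the result unchanged).

```python
import torch

def softmax_rows(z):
    """Softmax over the last dimension, written out from the definition."""
    # Shifting by the row maximum avoids overflow in exp() and does not
    # change the result (assumption: added here only for stability).
    exp_z = torch.exp(z - z.max(dim=-1, keepdim=True).values)
    return exp_z / exp_z.sum(dim=-1, keepdim=True)

# Two toy "datapoints" with C = 3 raw output values each (illustrative).
z = torch.tensor([[1.0, 2.0, 3.0],
                  [0.0, 0.0, 0.0]])
probs = softmax_rows(z)

print(probs)              # one row of confidences per datapoint
print(probs.sum(dim=-1))  # each row sums to 1
```

The second row shows the uniform case: equal inputs give equal confidences of $1/C$.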

## Exercise 1

In [2]:
import torch
import numpy as np

In [3]:
class MLP(torch.nn.Module):
    def __init__(self, my_bias=False):
        super().__init__()
        self.layers = torch.nn.Sequential(
            torch.nn.Linear(5, 11, bias=my_bias),
            torch.nn.ReLU(),
            torch.nn.Linear(11, 16, bias=my_bias),
            torch.nn.ReLU(),
            torch.nn.Linear(16, 13, bias=my_bias),
            torch.nn.ReLU(),
            torch.nn.Linear(13, 8, bias=my_bias),
            torch.nn.ReLU(),
            torch.nn.Linear(8, 4, bias=my_bias),
            torch.nn.Softmax(dim=1)
        )

    def forward(self, X):
        return self.layers(X)

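**Note.**
`torch.nn.Softmax(dim=None)`, from the PyTorch documentation:

"Applies the Softmax function to an n-dimensional input Tensor rescaling them so that the elements of the n-dimensional output Tensor lie in the range [0,1] and sum to 1."

With `dim=1`, as in the model above, the normalization runs across the 4 output units of each datapoint, so every row of the output sums to 1.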



Now test whether the `MLP` class works by instantiating it:

In [4]:
model = MLP()
model # print

MLP(
  (layers): Sequential(
    (0): Linear(in_features=5, out_features=11, bias=False)
    (1): ReLU()
    (2): Linear(in_features=11, out_features=16, bias=False)
    (3): ReLU()
    (4): Linear(in_features=16, out_features=13, bias=False)
    (5): ReLU()
    (6): Linear(in_features=13, out_features=8, bias=False)
    (7): ReLU()
    (8): Linear(in_features=8, out_features=4, bias=False)
    (9): Softmax(dim=1)
  )
)

## Exercise 2

### Part 1

In a simple neural network model with no bias, with $i$ input variables, $o$ output variables and one hidden layer with $h$ neurons, the following holds:

\begin{equation}
\# parameters = i * h + h * o
\end{equation}

Hence, in our model with 4 hidden layers and with $i=5$, $o=4$, $h_1=11$, $h_2=16$, $h_3=13$ and $h_4=8$ we have:

\begin{equation}
\# parameters = i * h_1 + h_1 * h_2 + h_2 * h_3 + h_3 * h_4 + h_4 * o = 55 + 176 + 208 + 104 + 32 = 575
\end{equation}
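
The arithmetic above can be reproduced with a few lines of plain Python. This is only a sketch: the helper name `count_weights` and the list `sizes` are illustrative, with the layer widths taken from the model defined in Exercise 1.

```python
def count_weights(sizes):
    """Number of weights of a bias-free MLP whose layer widths are `sizes`."""
    # Each pair of consecutive layers contributes n_in * n_out weights.
    return sum(n_in * n_out for n_in, n_out in zip(sizes, sizes[1:]))

# Input width, four hidden widths, output width.
sizes = [5, 11, 16, 13, 8, 4]
n_weights = count_weights(sizes)
print(n_weights)  # 575
```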

We can check it by means of torchinfo:

In [5]:
import torch
from torchinfo import summary

In [6]:
summary(model)

Layer (type:depth-idx)                   Param #
MLP                                      --
├─Sequential: 1-1                        --
│    └─Linear: 2-1                       55
│    └─ReLU: 2-2                         --
│    └─Linear: 2-3                       176
│    └─ReLU: 2-4                         --
│    └─Linear: 2-5                       208
│    └─ReLU: 2-6                         --
│    └─Linear: 2-7                       104
│    └─ReLU: 2-8                         --
│    └─Linear: 2-9                       32
│    └─Softmax: 2-10                     --
Total params: 575
Trainable params: 575
Non-trainable params: 0

### Part 2

In a simple neural network model **with bias** (in the hidden and output layers), with $i$ input variables, $o$ output variables and one hidden layer with $h$ neurons, the following holds:

\begin{equation}
\# parameters = i * h + h + h * o + o
\end{equation}

Hence, in our model with 4 hidden layers and with $i=5$, $o=4$, $h_1=11$, $h_2=16$, $h_3=13$ and $h_4=8$ we have:

\begin{equation}
\# parameters = i * h_1 + h_1 * h_2 + h_2 * h_3 + h_3 * h_4 + h_4 * o + h_1 + h_2 + h_3 + h_4 + o = 575 + 52 = 627
\end{equation}
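
As an alternative cross-check that does not need torchinfo, the parameters can be counted directly with `Tensor.numel()`. The sketch below rebuilds only the linear layers (the names `sizes`, `layers` and `net` are illustrative); activations are left out because they hold no parameters.

```python
import torch

# Layer widths of the model: input, four hidden layers, output.
sizes = [5, 11, 16, 13, 8, 4]

# One Linear layer per pair of consecutive widths, bias included.
layers = [torch.nn.Linear(n_in, n_out, bias=True)
          for n_in, n_out in zip(sizes, sizes[1:])]
net = torch.nn.Sequential(*layers)

# numel() is the number of entries of a tensor (weights or biases).
per_layer = [sum(p.numel() for p in layer.parameters()) for layer in layers]
n_params = sum(p.numel() for p in net.parameters())

print(per_layer)  # [66, 192, 221, 112, 36]
print(n_params)   # 627
```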

We can check it by means of torchinfo:

In [7]:
model_with_bias = MLP(True)
model_with_bias # print

MLP(
  (layers): Sequential(
    (0): Linear(in_features=5, out_features=11, bias=True)
    (1): ReLU()
    (2): Linear(in_features=11, out_features=16, bias=True)
    (3): ReLU()
    (4): Linear(in_features=16, out_features=13, bias=True)
    (5): ReLU()
    (6): Linear(in_features=13, out_features=8, bias=True)
    (7): ReLU()
    (8): Linear(in_features=8, out_features=4, bias=True)
    (9): Softmax(dim=1)
  )
)

In [8]:
summary(model_with_bias)

Layer (type:depth-idx)                   Param #
MLP                                      --
├─Sequential: 1-1                        --
│    └─Linear: 2-1                       66
│    └─ReLU: 2-2                         --
│    └─Linear: 2-3                       192
│    └─ReLU: 2-4                         --
│    └─Linear: 2-5                       221
│    └─ReLU: 2-6                         --
│    └─Linear: 2-7                       112
│    └─ReLU: 2-8                         --
│    └─Linear: 2-9                       36
│    └─Softmax: 2-10                     --
Total params: 627
Trainable params: 627
Non-trainable params: 0

## Exercise 3

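The following code computes the $L_1$ norm and the Euclidean ($L_2$) norm of each row of every weight matrix, i.e. one value per output neuron of the layer: with `dim=1`, `torch.norm` reduces over the input dimension only, and `p='fro'` along a single dimension coincides with the $L_2$ norm of that row.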

In [24]:
for param_name, param in model.state_dict().items(): 
    print("L1 norm of", param_name, ": ", torch.norm(param, p=1, dim=1)) 
    print("Frobenius norm of", param_name, ": ", torch.norm(param, p='fro', dim=1))
    print("\n")
    #print(param_name, param)

L1 norm of layers.0.weight :  tensor([1.3706, 1.3095, 1.2076, 1.3152, 1.0968, 0.7321, 0.8721, 1.1712, 1.3270,
        0.8822, 1.0065])
Frobenius norm of layers.0.weight :  tensor([0.6171, 0.6631, 0.5712, 0.6768, 0.5667, 0.3742, 0.4977, 0.5935, 0.7378,
        0.4492, 0.5604])


L1 norm of layers.2.weight :  tensor([1.8714, 1.6192, 1.8858, 1.5711, 2.1190, 1.8845, 1.6313, 1.3960, 1.5442,
        2.1096, 2.0135, 1.3465, 1.6481, 1.3771, 1.4941, 1.8197])
Frobenius norm of layers.2.weight :  tensor([0.6108, 0.5374, 0.6274, 0.5245, 0.6992, 0.6431, 0.5566, 0.5140, 0.5059,
        0.6775, 0.6724, 0.4786, 0.5861, 0.5034, 0.5197, 0.6034])


L1 norm of layers.4.weight :  tensor([2.1936, 1.7309, 2.1227, 2.3824, 2.1517, 2.3157, 1.6894, 2.4102, 1.9114,
        2.2467, 1.5643, 2.1334, 2.1701])
Frobenius norm of layers.4.weight :  tensor([0.6351, 0.5149, 0.6045, 0.6591, 0.5974, 0.6351, 0.5048, 0.6733, 0.5452,
        0.6209, 0.4551, 0.6247, 0.6435])


L1 norm of layers.6.weight :  tensor([2.1437, 1.404
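
If a single value per layer is wanted, one option is to flatten each parameter tensor and take its vector norms: the $L_1$ norm is the sum of absolute values, and the Frobenius norm of a matrix equals the $L_2$ norm of its flattened entries. The sketch below assumes this reading of the exercise; the helper `layer_norms` and the toy layer `toy` are illustrative names, not part of the original solution.

```python
import math
import torch

def layer_norms(module):
    """Return {parameter name: (L1 norm, Frobenius norm)} for a module."""
    norms = {}
    for name, param in module.named_parameters():
        flat = param.detach().flatten()
        l1 = torch.linalg.vector_norm(flat, ord=1).item()   # sum of |w|
        fro = torch.linalg.vector_norm(flat, ord=2).item()  # sqrt of sum of w^2
        norms[name] = (l1, fro)
    return norms

# Toy layer with known weights, so the result can be checked by hand.
toy = torch.nn.Linear(2, 2, bias=False)
with torch.no_grad():
    toy.weight.copy_(torch.tensor([[1.0, -2.0], [3.0, -4.0]]))

toy_norms = layer_norms(toy)
print(toy_norms)  # L1 = 1 + 2 + 3 + 4 = 10, Frobenius = sqrt(30)
```

Calling `layer_norms(model)` on the MLP would give one pair of values per weight matrix.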

## Exercise 4

### Feed 10 datapoints to the network

In [10]:
x1 = torch.arange(0, 10).float().unsqueeze(-1)
x2 = torch.arange(0, 10).float().unsqueeze(-1)
x3 = torch.arange(0, 10).float().unsqueeze(-1)
x4 = torch.arange(0, 10).float().unsqueeze(-1)
x5 = torch.arange(0, 10).float().unsqueeze(-1)

# Concatenate inputs:
X = torch.cat((x1, x2, x3, x4, x5), dim=1)
# Add noise
eps = torch.normal(0, .8, (10, 5))
X += (eps)
X

tensor([[ 0.5287,  0.1404,  1.0294,  0.6824, -1.9075],
        [ 2.2157, -0.1230,  0.5900,  1.1393,  2.0579],
        [ 1.7383,  2.6700,  2.2156,  2.4686,  1.4911],
        [ 3.2361,  2.3077,  3.8812,  4.9305,  3.1369],
        [ 3.9183,  3.5097,  1.9303,  4.8616,  2.9701],
        [ 4.1286,  6.7141,  5.0085,  3.9030,  4.7227],
        [ 5.8071,  5.6323,  5.8233,  6.4185,  5.4331],
        [ 7.3633,  6.3232,  6.5968,  6.3240,  7.8792],
        [ 7.6105,  8.0370,  8.7238,  6.5124, 10.0162],
        [ 8.4191,  8.3492,  9.6653,  9.5996,  9.7586]])

In [11]:
conf_y_hat = model(X)
conf_y_hat

tensor([[0.2481, 0.2497, 0.2506, 0.2515],
        [0.2423, 0.2516, 0.2499, 0.2563],
        [0.2472, 0.2516, 0.2507, 0.2504],
        [0.2416, 0.2503, 0.2576, 0.2505],
        [0.2434, 0.2504, 0.2557, 0.2504],
        [0.2448, 0.2554, 0.2474, 0.2524],
        [0.2405, 0.2545, 0.2548, 0.2502],
        [0.2376, 0.2571, 0.2531, 0.2522],
        [0.2362, 0.2597, 0.2509, 0.2532],
        [0.2335, 0.2573, 0.2586, 0.2506]], grad_fn=<SoftmaxBackward0>)
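
The single call `model(X)` works because `torch.nn.Linear` acts on the last dimension of its input and treats the leading dimension as a batch. The sketch below illustrates this on a hypothetical stand-in layer (`layer`, not the homework model); the list comprehension appears only to verify the batched result and is not needed in practice.

```python
import torch

torch.manual_seed(0)

# Hypothetical stand-in for the network: a single bias-free linear layer.
layer = torch.nn.Linear(5, 4, bias=False)
X_demo = torch.randn(10, 5)  # 10 datapoints with 5 features each

# One call processes the whole batch: (10, 5) -> (10, 4).
batched = layer(X_demo)

# Reference result, one datapoint at a time (for verification only).
one_by_one = torch.stack([layer(x) for x in X_demo])

print(batched.shape)
print(torch.allclose(batched, one_by_one, atol=1e-6))
```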

### Part 1

In [12]:
y_hat = torch.argmax(conf_y_hat, dim=1)
y_hat

tensor([3, 3, 1, 2, 2, 1, 2, 1, 1, 2])
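
To see what `argmax` with `dim=1` does, here is a small sketch on the first three rows of the confidences printed above (copied by hand into the illustrative tensor `conf_demo`): for each row it returns the index of the largest entry, which is the predicted class.

```python
import torch

# First three rows of the network output shown above.
conf_demo = torch.tensor([[0.2481, 0.2497, 0.2506, 0.2515],
                          [0.2423, 0.2516, 0.2499, 0.2563],
                          [0.2472, 0.2516, 0.2507, 0.2504]])

# dim=1 reduces over the class dimension: one class index per datapoint.
classes = torch.argmax(conf_demo, dim=1)
print(classes)  # tensor([3, 3, 1])
```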

### Part 2

Get ground truths vector:

In [13]:
y = torch.randint(4, (10, ))
#y = torch.Tensor([2,2,2,2,2,2,2,2,2,2])
y

tensor([0, 1, 3, 0, 1, 1, 0, 0, 2, 2])

Evaluate the accuracy:

In [14]:
def accuracy(y, y_hat):
    correctly_classified = torch.sum(y_hat == y).item()
    #print(correctly_classified)
    #correctly_classified = (y_hat == y).sum()
    return (correctly_classified / y.shape[0])

In [15]:
accuracy(y, y_hat)

0.2
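
As a compact variant, the accuracy can also be obtained in a single expression as the mean of the boolean comparison. The sketch below uses the predictions and ground truths printed above, copied by hand; two of the ten labels match (positions 5 and 9), which gives 0.2.

```python
import torch

# Predictions and ground truths as printed in the cells above.
y_hat_demo = torch.tensor([3, 3, 1, 2, 2, 1, 2, 1, 1, 2])
y_demo = torch.tensor([0, 1, 3, 0, 1, 1, 0, 0, 2, 2])

# (y_hat == y) is a boolean tensor; its mean is the fraction of correct labels.
n_correct = (y_hat_demo == y_demo).sum().item()
acc = (y_hat_demo == y_demo).float().mean().item()

print(n_correct)      # 2
print(round(acc, 4))  # 0.2
```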