# Homework

1. Complete the Python implementation of the backpropagation exercise in the **Backpropagation** section here above (cell `# try it in Python as homework!`)
    - Create the calculations for obtaining $y$ in PyTorch **using only PyTorch methods and routines**
    - Calculate the gradient
    - Check the values of the gradients and see if it is correct w.r.t. the manual calculations
2. Given the multilayer perceptron defined during the exercises from lab 1:
    - Create 10 random datapoints (with any function you wish, it can be `rand`, `randn`...) and feed them into the network
    - Given the output, calculate the Cross-Entropy loss with respect to the ground truth $[1,2,3,4,1,2,3,4,1,2]$ (classes from 1 to 4). Cross-Entropy loss:
        
        $$ CE(\mathbf{y}, \hat{\mathbf{y}}) = - \sum_{i=1}^{10} \hat{y}_i \log(y_i)$$
        
        where $y_i$ is the one-hot encoding of the $i$-th datapoint. For instance, $y_1 = [1,0,0,0]$.
        **_Note: there is an extremely handy PyTorch function for getting a one-hot encoding out of a vector, so don't try anything fancy._**
    - Backpropagate the error along the network and inspect the gradient of the parameters connecting the input layer and the first hidden layer.
3. Execute the python script `utils/randomized_backpropagation_formula.py`. This creates a formula $f(\mathbf{x})$ with randomized operators and values. Create the computational graph from this formula, do (by hand) the forward pass, then calculate (by hand) $\nabla f(\mathbf{x})$ using the backward gradient computation. Do the same calculation on PyTorch to check the correctness of your calculations. _Note: The formula created by this script is linked to your name and surname, which you have to input before_. The solution to this exercise _should_ be submitted as a scan/good quality picture of a piece of paper (or you can do it on a touch screen and submit the image...), but other formats are acceptable as well.

In [28]:
import torch

## Exercise 1

**Backpropagation**

Let us suppose we have the following calculation

$\mathbf{x} = [1,~2,~-1,~3,~5]$

$ y = f(\mathbf{x}) = \log\{[\exp (x_1 * x_2 )]^2 + \sin (x_3 + x_4 + x_5) \cdot x_5\}$

Find

$\nabla f(\mathbf{x})$

### Part 1

In [29]:
x1 = torch.tensor([1.0], requires_grad=True)
x2 = torch.tensor([2.0], requires_grad=True)
x3 = torch.tensor([-1.0], requires_grad=True)
x4 = torch.tensor([3.0], requires_grad=True)
x5 = torch.tensor([5.0], requires_grad=True)

print(x1)
print(x2)
print(x3)
print(x4)
print(x5)

# x = torch.tensor([1,2,-1,3,5], dtype=torch.float32, requires_grad=True)
# print(x)

tensor([1.], requires_grad=True)
tensor([2.], requires_grad=True)
tensor([-1.], requires_grad=True)
tensor([3.], requires_grad=True)
tensor([5.], requires_grad=True)


In [30]:
a = torch.matmul(x1, x2)
print(a)
b = x3 + x4 + x5
print(b)
c = a.exp()
print(c)
d = torch.pow(c, 2)
print(d)
g = b.sin()
print(g)
h = torch.matmul(g, x5)
print(h)
i = d + h
print(i)
y = torch.log(i)
print("\n")
print("The value of function f(x) evaluated in x is:")
print(y)

tensor(2., grad_fn=<DotBackward0>)
tensor([7.], grad_fn=<AddBackward0>)
tensor(7.3891, grad_fn=<ExpBackward0>)
tensor(54.5982, grad_fn=<PowBackward0>)
tensor([0.6570], grad_fn=<SinBackward0>)
tensor(3.2849, grad_fn=<DotBackward0>)
tensor(57.8831, grad_fn=<AddBackward0>)


The value of function f(x) evaluated in x is:
tensor(4.0584, grad_fn=<LogBackward0>)


### Part 2

In [31]:
y.backward()
print(x1.grad)
print(x2.grad)
print(x3.grad)
print(x4.grad)
print(x5.grad)

tensor([3.7730])
tensor([1.8865])
tensor([0.0651])
tensor([0.0651])
tensor([0.0765])


### Part 3

From manual calculations performed in class, we expected:

- $\partial f/\partial x_1 = 4.14$
- $\partial f/\partial x_2 = 2.07$
- $\partial f/\partial x_3 = 0.08$
- $\partial f/\partial x_4 = 0.08$
- $\partial f/\partial x_4 = 0.093$

As you can see from previous cell in Part 2, the manual calculations are slightly different since we made use of approximations in class.

## Exercise 2

This was the artificial network built in the first assignement:

In [32]:
class MLP(torch.nn.Module):
    def __init__(self, my_bias=False):
        super().__init__()
        self.layers = torch.nn.Sequential(
            torch.nn.Linear(5, 11, bias=my_bias),
            torch.nn.ReLU(),
            torch.nn.Linear(11, 16, bias=my_bias),
            torch.nn.ReLU(),
            torch.nn.Linear(16, 13, bias=my_bias),
            torch.nn.ReLU(),
            torch.nn.Linear(13, 8, bias=my_bias),
            torch.nn.ReLU(),
            torch.nn.Linear(8, 4, bias=my_bias),
            torch.nn.Softmax(dim=1)
        )

    def forward(self, X):
        return self.layers(X)

The following is an instance of it:

In [33]:
model = MLP()
model

MLP(
  (layers): Sequential(
    (0): Linear(in_features=5, out_features=11, bias=False)
    (1): ReLU()
    (2): Linear(in_features=11, out_features=16, bias=False)
    (3): ReLU()
    (4): Linear(in_features=16, out_features=13, bias=False)
    (5): ReLU()
    (6): Linear(in_features=13, out_features=8, bias=False)
    (7): ReLU()
    (8): Linear(in_features=8, out_features=4, bias=False)
    (9): Softmax(dim=1)
  )
)

### Part 1

Feed 10 datapoints to the network:

In [34]:
X = torch.randn(10, 5,requires_grad=True) #Returns a tensor filled with random numbers from a normal distribution with mean 0 and variance 1
print(X)
print("\n")
conf_y_hat = model(X)
print(conf_y_hat)

tensor([[ 1.4492,  0.3525, -1.7987,  1.9462, -0.9996],
        [-0.4771, -1.4411, -0.6659,  1.0783, -0.1875],
        [ 0.7208,  1.9495, -0.7505,  0.3738,  0.0608],
        [-0.1368,  0.5650, -1.7016,  0.8492, -0.9626],
        [ 1.1818, -0.2077,  2.1352,  1.5876,  0.4281],
        [ 0.4977, -1.1229, -0.1563,  1.2674,  2.0150],
        [-1.0608, -0.4383,  1.5408, -0.9206,  0.8729],
        [-0.2938, -0.3205, -0.6782,  0.5879,  0.2300],
        [ 0.8401,  1.5725, -0.5892,  1.2250,  2.2694],
        [ 0.1435,  1.3357,  1.5479, -0.2489,  0.4268]], requires_grad=True)


tensor([[0.2505, 0.2504, 0.2476, 0.2515],
        [0.2498, 0.2506, 0.2477, 0.2519],
        [0.2510, 0.2508, 0.2463, 0.2519],
        [0.2501, 0.2503, 0.2478, 0.2519],
        [0.2500, 0.2502, 0.2492, 0.2506],
        [0.2514, 0.2522, 0.2393, 0.2571],
        [0.2509, 0.2497, 0.2491, 0.2502],
        [0.2495, 0.2509, 0.2457, 0.2540],
        [0.2499, 0.2522, 0.2355, 0.2624],
        [0.2508, 0.2499, 0.2489, 0.2505]], grad_f

### Part 2

The predicted classes for the 10 points are then:

In [35]:
y_hat = torch.argmax(conf_y_hat, dim=1) + 1# in order to be confrontable with the ground_truth we add 1 
y_hat

tensor([4, 4, 4, 4, 4, 4, 1, 4, 4, 1])

Define the ground truth:

In [36]:
# We keep it with float point since the torch.nn.functional.one_hot function doesn't allow integers
ground_truth = torch.tensor([1,2,3,4,1,2,3,4,1,2]) # dtype=torch.int) or ground_truth.int() in case
ground_truth

tensor([1, 2, 3, 4, 1, 2, 3, 4, 1, 2])

Define the Cross-Entropy Loss:
$$ CE(\mathbf{y}, \hat{\mathbf{y}}) = - \sum_{i=1}^{10} \hat{y}_i \log(y_i)$$
        
where $y_i$ is the one-hot encoding of the $i$-th datapoint. For instance, $y_1 = [1,0,0,0]$.

**_Note: there is an extremely handy PyTorch function for getting a one-hot encoding out of a vector, so don't try anything fancy._**

In [37]:
def ce_loss(y_hat, y):
    # First we need to transform y in one-hot encoding tensor of shape 4 x 10.
    # To do it we must rescale the ground_truth values in compliance with the function torch.nn.functional.one_hot,
    # since in order to work num_classes must be greater than the biggest values among ground_truth.
    scaled_truth = y-1
    # Then we apply the function one_hot (if num_classes=-1 then num_classes=max_value in ground_truth + 1)
    # and taking the transpose for the matrix multiplication:
    y_hot = torch.nn.functional.one_hot(scaled_truth, num_classes=-1)
    loss = (-1) * ((y_hot.float() @ y_hat.T.log())).sum()
    return loss

Now let's evaluate it:

In [38]:
loss = ce_loss(conf_y_hat, ground_truth)
loss.item()

138.59600830078125

### Part 3

In [43]:
#First I appply the backward propagation of the error:
loss.backward()
#Then I access in the first layer the matrices of the 10 gradients of the loss function w.r.t. the parameters w_i for i=1,...,5
model.layers[0].weight.grad
#print(X.grad)

tensor([[ 0.0033, -0.0165, -0.0169,  0.0150, -0.0069],
        [ 0.0141,  0.0113, -0.0253,  0.0121, -0.0096],
        [-0.0260, -0.0142, -0.0038, -0.0149,  0.0076],
        [-0.0118, -0.0043,  0.0427, -0.0440,  0.0171],
        [ 0.0093,  0.0177, -0.0061,  0.0136,  0.0170],
        [-0.0347, -0.0171,  0.0485, -0.0565, -0.0110],
        [ 0.0234,  0.0063, -0.0410,  0.0288, -0.0447],
        [ 0.0063,  0.0065, -0.0051,  0.0121,  0.0324],
        [ 0.0025,  0.0011, -0.0037,  0.0022, -0.0021],
        [ 0.0213,  0.0123, -0.0244,  0.0207, -0.0264],
        [-0.0099, -0.0012,  0.0129, -0.0152, -0.0044]])

## Exercise 3

Input your name, then press ENTER: Valeria Insogna

f(X) =  exp((sin(x1 + x2) / ReLU(x3 + x4)) - x5)

Your values
{'x1': 3, 'x2': 4, 'x3': -2, 'x4': -1, 'x5': -2}

Calculate ∇f(X) [NB: if division by 0, change the value(s) of X responsible for that]

In [40]:
x_1 = torch.tensor([3.0], requires_grad=True)
x_2 = torch.tensor([4.0], requires_grad=True)
x_3 = torch.tensor([2.0], requires_grad=True) #I have change it from -2 to +2 otherwise there is a  division with denominator 0! 
x_4 = torch.tensor([-1.0], requires_grad=True)
x_5 = torch.tensor([-2.0], requires_grad=True)

print(x_1)
print(x_2)
print(x_3)
print(x_4)
print(x_5)

tensor([3.], requires_grad=True)
tensor([4.], requires_grad=True)
tensor([2.], requires_grad=True)
tensor([-1.], requires_grad=True)
tensor([-2.], requires_grad=True)


Let's first evaluate the forward function step by step:

In [41]:
a = x_1 + x_2
print(a)
b = x_3 + x_4
print(b)
c = a.sin()
print(c)
d = torch.nn.functional.relu(b)
print(d)
e = c/d
print(e)
f = e - x_5
print(f)
g = f.exp()
print(g)


tensor([7.], grad_fn=<AddBackward0>)
tensor([1.], grad_fn=<AddBackward0>)
tensor([0.6570], grad_fn=<SinBackward0>)
tensor([1.], grad_fn=<ReluBackward0>)
tensor([0.6570], grad_fn=<DivBackward0>)
tensor([2.6570], grad_fn=<SubBackward0>)
tensor([14.2533], grad_fn=<ExpBackward0>)


This is the image of the manual calculations for the backward propagation:

![](ex2.png)

Now let's verify the manual counts:

In [42]:
g.backward()
print(x_1.grad)
print(x_2.grad)
print(x_3.grad)
print(x_4.grad)
print(x_5.grad)

tensor([10.7456])
tensor([10.7456])
tensor([-9.3642])
tensor([-9.3642])
tensor([-14.2533])
