## Day 2 -- Exercise 2

In [1]:
import torch
import numpy as np
import matplotlib.pyplot as plt

## Exercise 2a
Let $w = \begin{bmatrix} 1 & 0 & 2 \end{bmatrix}^{\top}$ and $b=2$. Generate 100 data points $(x,y) \in \mathbb{R}^3\times \mathbb{R}$, where
$$ y = w^{\top}x + b + \epsilon, \quad\quad x_i\sim \mathcal{N}(0,2^2)\text{ for }i=1,2,3\text{ and }\epsilon \sim \mathcal{N}(0,0.6^2). $$

In [2]:
N = 100      # number of training data points

b_ref = 2
w1_ref = 1.0
w2_ref = 0.0
w3_ref = 2.0

x1 = 2*torch.randn(N, dtype=torch.float)
x2 = 2*torch.randn(N, dtype=torch.float)
x3 = 2*torch.randn(N, dtype=torch.float)
noise = 0.6*torch.randn(N, dtype=torch.float)

y = w1_ref*x1 + w2_ref*x2 + w3_ref*x3 + b_ref + noise
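
The same data set can also be written in matrix form, which mirrors the notation $y = w^{\top}x + b + \epsilon$ more directly. The following is a minimal sketch and not part of the original solution; the names `X`, `w_ref` and `y_vec` and the fixed seed are assumptions introduced here for illustration.

```python
import torch

torch.manual_seed(0)  # assumed seed, only to make the sketch reproducible

N = 100                                  # number of data points
w_ref = torch.tensor([1.0, 0.0, 2.0])    # reference weights w
b_ref = 2.0                              # reference bias b

X = 2 * torch.randn(N, 3)                # each entry x_i ~ N(0, 2^2)
noise = 0.6 * torch.randn(N)             # epsilon ~ N(0, 0.6^2)

# y = w^T x + b + epsilon, one value per row of X
y_vec = X @ w_ref + b_ref + noise

print(X.shape, y_vec.shape)
```

Storing the inputs as one `(N, 3)` matrix keeps the code independent of the number of features, at the cost of no longer having separate `x1`, `x2`, `x3` variables.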


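## Exercise 2b
Given the new data set, estimate $w \in \mathbb{R}^3$ and $b \in \mathbb{R}$ by solving
$$
\min_{w,b} \,\,\,\sum_{k=1}^{100} (w^{\top}x^{(k)} + b - y^{(k)})^2.
$$

We define the loss function. It uses the mean rather than the sum of the squared errors; the two differ only by the constant factor $1/100$, so they have the same minimizer.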

In [3]:
def MSE_loss(prediction, target): # Mean squared error (MSE)
    return (prediction-target).pow(2).mean()
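
As a quick sanity check, the hand-written loss can be compared with PyTorch's built-in `torch.nn.functional.mse_loss` on a small example that is easy to verify by hand. This is only an illustrative sketch; the example tensors are made up for the check.

```python
import torch
import torch.nn.functional as F

def MSE_loss(prediction, target):  # same definition as in the cell above
    return (prediction - target).pow(2).mean()

# Hand-checkable example: the squared errors are 1, 0 and 4, so the mean is 5/3
prediction = torch.tensor([1.0, 2.0, 3.0])
target = torch.tensor([2.0, 2.0, 5.0])

manual = MSE_loss(prediction, target)
builtin = F.mse_loss(prediction, target)  # default reduction is 'mean'

print(manual.item(), builtin.item())
```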

We estimate $w_1$, $w_2$, $w_3$ and $b$ using gradient descent.
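
For reference, one gradient descent step for the mean squared error $L(w,b) = \frac{1}{N}\sum_{k=1}^{N} (w^{\top}x^{(k)} + b - y^{(k)})^2$ can be written out as follows. The symbol $\eta$ for the step size is notation introduced here; it corresponds to `learning_rate` in the code, and the gradients are what autograd computes when `loss.backward()` is called.

$$
\begin{aligned}
\nabla_w L &= \frac{2}{N}\sum_{k=1}^{N} \left(w^{\top}x^{(k)} + b - y^{(k)}\right) x^{(k)}, &
\frac{\partial L}{\partial b} &= \frac{2}{N}\sum_{k=1}^{N} \left(w^{\top}x^{(k)} + b - y^{(k)}\right), \\
w &\leftarrow w - \eta\,\nabla_w L, &
b &\leftarrow b - \eta\,\frac{\partial L}{\partial b}.
\end{aligned}
$$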

In [4]:
w1 = torch.tensor(0, dtype=torch.float, requires_grad=True)
w2 = torch.tensor(0, dtype=torch.float, requires_grad=True)
w3 = torch.tensor(0, dtype=torch.float, requires_grad=True)
b = torch.tensor(0, dtype=torch.float, requires_grad=True)

number_of_epochs = 500
learning_rate = 0.01
for epoch in range(number_of_epochs):  # 'epoch' avoids shadowing the built-in iter()
    y_pred = w1*x1 + w2*x2 + w3*x3 + b
    loss = MSE_loss(y_pred, y)
    loss.backward() # Compute gradients
    # Update the parameters in place, without tracking the computations
    with torch.no_grad():
        w1 -= learning_rate*w1.grad
        w2 -= learning_rate*w2.grad
        w3 -= learning_rate*w3.grad
        b -= learning_rate*b.grad
    # Reset the gradients so that they do not accumulate across iterations
    w1.grad.zero_()
    w2.grad.zero_()
    w3.grad.zero_()
    b.grad.zero_()

print('w1: ', w1)
print('w2: ', w2)
print('w3: ', w3)
print('b: ', b)


w1:  tensor(1.0019, requires_grad=True)
w2:  tensor(-0.0322, requires_grad=True)
w3:  tensor(1.9741, requires_grad=True)
b:  tensor(1.9413, requires_grad=True)
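
Because the problem in Exercise 2b is an ordinary least-squares problem, the gradient descent estimates can be cross-checked against a direct solver. The sketch below uses `torch.linalg.lstsq`; it regenerates its own data with an assumed seed, and the names `X`, `A`, `theta`, `w_ls` and `b_ls` are introduced here for illustration. Since the data above was generated without a seed, the numbers will not match the printed output exactly, but on the same data gradient descent should approach the least-squares solution.

```python
import torch

torch.manual_seed(0)  # assumed seed, only to make the sketch reproducible

# Regenerate data as in Exercise 2a (matrix form; names are illustrative)
N = 100
w_ref = torch.tensor([1.0, 0.0, 2.0])
b_ref = 2.0
X = 2 * torch.randn(N, 3)
y = X @ w_ref + b_ref + 0.6 * torch.randn(N)

# Append a column of ones so that the bias b becomes a fourth coefficient
A = torch.cat([X, torch.ones(N, 1)], dim=1)

# Solve min ||A theta - y||^2; lstsq is given a 2-D right-hand side
theta = torch.linalg.lstsq(A, y.unsqueeze(1)).solution.squeeze(1)

w_ls, b_ls = theta[:3], theta[3]
print('w (least squares):', w_ls)
print('b (least squares):', b_ls)
```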


## Exercise 2c
How many iterations do you need to run to solve this with gradient descent?

After running the code a few times with different numbers of iterations, it appears that around 500 iterations (with a learning rate of 0.01) are enough to get a good estimate of $w$ and $b$.
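
One way to make this less dependent on trial and error is to stop once the loss changes by less than a tolerance between two iterations, and then read off the iteration count. The sketch below illustrates the idea under stated assumptions: the helper `run_gradient_descent`, the tolerance `tol=1e-6`, the cap `max_iter` and the seed are all hypothetical choices made here, not part of the original solution.

```python
import torch

torch.manual_seed(0)  # assumed seed, only to make the sketch reproducible

# Data as in Exercise 2a (matrix form; names are illustrative)
N = 100
X = 2 * torch.randn(N, 3)
y = X @ torch.tensor([1.0, 0.0, 2.0]) + 2.0 + 0.6 * torch.randn(N)

def run_gradient_descent(X, y, learning_rate=0.01, tol=1e-6, max_iter=5000):
    """Run gradient descent until the loss changes by less than tol (hypothetical helper)."""
    w = torch.zeros(3, requires_grad=True)
    b = torch.zeros(1, requires_grad=True)
    previous_loss = float('inf')
    for iteration in range(1, max_iter + 1):
        loss = (X @ w + b - y).pow(2).mean()  # mean squared error
        loss.backward()                       # compute gradients
        with torch.no_grad():                 # in-place update, not tracked by autograd
            w -= learning_rate * w.grad
            b -= learning_rate * b.grad
        w.grad.zero_()                        # reset gradients for the next iteration
        b.grad.zero_()
        # Stop when the loss has (almost) stopped changing
        if abs(previous_loss - loss.item()) < tol:
            break
        previous_loss = loss.item()
    return w.detach(), b.detach(), iteration

w_est, b_est, n_iter = run_gradient_descent(X, y)
print('iterations used:', n_iter)
print('w:', w_est, 'b:', b_est)
```

The iteration count reported this way depends on the chosen tolerance and learning rate, so it is a guide rather than a fixed answer.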