# Fitting Models

In the first few chapters we learnt how data is passed forward through shallow and deep networks.
In the previous chapter we discussed how to measure the mismatch between the network's predictions and the ground truth over a training set.

These are crucial but only partial aspects of building a model: once the loss function tells us how far off the model is, we must also adjust the model to improve it.
This is known as **fitting** or **training** the model.

Training improves the parameters by repeatedly:

1. Computing the derivatives (gradient) of the loss with respect to the parameters.
2. Adjusting the parameters based on these gradients to reduce the loss.

Note that in Chapter 1 we were able to find a closed-form solution for the linear regression problem; however, once the network becomes more complex, this is rarely a viable option. <br>
Instead, we follow an iterative approach which, as we'll show, produces good approximate solutions under fairly general conditions.

Here we focus on step (2) in the process of improving the parameters.

<div  align="center">

### Optimization Problem

</div>


 $$\boxed{\hat{\phi} = \text{argmin}_{\phi}\big[L[\phi]\big]}$$

## Gradient Descent

$\hspace{7cm} \textbf{Step 0: Initialise the parameters to some values: }$
$$ \phi = [\phi_0, \phi_1, \dots, \phi_k]^T$$


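$$\boxed{\begin{aligned}
&\textbf{Step 1: Uphill - Compute the derivatives of the loss with respect to the parameters} \\
&\hspace{6cm} \frac{\partial L}{\partial \phi} = \begin{bmatrix} \frac{\partial L}{\partial \phi_0} \\ \frac{\partial L}{\partial \phi_1} \\ \vdots \\ \frac{\partial L}{\partial \phi_k} \end{bmatrix} \\[1em]
&\textbf{Step 2: Downhill - Update the parameters according to the rule} \\
&\hspace{6cm} \phi \leftarrow \phi - \alpha \cdot \frac{\partial L}{\partial \phi}
\end{aligned}}$$

The gradient points in the direction in which the loss increases fastest (uphill), so we step in the opposite direction. Here $\alpha > 0$ is the learning rate (step size), and Steps 1 and 2 are repeated until the loss no longer decreases appreciably.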
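A minimal sketch of the loop above in plain Python, assuming a fixed learning rate and a fixed number of iterations as the stopping rule (both are simplifications chosen for illustration). The names `gradient_descent` and `quadratic_grad` are hypothetical, and the toy loss $L(\phi) = (\phi_0 - 3)^2 + (\phi_1 + 1)^2$ is made up so that the minimiser $(3, -1)$ is known in advance.

```python
from typing import Callable, List


def gradient_descent(
    grad: Callable[[List[float]], List[float]],
    phi_init: List[float],
    alpha: float = 0.1,
    num_steps: int = 100,
) -> List[float]:
    """Plain full-batch gradient descent with a fixed learning rate (illustrative sketch)."""
    phi = list(phi_init)  # Step 0: start from the given initial parameters
    for _ in range(num_steps):
        g = grad(phi)  # Step 1: gradient dL/dphi at the current parameters (uphill direction)
        # Step 2: move a step of size alpha against the gradient (downhill)
        phi = [p - alpha * gp for p, gp in zip(phi, g)]
    return phi


# Hypothetical toy loss L(phi) = (phi_0 - 3)^2 + (phi_1 + 1)^2, minimised at (3, -1)
def quadratic_grad(phi: List[float]) -> List[float]:
    return [2.0 * (phi[0] - 3.0), 2.0 * (phi[1] + 1.0)]


phi_hat = gradient_descent(quadratic_grad, [0.0, 0.0], alpha=0.1, num_steps=200)
print(phi_hat)  # very close to [3.0, -1.0]
```

With $\alpha = 0.1$, each step here shrinks the distance to the minimum by a factor of $0.8$, so 200 steps are far more than enough; on real losses the step size and stopping rule usually need tuning.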

### 1D Linear Regression Example 

In the first chapter we derived the closed-form solution; here we present the iterative solution to the same problem.

$$\boxed{\text{model: } f[x, \phi] \qquad \text{parameters: } \phi = [\phi_0, \phi_1]^T \qquad \text{input: } x \in \mathbb{R}}$$

$$\boxed{y = f[x, \phi] = \phi_0 + \phi_1 x}$$

Our loss is the least-squares loss, i.e. the sum of squared errors over the $N$ training pairs $\{x_i, y_i\}$:

$$L[\phi] = \sum_{i=1}^N l_i = \sum_{i=1}^N (f[x_i, \phi] - y_i)^2 =  \sum_{i=1}^N (\phi_0 + \phi_1 x_i - y_i)^2$$
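As a concrete check of this formula, the sketch below evaluates $L[\phi]$ in plain Python. The function name `least_squares_loss` and the four data points (chosen to lie exactly on $y = 1 + 2x$) are hypothetical, for illustration only.

```python
from typing import Sequence


def least_squares_loss(phi: Sequence[float], xs: Sequence[float], ys: Sequence[float]) -> float:
    """L[phi] = sum_i (phi_0 + phi_1 * x_i - y_i)^2 for the 1D linear model."""
    phi0, phi1 = phi
    return sum((phi0 + phi1 * x - y) ** 2 for x, y in zip(xs, ys))


# Hypothetical toy data lying exactly on the line y = 1 + 2x
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

print(least_squares_loss([1.0, 2.0], xs, ys))  # 0.0: the line passes through every point
print(least_squares_loss([0.0, 2.0], xs, ys))  # 4.0: every residual is -1, and 4 * 1^2 = 4
```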





**Step 1** 

$$\frac{\partial L[\phi]}{\partial \phi} = \frac{\partial }{\partial \phi}\sum_{i=1}^N l_i \underset{\text{linearity}}{=} \sum_{i=1}^N\frac{\partial l_i}{\partial \phi} = \sum_{i=1}^N \begin{bmatrix} \frac{\partial l_i}{\partial \phi_0} \\ \frac{\partial l_i}{\partial \phi_1} \end{bmatrix} = \sum_{i=1}^N \begin{bmatrix} 2(\phi_0 + \phi_1x_i - y_i) \\ 2 x_i(\phi_0 + \phi_1x_i - y_i) \end{bmatrix}$$
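The two components of this gradient can be sanity-checked numerically. The sketch below, in plain Python, compares the analytic formula with a central finite-difference approximation of $\partial L / \partial \phi_k$. The helper names `least_squares_grad` and `numerical_grad`, the step `eps`, and the data are all hypothetical choices for illustration.

```python
from typing import Callable, List, Sequence


def least_squares_loss(phi: Sequence[float], xs: Sequence[float], ys: Sequence[float]) -> float:
    phi0, phi1 = phi
    return sum((phi0 + phi1 * x - y) ** 2 for x, y in zip(xs, ys))


def least_squares_grad(phi: Sequence[float], xs: Sequence[float], ys: Sequence[float]) -> List[float]:
    """Analytic gradient: [sum_i 2 r_i, sum_i 2 x_i r_i] with residual r_i = phi_0 + phi_1 x_i - y_i."""
    phi0, phi1 = phi
    d0 = sum(2.0 * (phi0 + phi1 * x - y) for x, y in zip(xs, ys))
    d1 = sum(2.0 * x * (phi0 + phi1 * x - y) for x, y in zip(xs, ys))
    return [d0, d1]


def numerical_grad(loss: Callable[[List[float]], float], phi: Sequence[float], eps: float = 1e-5) -> List[float]:
    """Central differences: (L(phi + eps e_k) - L(phi - eps e_k)) / (2 eps) for each parameter k."""
    grads = []
    for k in range(len(phi)):
        plus, minus = list(phi), list(phi)
        plus[k] += eps
        minus[k] -= eps
        grads.append((loss(plus) - loss(minus)) / (2.0 * eps))
    return grads


xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
phi = [0.5, -0.5]

analytic = least_squares_grad(phi, xs, ys)
numerical = numerical_grad(lambda p: least_squares_loss(p, xs, ys), phi)
print(analytic)   # [-34.0, -76.0]
print(numerical)  # agrees with the analytic values up to rounding error
```

Both components are negative at this $\phi$, so Step 2 will increase both $\phi_0$ and $\phi_1$, moving the line towards $y = 1 + 2x$.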

**Step 2**

$$ \phi \leftarrow \phi - \alpha \frac{\partial L}{\partial \phi} $$
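Putting Steps 0 to 2 together, the sketch below fits the 1D model by gradient descent and compares the result with the standard closed-form least-squares line, $\phi_1 = \sum_i (x_i - \bar{x})(y_i - \bar{y}) / \sum_i (x_i - \bar{x})^2$ and $\phi_0 = \bar{y} - \phi_1 \bar{x}$. It is written in plain Python; the function names, the initialisation at $(0, 0)$, the data, and the values `alpha=0.01` and `num_steps=2000` are hypothetical choices for this example, not prescribed settings.

```python
from typing import List, Sequence


def least_squares_grad(phi: Sequence[float], xs: Sequence[float], ys: Sequence[float]) -> List[float]:
    """Gradient of sum_i (phi_0 + phi_1 x_i - y_i)^2 with respect to [phi_0, phi_1]."""
    phi0, phi1 = phi
    residuals = [phi0 + phi1 * x - y for x, y in zip(xs, ys)]
    return [sum(2.0 * r for r in residuals),
            sum(2.0 * x * r for x, r in zip(xs, residuals))]


def fit_line_gd(xs, ys, alpha=0.01, num_steps=2000, phi_init=(0.0, 0.0)) -> List[float]:
    phi = list(phi_init)                                   # Step 0: initialise
    for _ in range(num_steps):
        g = least_squares_grad(phi, xs, ys)                # Step 1: uphill direction
        phi = [p - alpha * gp for p, gp in zip(phi, g)]    # Step 2: step downhill
    return phi


def fit_line_closed_form(xs, ys) -> List[float]:
    """Closed-form least-squares line, used here only as a reference answer."""
    n = len(xs)
    x_mean, y_mean = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    phi1 = sxy / sxx
    return [y_mean - phi1 * x_mean, phi1]


# Hypothetical, slightly noisy data around y = 1 + 2x
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.1]

phi_gd = fit_line_gd(xs, ys)
phi_cf = fit_line_closed_form(xs, ys)
print(phi_gd)  # approximately [1.04, 1.99]
print(phi_cf)  # approximately [1.04, 1.99]
```

The learning rate matters: because this loss is a sum over the data rather than a mean, its curvature grows with $N$ and with the size of the $x_i$. For this particular data, any $\alpha$ above roughly $0.03$ makes the iterates diverge instead of converging.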

<div align="center">

**Gradient Descent Visualization:**

<img src="./images/chap5/1dgradDescent_reg.gif" alt="Gradient Descent Animation" width="500" />
<img src="./images/chap5/lossSpace.png" alt="Loss Space" width="390" />

*If the animation doesn't display, [click here to view the animation](./images/chap5/1dgradDescent_reg.gif)*

</div>