# Univariate Regression

In Linear Regression models, the goal is to predict a single scalar output $y \in \mathbb{R}$ from an input $x$ using a model $f[x, \phi]$ with parameters $\phi$.

#### Reminder on the General Method

$$\boxed{\begin{aligned}
&1. \text{ Given the output, choose a suitable probability distribution } Pr(y | \theta) \text{ defined over the domain of the predictions} \\
\\
&2. \text{ Set the ML model } f[x, \phi] \text{ to predict all independent parameters } \\
&\quad \text{(and compute the remaining parameters from what is learnt), so } \theta = f[x, \phi] \text{ and } Pr(y | \theta) = Pr(y | f[x, \phi]) \\
\\
&3. \text{ Train the model to find the network parameters } \hat{\phi} \text{ that minimize the negative log-likelihood} \\
&\quad \text{over the training dataset } \{x_i, y_i\}_{i=1}^I \\
\\
&4. \text{ To perform inference, return the argmax of the distribution } Pr(y | f[x, \hat{\phi}])
\end{aligned}}$$

#### Probability Distribution

Since the output is a scalar, we choose the univariate normal distribution. It has two parameters, the mean $\mu$ and the variance $\sigma^2$, and the pdf:

$$Pr(y | \mu, \sigma^2) = \frac{1}{\sqrt{2\pi \sigma^2}}\exp\left[-\frac{(y-\mu)^2}{2\sigma^2}\right]$$

#### Set Model

We set the ML model $f[x, \phi]$ to compute one or more of the parameters of this distribution. Here it computes the mean, $\mu = f[x, \phi]$, while the variance $\sigma^2$ is treated as a constant:

$$Pr(y | f[x, \phi], \sigma^2) = \frac{1}{\sqrt{2\pi \sigma^2}}\exp\left[-\frac{(y-f[x, \phi])^2}{2\sigma^2}\right]$$

We then take the negative log-likelihood over the training set:

$$\begin{align}
L[\phi] &= - \sum_{i=1}^I \log\left[Pr(y_i | f[x_i, \phi], \sigma^2)\right] \\
&= - \sum_{i=1}^I \log\left[\frac{1}{\sqrt{2\pi \sigma^2}}\exp\left[-\frac{(y_i-f[x_i, \phi])^2}{2\sigma^2}\right]\right]
\end{align}$$
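The loss above can be evaluated numerically. The following is a minimal sketch, assuming NumPy; the function name `gaussian_nll` and the toy arrays are hypothetical and only illustrate the formula, with `mu` standing in for the model outputs $f[x_i, \phi]$.

```python
import numpy as np

def gaussian_nll(y, mu, sigma2):
    """Sum over examples of -log Pr(y_i | mu_i, sigma2) for a normal distribution."""
    y = np.asarray(y, dtype=float)
    mu = np.asarray(mu, dtype=float)
    # -log of the normal pdf: 0.5 * log(2*pi*sigma2) + (y - mu)^2 / (2*sigma2)
    per_example = 0.5 * np.log(2.0 * np.pi * sigma2) + (y - mu) ** 2 / (2.0 * sigma2)
    return per_example.sum()

# Hypothetical toy data: targets y_i and predicted means f[x_i, phi]
y = np.array([1.0, 2.0, 3.0])
mu = np.array([1.1, 1.9, 3.2])
loss = gaussian_nll(y, mu, sigma2=1.0)
print(loss)
```

Working with the log form directly, rather than taking the log of the pdf, avoids underflow when a probability is very small.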

#### Optimize: Seek Minimizing Parameters

$$\begin{align} \hat{\phi} 
&= \textbf{argmin}_\phi \left[ - \sum_{i=1}^I\log\left[\frac{1}{\sqrt{2\pi \sigma^2}}\exp\left[-\frac{(y_i-f[x_i, \phi])^2}{2\sigma^2}\right]\right]\right] \\
&= \textbf{argmin}_\phi \left[ - \sum_{i=1}^I \left[ \log\frac{1}{\sqrt{2\pi \sigma^2}} + \log\exp\left[-\frac{(y_i-f[x_i, \phi])^2}{2\sigma^2}\right]\right]\right] \\
&= \textbf{argmin}_\phi \left[ - \sum_{i=1}^I \left[-\frac{(y_i-f[x_i, \phi])^2}{2\sigma^2} - \log{\sqrt{2\pi \sigma^2}}\right] \right]\\
&= \textbf{argmin}_\phi \left[ \sum_{i=1}^I \left[(y_i-f[x_i, \phi])^2 \right] \right]
\end{align}$$

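Line 1 $\to$ 2: The log of a product is the sum of the logs<br>
Line 2 $\to$ 3: $\log\exp[z] = z$, and $\log\frac{1}{\sqrt{2\pi \sigma^2}} = \log 1 - \log\sqrt{2\pi \sigma^2} = -\log\sqrt{2\pi \sigma^2}$ since $\log 1 = 0$<br>
Line 3 $\to$ 4: Terms that do not depend on $\phi$ are constants in the optimization problem. The additive term $\log\sqrt{2\pi \sigma^2}$ and the positive scaling factor $\frac{1}{2\sigma^2}$ do not move the position of the minimum, so both can be dropped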
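The result of this derivation is that minimizing the negative log-likelihood and minimizing the sum of squared errors pick the same parameters, whatever fixed value $\sigma^2$ takes. A small numerical check of this, assuming NumPy, a hypothetical linear model $f[x, \phi] = \phi_0 + \phi_1 x$, and synthetic data invented for the illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 50)
# Synthetic data (an assumption for this sketch): a line plus Gaussian noise
y = 2.0 * x + 0.5 + rng.normal(0.0, 0.1, size=x.shape)

def model(x, phi):
    """Hypothetical linear model: phi = (intercept, slope)."""
    return phi[0] + phi[1] * x

def nll(phi, sigma2):
    r = y - model(x, phi)
    return np.sum(0.5 * np.log(2.0 * np.pi * sigma2) + r ** 2 / (2.0 * sigma2))

def sum_squares(phi):
    return np.sum((y - model(x, phi)) ** 2)

# Sweep the slope on a grid while keeping the intercept fixed at 0.5
slopes = np.linspace(1.0, 3.0, 201)
candidates = [(0.5, s) for s in slopes]

best_ls = int(np.argmin([sum_squares(p) for p in candidates]))
best_nll_a = int(np.argmin([nll(p, 0.01) for p in candidates]))  # small variance
best_nll_b = int(np.argmin([nll(p, 4.0) for p in candidates]))   # large variance

print(slopes[best_ls], slopes[best_nll_a], slopes[best_nll_b])
```

All three criteria select the same grid point: changing $\sigma^2$ shifts and rescales the loss curve but does not move its minimum.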

#### Inference 

The network no longer predicts $y$ directly but the mean $\mu = f[x, \phi]$ of the normal distribution over $y$.<br> To choose the best $y$, we pick the value with the highest probability:

$$\hat{y} = \mathbf{argmax}_y\left[Pr(y | f[x, \hat{\phi}], \sigma^2)\right]$$

For the normal distribution the density peaks at the mean, so $\hat{y} = f[x, \hat{\phi}]$.

## Estimating Variance

We assumed that the network predicted the mean of a normal distribution, and we considered the variance as a constant factor.<br> We could just as well treat $\sigma^2$ as a learned parameter and minimize the negative log-likelihood.

$$\hat{\phi}, \hat{\sigma}^2 = \textbf{argmin}_{\phi, \sigma^2} \left[ - \sum_{i=1}^I\log\left[\frac{1}{\sqrt{2\pi \sigma^2}}\exp\left[-\frac{(y_i-f[x_i, \phi])^2}{2\sigma^2}\right]\right]\right]$$
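For a fixed $\phi$, setting the derivative of this loss with respect to $\sigma^2$ to zero gives $\hat{\sigma}^2 = \frac{1}{I}\sum_{i=1}^I (y_i - f[x_i, \phi])^2$, the mean squared residual. The sketch below checks this numerically, assuming NumPy, a hypothetical linear model fitted with `np.polyfit`, and synthetic data with a known noise level:

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 200)
true_sigma = 0.3  # noise standard deviation of the synthetic data (assumption)
y = 1.5 * x - 0.2 + rng.normal(0.0, true_sigma, size=x.shape)

# Fit the mean by least squares; polyfit returns the highest-degree coefficient first
slope, intercept = np.polyfit(x, y, deg=1)
residuals = y - (intercept + slope * x)

# Closed-form maximum likelihood estimate of the variance
sigma2_hat = np.mean(residuals ** 2)

def nll(sigma2):
    """Negative log-likelihood as a function of sigma2, with the mean held fixed."""
    return np.sum(0.5 * np.log(2.0 * np.pi * sigma2) + residuals ** 2 / (2.0 * sigma2))

print(sigma2_hat, true_sigma ** 2)
print(nll(0.9 * sigma2_hat), nll(sigma2_hat), nll(1.1 * sigma2_hat))
```

The estimate lands close to the true variance, and the loss is higher on either side of `sigma2_hat`.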


**Insight**

$\mu$ - At inference time, we use the mean as the best prediction. <br>
$\sigma^2$ - We can then use the variance to indicate the uncertainty of this prediction.

## Heteroscedastic regression

**What happens if $\sigma$ isn't constant in the data?**

We can create, for example, a shallow network with two outputs $f_1[x, \phi]$ and $f_2[x, \phi]$, one to predict the mean and the other to predict the variance.

The only thing we need to ensure is that the variance is always positive. Squaring the second output achieves this, so we have:

$$\begin{align} \mu &= f_1[x, \phi] \\ \sigma^2 &= f_2[x, \phi]^2 \end{align}$$

As such we aim to find the minimizing parameters: 

$$\hat{\phi} = \textbf{argmin}_{\phi} \left[ - \sum_{i=1}^I\log\left[\frac{1}{\sqrt{2\pi f_2[x_i, \phi]^2}}\exp\left[-\frac{(y_i-f_1[x_i, \phi])^2}{2f_2[x_i, \phi]^2}\right]\right]\right]$$
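A minimal sketch of this loss, assuming NumPy. The function name `heteroscedastic_nll`, the small `eps` guard against a zero variance, and the synthetic data are assumptions added for the illustration; the arrays `f1` and `f2` stand in for the two network outputs rather than coming from a trained network.

```python
import numpy as np

def heteroscedastic_nll(y, f1, f2, eps=1e-8):
    """NLL where f1 gives the mean and f2**2 gives the variance for each example."""
    y, f1, f2 = (np.asarray(a, dtype=float) for a in (y, f1, f2))
    sigma2 = f2 ** 2 + eps  # squaring keeps the variance positive; eps is an added guard
    return np.sum(0.5 * np.log(2.0 * np.pi * sigma2) + (y - f1) ** 2 / (2.0 * sigma2))

# Synthetic data (assumption): the noise standard deviation grows with x
rng = np.random.default_rng(2)
x = np.linspace(0.1, 1.0, 500)
y = 2.0 * x + rng.normal(0.0, 1.0, size=x.shape) * x

f1 = 2.0 * x  # stand-in for the mean output
# Variance output that tracks the input-dependent noise
nll_matched = heteroscedastic_nll(y, f1, f2=x)
# Best single constant variance for comparison (mean squared residual)
const_sigma = np.sqrt(np.mean((y - f1) ** 2))
nll_constant = heteroscedastic_nll(y, f1, f2=np.full_like(x, const_sigma))

print(nll_matched, nll_constant)
```

On this data the input-dependent variance gives a lower loss than any single constant variance can. Because only $f_2^2$ enters the loss, the sign of the second output does not matter.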
