# Probabilistic models for linear regression

Consider a regression problem with dataset $\mathcal D = \{(\boldsymbol x_i, y_i)\}_{i=1}^n$, $y_i\in\mathbb R$. The probabilistic formulation of linear regression assumes that

$$
    y_i = \boldsymbol x_i^\top \boldsymbol w + \varepsilon_i,
$$

where $\varepsilon_i$ is some random noise.

## Gaussian model

In this setting the random noise is Gaussian: $\varepsilon_i \sim \mathcal N (0, \sigma^2)$. Hence,

$$
    p(y_i \vert \boldsymbol x_i, \boldsymbol w) = \mathcal N(\boldsymbol x_i^\top \boldsymbol w, \sigma^2) = \frac 1{\sqrt{2\pi} \sigma} \exp\Big(-\frac{(\boldsymbol x_i^\top \boldsymbol w - y_i)^2}{2\sigma^2}\Big).
$$

**Q**. What is $\mathbb E y_i$? $\mathbb V y_i$?

A picture from [ML Handbook](https://academy.yandex.ru/handbook/ml/article/beta-pervoe-znakomstvo-s-veroyatnostnymi-modelyami#sluchajnost-kak-istochnik-nesovershenstva-modeli):

```{figure} lin_reg_gaussians.png
:align: center
```

**Q**. What are $\boldsymbol x_i$ and $\boldsymbol w$ in this picture?

The likelihood of the dataset $\mathcal D$ is

$$
    p(\boldsymbol y \vert \boldsymbol X, \boldsymbol w) = \prod_{i=1}^n \mathcal N(\boldsymbol x_i^\top \boldsymbol w, \sigma^2) = \frac 1{(\sqrt{2\pi}\sigma)^n}\exp\Big(-\sum\limits_{i=1}^n \frac{(\boldsymbol x_i^\top \boldsymbol w - y_i)^2}{2\sigma^2}\Big).
$$

Hence, the negative log-likelihood is

$$
    \mathrm{NLL}(\boldsymbol w) = -\log p(\boldsymbol y \vert \boldsymbol X, \boldsymbol w)=
    \frac 1{2\sigma^2} \sum\limits_{i=1}^n (\boldsymbol x_i^\top \boldsymbol w - y_i)^2 + \frac n2\log(2\pi\sigma^2).
$$
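
As a sanity check, the closed-form NLL can be compared numerically with the sum of $-\log p(y_i \vert \boldsymbol x_i, \boldsymbol w)$ over the samples, with the constant term written as $\frac n2\log(2\pi\sigma^2)$. This is a minimal NumPy sketch; the sample size, the noise level and the weights `w` are arbitrary values chosen for illustration, not taken from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, sigma = 200, 3, 0.5            # illustrative sizes and noise level (assumptions)
X = rng.normal(size=(n, d))
w = np.array([1.0, -2.0, 0.5])       # hypothetical weights used to simulate the data
y = X @ w + rng.normal(scale=sigma, size=n)


def gaussian_nll(w, X, y, sigma):
    """Closed form: RSS / (2 sigma^2) + (n / 2) * log(2 pi sigma^2)."""
    residuals = X @ w - y
    rss = residuals @ residuals
    return rss / (2 * sigma**2) + 0.5 * len(y) * np.log(2 * np.pi * sigma**2)


def gaussian_nll_pointwise(w, X, y, sigma):
    """Same quantity, computed from the per-sample Gaussian densities."""
    residuals = X @ w - y
    density = np.exp(-residuals**2 / (2 * sigma**2)) / (np.sqrt(2 * np.pi) * sigma)
    return -np.log(density).sum()


nll_closed = gaussian_nll(w, X, y, sigma)
nll_pointwise = gaussian_nll_pointwise(w, X, y, sigma)
print(nll_closed, nll_pointwise)     # the two values should agree up to rounding
```

Only the first term depends on $\boldsymbol w$; the second is a constant that shifts the NLL without moving its minimizer.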

**Q.** How does this NLL relate to the loss function {eq}`lin-reg-loss-opt` of the linear regression model?

Thus, the optimal weights {eq}`lin-reg-solution` of linear regression with MSE loss coincide with the maximum likelihood estimate (MLE):

$$
    \boldsymbol{\widehat w} = \arg\max\limits_{\boldsymbol w} p(\boldsymbol y \vert \boldsymbol X, \boldsymbol w) = \arg\min\limits_{\boldsymbol w} \mathrm{NLL}(\boldsymbol w).
$$
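
The coincidence of the least-squares weights and the MLE can be checked numerically. The sketch below assumes synthetic data with hypothetical weights `w_true` and a known $\sigma$, and it uses `np.linalg.lstsq` as the least-squares solver. It verifies that the gradient of the Gaussian NLL vanishes at the least-squares solution and that random perturbations of the weights only increase the NLL.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, sigma = 500, 3, 0.3                 # illustrative values (assumptions)
X = rng.normal(size=(n, d))
w_true = np.array([2.0, -1.0, 0.5])       # hypothetical weights used to simulate the data
y = X @ w_true + rng.normal(scale=sigma, size=n)


def nll(w):
    """Gaussian negative log-likelihood of the dataset for weights w."""
    residuals = X @ w - y
    return residuals @ residuals / (2 * sigma**2) + 0.5 * n * np.log(2 * np.pi * sigma**2)


# Least-squares weights: the minimizer of the sum of squared residuals.
w_ols, *_ = np.linalg.lstsq(X, y, rcond=None)

# Gradient of the NLL with respect to w is X^T (X w - y) / sigma^2.
grad_at_ols = X.T @ (X @ w_ols - y) / sigma**2

# NLL values at randomly perturbed weights around the least-squares solution.
nll_at_ols = nll(w_ols)
perturbed = [nll(w_ols + 0.1 * rng.normal(size=d)) for _ in range(100)]

print(np.abs(grad_at_ols).max())          # numerically zero
print(min(perturbed) > nll_at_ols)        # no perturbation improves on w_ols
```

Note that $\sigma$ does not affect the minimizer: it rescales the quadratic term and changes the additive constant.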


## Laplacian model

Now suppose that our loss function is MAE. Then

$$  
    \boldsymbol{\widehat w} = \arg\min\limits_{\boldsymbol w} \sum\limits_{i=1}^n \vert \boldsymbol x_i^\top \boldsymbol w - y_i\vert = \arg\max\limits_{\boldsymbol w} \Big(-\sum\limits_{i=1}^n \vert \boldsymbol x_i^\top \boldsymbol w - y_i\vert\Big).
$$

Which probabilistic model gives

$$
    \mathrm{NLL}(\boldsymbol w) = \sum\limits_{i=1}^n \vert \boldsymbol x_i^\top \boldsymbol w - y_i\vert?
$$

Then, up to a normalizing constant, the likelihood should be

$$
    p(\boldsymbol y \vert \boldsymbol X, \boldsymbol w) \propto \exp\Big(-\sum\limits_{i=1}^n \vert\boldsymbol x_i^\top \boldsymbol w - y_i\vert\Big) = \prod\limits_{i=1}^n \exp(-\vert\boldsymbol x_i^\top \boldsymbol w - y_i\vert).
$$

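Hence,

$$
    p(y_i \vert \boldsymbol x_i, \boldsymbol w) \propto \exp(-\vert\boldsymbol x_i^\top \boldsymbol w - y_i\vert),
$$

which is, up to the factor $\frac 12$, the density of the Laplace distribution with scale $b = 1$. More generally, one can take

$$
    y_i  \sim \mathrm{Laplace}(\boldsymbol x_i^\top \boldsymbol w, b), \quad p(y_i \vert \boldsymbol x_i, \boldsymbol w) = \frac 1{2b}\exp\Big(-\frac{\vert\boldsymbol x_i^\top \boldsymbol w - y_i\vert}{b}\Big);
$$

the scale $b > 0$ only rescales the NLL and adds a constant to it, so it does not change $\boldsymbol{\widehat w}$.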
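
The link between MAE and the Laplace model can be illustrated on a one-dimensional example. The sketch below makes several assumptions that are not in the text: a single feature without intercept, a hypothetical true slope of `1.5`, and a brute-force grid search instead of a proper optimizer. It uses the Laplace NLL with scale $b$, namely $\frac 1b \sum_i \vert x_i w - y_i\vert + n\log(2b)$, and compares its minimizer with the MAE minimizer for two values of $b$.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 300
x = rng.normal(size=n)
y = 1.5 * x + rng.laplace(scale=0.7, size=n)   # hypothetical slope and noise scale


def abs_error_sum(w):
    """Sum of absolute residuals (MAE up to the factor 1/n)."""
    return np.abs(x * w - y).sum()


def laplace_nll(w, b):
    """NLL under y_i ~ Laplace(x_i * w, b)."""
    return abs_error_sum(w) / b + n * np.log(2 * b)


# Brute-force search over candidate slopes (step 0.005).
grid = np.linspace(0.0, 3.0, 601)
w_mae = grid[np.argmin([abs_error_sum(w) for w in grid])]
w_nll_b1 = grid[np.argmin([laplace_nll(w, 1.0) for w in grid])]
w_nll_b2 = grid[np.argmin([laplace_nll(w, 2.5) for w in grid])]

print(w_mae, w_nll_b1, w_nll_b2)               # the three minimizers should agree
```

Changing $b$ changes the value of the NLL but not the location of its minimum, which is why the scale can be left unspecified when only $\boldsymbol{\widehat w}$ is of interest.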
