# Introduction

The notebooks in this folder analyze different properties of several loss functions commonly used when training machine learning models. For each loss function, we cover

1. the statistic predicted when minimizing the loss function
1. a visualization of the loss function for a single example
1. the maximum-likelihood interpretation of the loss function (TODO)

# Symbols

For a given example, we use

* $\mathbf{x}$ to represent the feature vector
* $t$ to represent the label (aka. target)
* $y$ or $y(\mathbf{x})$ to represent the prediction

We assume the function $y(\mathbf{x})$ is completely flexible (
[PRML](https://www.microsoft.com/en-us/research/uploads/prod/2006/01/Bishop-Pattern-Recognition-and-Machine-Learning-2006.pdf),
page 46), so that the expected loss can be minimized separately for each
$\mathbf{x}$ by setting its derivative wrt. $y(\mathbf{x})$ to zero, wherever
that derivative exists.
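
As a sketch of how this assumption is used, consider the squared loss $L(y, t) = (y - t)^2$. The joint density $p(\mathbf{x}, t)$ and marginal $p(\mathbf{x})$ are notation introduced here for illustration; they are not defined elsewhere in this README. The expected loss is

$$
\mathbb{E}[L] = \iint \big(y(\mathbf{x}) - t\big)^2 \, p(\mathbf{x}, t) \, \mathrm{d}\mathbf{x} \, \mathrm{d}t
$$

Since $y(\mathbf{x})$ may take any value at each $\mathbf{x}$, we can differentiate the inner integral wrt. $y(\mathbf{x})$ and set the result to zero:

$$
2 \int \big(y(\mathbf{x}) - t\big) \, p(\mathbf{x}, t) \, \mathrm{d}t = 0
\quad \Longrightarrow \quad
y(\mathbf{x}) = \frac{\int t \, p(\mathbf{x}, t) \, \mathrm{d}t}{p(\mathbf{x})} = \int t \, p(t \mid \mathbf{x}) \, \mathrm{d}t = \mathbb{E}[t \mid \mathbf{x}]
$$

This recovers the conditional mean listed for MSE in the summary table. The other rows follow the same pattern with a different $L$, although losses that are not differentiable everywhere (such as MAE) need a slightly more careful argument.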

# Summary (WIP)

| Loss function name | $L(y, t)$ | $\arg \min_{y} \mathbb{E}[L]$ | Conditional distribution | $p(t \mid \mathbf{x}, y)$ |
|--------------------|-----------|-------------------------------|--------------------------|---------------------------|
| Minkowski loss (q=2, aka. MSE) | $\left\lvert y - t \right\rvert^2$ | Conditional mean: $\mathbb{E}[t \mid \mathbf{x}]$ | Gaussian | $\frac{1}{\sqrt{2 \pi \sigma^2}} \exp \left \{- \frac{1}{2} \frac{\lvert t - y \rvert^2}{\sigma^2} \right \}$ |
| Minkowski loss (q=1, aka. MAE) | $\left\lvert y - t \right\rvert^1$ | Conditional median: $F_{t \mid \mathbf{x}}^{-1}(0.5)$ | Laplace | $\frac{1}{2 \sigma} \exp \left \{- \frac{\lvert t - y \rvert}{\sigma} \right \}$ |
| Minkowski loss (q=0) | $\lim_{q \rightarrow 0} \left\lvert y - t \right\rvert^{q}$ | Conditional mode | $?$ | $?$ |
| Log-loss (aka. cross-entropy loss) | $-\left[ t \ln y + (1 - t) \ln(1 - y) \right]$ | Conditional mean: $\mathbb{E}[t \mid \mathbf{x}]$ | Bernoulli | $\exp \Big \{ t \ln y + (1 - t) \ln (1 - y) \Big \}$ |
| Poisson loss | $y - t \ln y$ | Conditional mean: $\mathbb{E}[t \mid \mathbf{x}]$ | Poisson | $\exp \Big \{ t \ln y - y - \ln t! \Big \}$ |
| Pinball loss | $(t - y) (\tau - \mathbb{I}(t < y))$ | Conditional quantile: $F_{t \mid \mathbf{x}}^{-1}(\tau)$ | Asymmetric Laplace | $\frac{\tau(1 - \tau)}{\sigma} \exp \left \{ -\frac{(t - y) (\tau - \mathbb{I}(t < y))}{\sigma} \right \}$ |
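
The third column can be checked numerically. The snippet below is a minimal sketch, not code taken from the notebooks: it drops the dependence on $\mathbf{x}$, draws labels from a skewed distribution, and searches a grid of constant predictions for the one with the lowest average loss. The helper `empirical_minimizer`, the exponential sample, the grid, and $\tau = 0.9$ are all assumptions chosen for illustration.

```python
import numpy as np


def squared_loss(y, t):
    return (y - t) ** 2


def absolute_loss(y, t):
    return np.abs(y - t)


def pinball_loss(y, t, tau):
    # (t - y) * (tau - 1[t < y]), matching the table entry.
    return (t - y) * (tau - (t < y))


def empirical_minimizer(loss, t, grid):
    """Return the grid value with the lowest average loss over the sample t."""
    risks = np.array([loss(y, t).mean() for y in grid])
    return grid[np.argmin(risks)]


rng = np.random.default_rng(0)
# A skewed sample, so that mean, median and upper quantile clearly differ.
t = rng.exponential(scale=2.0, size=20_000)
grid = np.linspace(0.0, 10.0, 1001)  # candidate predictions, step 0.01

tau = 0.9
y_mse = empirical_minimizer(squared_loss, t, grid)
y_mae = empirical_minimizer(absolute_loss, t, grid)
y_pin = empirical_minimizer(lambda y, t_: pinball_loss(y, t_, tau), t, grid)

print("MSE minimizer:    ", y_mse, " sample mean:    ", t.mean())
print("MAE minimizer:    ", y_mae, " sample median:  ", np.median(t))
print("Pinball minimizer:", y_pin, " sample quantile:", np.quantile(t, tau))
```

Each minimizer should agree with the corresponding sample statistic up to the grid resolution. A grid search is used here only because it makes no differentiability assumptions, which keeps the same code valid for the MAE and pinball losses.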