# A quick tour of model selection, combination, and regularization

* Vivek Srikrishnan and James Doss-Gollin
* Keller Group Meeting
* Thursday 16 July 2020

## The challenge

Earth systems are:

* high-dimensional and multi-scale
* nonlinear / complex
* $\mathcal{M}$-open (roughly speaking, we can't write down the "true" model)

When there are many candidate models, _each of which is wrong_, how should we use their collective insight to make robust and distributionally accurate predictions?

> "It's hard to make predictions, especially about the future" -- Yogi Berra on "equifinality"

## Outline

1. Model selection
1. Regularization
1. Model combination
1. Next steps
1. Further Resources

## Model selection

A natural approach to dealing with a large set of candidate models is to select the "best" one. To pick the best model we'll need:

* some data: $(x_i, y_i)$
* a set of candidate models $M \in \mathcal{M}$ where $M : x_i \rightarrow p(y_i | x_i)$
* a criterion for model quality: $f(M, y)$
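
Read together, these ingredients suggest a simple selection rule. The sketch below assumes that larger values of $f$ indicate a better model, a convention the list above does not fix; $M^\star$ is notation introduced here for the selected model.

$$
M^\star = \underset{M \in \mathcal{M}}{\arg\max} \; f(M, y)
$$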

Canonical case studies and much of the theory are based on selecting which variables to include in a linear regression, often under Gaussian approximations, but thus far we have made no such restrictions.

### Significance criteria

Use Null Hypothesis Significance Testing (NHST) to decide whether to include a variable.
For example, consider:
$$
\begin{align}
M_1: y_i &\sim \mathcal{N} \left( \alpha_0 + \alpha_1 x_i , \sigma_1^2 \right) \\
M_2: y_i &\sim \mathcal{N} \left( \beta_0 + \beta_1 x_i + \beta_2 t_i, \sigma_2 ^2 \right)
\end{align}
$$
One could:

1. Form a null hypothesis: $\beta_2 = 0$
1. Compute a test statistic $\Rightarrow$ $p$-value
1. If $p < \alpha$ then use $M_2$ else use $M_1$
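
The three steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a recommended workflow: the data are synthetic, the variable names are hypothetical, $M_2$ is fit by ordinary least squares, and the $p$-value uses a normal approximation to the $t$ distribution (reasonable only because the sample is large).

```python
import math
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (an assumption for illustration): y depends on x and weakly on t.
n = 200
x = rng.normal(size=n)
t = np.linspace(0.0, 1.0, n)
y = 1.0 + 2.0 * x + 0.5 * t + rng.normal(scale=1.0, size=n)

# Fit M_2 by ordinary least squares; design matrix columns are [1, x, t].
X = np.column_stack([np.ones(n), x, t])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# Standard error of beta_2 from the usual OLS covariance estimate.
resid = y - X @ beta_hat
dof = n - X.shape[1]
sigma2_hat = resid @ resid / dof
cov = sigma2_hat * np.linalg.inv(X.T @ X)
se_beta2 = math.sqrt(cov[2, 2])

# Step 1-2: test statistic and two-sided p-value under H0: beta_2 = 0.
# Normal approximation to the t distribution (dof is large here).
z = beta_hat[2] / se_beta2
p_value = math.erfc(abs(z) / math.sqrt(2.0))

# Step 3: compare against the significance level alpha.
alpha = 0.05
chosen = "M_2" if p_value < alpha else "M_1"
print(f"beta_2 = {beta_hat[2]:.3f}, z = {z:.2f}, p = {p_value:.4f} -> use {chosen}")
```

Rerunning with a different seed or a smaller effect of `t` can flip the decision, which is one way to see how sensitive a hard threshold is to the particular sample.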

Problems:

* multiple comparisons (Gelman & Loken 2013, Heinze et al. 2018)
* what to do with many possible models?

### Likelihood ratio tests

Sticking with our example:
$$
\begin{align}
M_1: y_i &\sim \mathcal{N} \left( \alpha_0 + \alpha_1 x_i , \sigma_1^2 \right) \\
M_2: y_i &\sim \mathcal{N} \left( \beta_0 + \beta_1 x_i + \beta_2 t_i, \sigma_2 ^2 \right)
\end{align}
$$
it is straightforward to calculate the likelihood of observed outcomes given each model: $p(y | x, M_1)$ and $p(y | x, M_2)$.
Often 
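
As a rough sketch of the likelihood calculation for $M_1$ and $M_2$, the Python below fits both models by maximum likelihood on synthetic data and forms the likelihood ratio statistic. The helper `gaussian_max_loglik`, the data, and the seed are all assumptions made for illustration. The $p$-value relies on the standard large-sample result that, for nested models differing by one parameter, the statistic is approximately $\chi^2$ with one degree of freedom under the smaller model.

```python
import math
import numpy as np

def gaussian_max_loglik(X, y):
    """Maximized Gaussian log-likelihood of a linear model with design matrix X."""
    n = len(y)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    sigma2_mle = resid @ resid / n  # MLE of the variance (divides by n, not n - k)
    # At the MLE the log-likelihood simplifies to -n/2 * (log(2*pi*sigma^2) + 1).
    return -0.5 * n * (math.log(2.0 * math.pi * sigma2_mle) + 1.0)

rng = np.random.default_rng(1)

# Synthetic data (an assumption for illustration).
n = 200
x = rng.normal(size=n)
t = np.linspace(0.0, 1.0, n)
y = 1.0 + 2.0 * x + 0.5 * t + rng.normal(size=n)

# M_1 uses [1, x]; M_2 adds t as a third column.
X1 = np.column_stack([np.ones(n), x])
X2 = np.column_stack([np.ones(n), x, t])
ll1 = gaussian_max_loglik(X1, y)
ll2 = gaussian_max_loglik(X2, y)

# Likelihood ratio statistic; nonnegative up to rounding because M_1 is nested in M_2.
lr_stat = 2.0 * (ll2 - ll1)

# Chi-square survival function with 1 degree of freedom: P(X > s) = erfc(sqrt(s / 2)).
p_value = math.erfc(math.sqrt(max(lr_stat, 0.0) / 2.0))

print(f"log-lik M_1 = {ll1:.2f}, log-lik M_2 = {ll2:.2f}")
print(f"LR statistic = {lr_stat:.2f}, approximate p-value = {p_value:.4f}")
```

Because $M_2$ contains $M_1$ as a special case, its maximized likelihood can never be lower, so the raw likelihoods alone always favor the larger model.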

## Regularization

## Model combination

## References / read more

* Gelman, A., & Loken, E. (2013, November 14). The garden of forking paths: Why multiple comparisons can be a problem, even when there is no “fishing expedition” or “p-hacking” and the research hypothesis …. Retrieved from http://www.stat.columbia.edu/~gelman/research/unpublished/p_hacking.pdf
* Heinze, G., Wallisch, C., & Dunkler, D. (2018). Variable selection – A review and recommendations for the practicing statistician. Biometrical Journal, 60(3), 431–449. https://doi.org/10.1002/bimj.201700067
* Navarro, D. J. (2018). Between the Devil and the Deep Blue Sea: Tensions Between Scientific Judgement and Statistical Model Selection. Computational Brain & Behavior. https://doi.org/10.1007/s42113-018-0019-z
* Piironen, J., & Vehtari, A. (2017). Comparison of Bayesian predictive methods for model selection. Statistics and Computing, 27(3), 711–735. https://doi.org/10.1007/s11222-016-9649-y
* Vehtari, A., Gelman, A., & Gabry, J. (2017). Practical Bayesian Model Evaluation Using Leave-One-out Cross-Validation and WAIC. Statistics and Computing, 27(5), 1413–1432. https://doi.org/10.1007/s11222-016-9696-4
* Yao, Y., Vehtari, A., Simpson, D., & Gelman, A. (2018). Using Stacking to Average Bayesian Predictive Distributions. Bayesian Analysis. https://doi.org/10.1214/17-BA1091
