## $R^2$ for Bayesian regression models

In [R-squared for Bayesian Regression Models](https://www.tandfonline.com/doi/abs/10.1080/00031305.2018.1549100) by Gelman et al. (2018),
the authors propose a generalization of the classical coefficient of determination $R^2$ to Bayesian regression models.

The coefficient of determination they propose is defined as:

$$
\frac{\text{Explained variance}}{\text{Explained variance} + \text{Residual variance}} = \frac{\text{var}_\text{fit}}{\text{var}_\text{fit} + \text{var}_\text{res}}
$$

where $\text{var}_\text{fit}$ and $\text{var}_\text{res}$ are computed as:

$$
\begin{aligned}
\text{var}_\text{fit} &= \text{V}\left(\mathbb{E}(\tilde{y}_i \mid \boldsymbol{x}_i, \boldsymbol{\theta})\right) =  \text{V} \left(y_i^\text{pred}\right) \\
\text{var}_\text{res} &= \text{M}\left(\mathbb{V}\left(\tilde{y}_i - y_i^\text{pred} \mid \boldsymbol{x}_i, \boldsymbol{\theta}\right) \right)
\end{aligned}
$$

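Here $\text{M}$ and $\text{V}$ denote the sample mean and sample variance taken over the observations $i = 1, \dots, n$, while $\mathbb{E}$ and $\mathbb{V}$ denote the expectation and variance under the model.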

The first term is the variance of the expectation of future data and the second one is the expected variance of future residuals.

This Bayesian $R^2$ is conditional on the explanatory variables and the model parameters $\boldsymbol{\theta}$.
For this reason, this $R^2$ is proposed as an _a posteriori_ measure of model fit.

If we have draws from the posterior distribution, we can compute the Bayesian $R^2$ for each draw, which yields a posterior distribution for $R^2$ itself.

### Common cases

In the case of normal regression models, the components of the Bayesian $R^2$ simplify to:

$$
\begin{aligned}
\text{var}_\text{fit} &= \text{V}(\mu_i) \\
\text{var}_\text{res} &= \sigma^2
\end{aligned}
$$

where $\mu_i = \boldsymbol{x}_i^T \boldsymbol{\beta}$.
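
As a minimal sketch of the normal case, the snippet below computes one $R^2$ value per posterior draw with NumPy. The function name `bayes_r2_normal` and the array names are hypothetical, and the "posterior draws" are synthetic stand-ins generated only for illustration; in practice they would come from a fitted model.

```python
import numpy as np

def bayes_r2_normal(X, beta_draws, sigma_draws):
    """Return one Bayesian R^2 value per posterior draw.

    X           : (n, p) design matrix, without the intercept column
    beta_draws  : (S, p) posterior draws of the coefficients
    sigma_draws : (S,)   posterior draws of the residual standard deviation
    """
    # Linear predictor for every draw and observation, shape (S, n).
    # The intercept is omitted: a constant shift does not change the variance.
    mu = beta_draws @ X.T
    var_fit = mu.var(axis=1, ddof=1)   # V(mu_i): sample variance over observations
    var_res = sigma_draws ** 2         # sigma^2
    return var_fit / (var_fit + var_res)

# Synthetic stand-ins for posterior draws (illustration only, not a fitted model)
rng = np.random.default_rng(0)
n, p, S = 200, 3, 1000
X = rng.normal(size=(n, p))
beta_draws = rng.normal(loc=[0.5, -1.0, 0.25], scale=0.05, size=(S, p))
sigma_draws = np.abs(rng.normal(loc=1.0, scale=0.05, size=S))

r2_draws = bayes_r2_normal(X, beta_draws, sigma_draws)
print(r2_draws.mean(), np.quantile(r2_draws, [0.025, 0.975]))
```

The result is a vector of length `S`, so it can be summarized like any other posterior quantity, for example with a mean and an interval.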

For logistic regression models, they become:

$$
\begin{aligned}
\text{var}_\text{fit} &= \text{V}(\pi_i) \\
\text{var}_\text{res} &= \text{M}(\pi_i (1 - \pi_i))
\end{aligned}
$$

where $\pi_i = \text{expit}(\boldsymbol{x}_i^T \boldsymbol{\beta})$.
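
The logistic case can be sketched in the same way. The function names below are hypothetical, and the code assumes that `X` already contains a column of ones if the model has an intercept, since, unlike in the normal case, a shift of the linear predictor does change $\pi_i$.

```python
import numpy as np

def expit(z):
    """Inverse logit function."""
    return 1.0 / (1.0 + np.exp(-z))

def bayes_r2_logistic(X, beta_draws):
    """Return one Bayesian R^2 value per posterior draw for a logistic model.

    X          : (n, p) design matrix (include a column of ones for an intercept)
    beta_draws : (S, p) posterior draws of the coefficients
    """
    pi = expit(beta_draws @ X.T)               # success probabilities, shape (S, n)
    var_fit = pi.var(axis=1, ddof=1)           # V(pi_i)
    var_res = (pi * (1.0 - pi)).mean(axis=1)   # M(pi_i * (1 - pi_i))
    return var_fit / (var_fit + var_res)

# Synthetic stand-ins for posterior draws (illustration only, not a fitted model)
rng = np.random.default_rng(1)
X_logit = np.column_stack([np.ones(300), rng.normal(size=(300, 2))])
beta_logit = rng.normal(loc=[-0.5, 1.0, -1.0], scale=0.1, size=(500, 3))

r2_logit = bayes_r2_logistic(X_logit, beta_logit)
print(r2_logit.mean())
```

When all slope coefficients are zero, every $\pi_i$ is identical, so $\text{var}_\text{fit} = 0$ and the function returns $R^2 = 0$.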

## The R2-D2 prior

The R2-D2 prior for normal regression models was introduced in
[Bayesian Regression Using a Prior on the Model Fit: The R2-D2 Shrinkage Prior](https://arxiv.org/abs/1609.00046) by Zhang et al. (2016).

Consider the normal regression model:

$$
\begin{aligned}
Y_i \mid \mu_i, \sigma^2 &\underset{\text{ind}}{\sim} \text{Normal}(\mu_i, \sigma^2) \\
\mu_i &= \alpha + \beta_1 X_{1i} + \beta_2 X_{2i} + \dots + \beta_p X_{pi} \\
&= \alpha + \boldsymbol{x}_i^T \boldsymbol{\beta}
\end{aligned}
$$

where $\boldsymbol{x}_i^T$ is the $i$-th row of the $n \times p$ design matrix $\boldsymbol{X}$ (which excludes the intercept column), $\alpha$ is the intercept, and $\boldsymbol{\beta}$ is the coefficient vector of length $p$.

The central idea in the R2-D2 prior is to place a prior directly on the coefficient of determination $R^2$.
For the purpose of defining prior distributions, however, Zhang et al. work with the **marginal** coefficient of determination, a version of $R^2$ that averages over both the design matrix $\boldsymbol{X}$ and the regression coefficients $\boldsymbol{\beta}$. This differs from the conditional coefficient of determination proposed by Gelman et al. and described above.

For the linear regression model, the marginal $R^2$ is defined as:

$$
R^2 = \frac{\mathbb{V}(\boldsymbol{x}^T \boldsymbol{\beta})}{\mathbb{V}(\boldsymbol{x}^T \boldsymbol{\beta}) + \sigma^2}
= \frac{\sigma^2 W}{\sigma^2 W + \sigma^2}
= \frac{W}{W + 1}
$$

which is the ratio of the marginal variance of the linear predictor to the marginal variance of the outcome.
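
The middle equality, $\mathbb{V}(\boldsymbol{x}^T \boldsymbol{\beta}) = \sigma^2 W$, can be unpacked with a short derivation. This is a sketch under assumptions that are not all stated explicitly in the text: the covariates are centered and scaled so that $\mathbb{E}(x_j) = 0$ and $\mathbb{V}(x_j) = 1$, $\boldsymbol{x}$ is independent of $\boldsymbol{\beta}$, and, as in the prior specified below, the coefficients have mean zero, are uncorrelated given $(\boldsymbol{\phi}, W, \sigma^2)$, and have variances $\phi_j W \sigma^2$ with $\sum_j \phi_j = 1$. Conditional on $(\boldsymbol{\phi}, W, \sigma^2)$:

$$
\begin{aligned}
\mathbb{V}(\boldsymbol{x}^T \boldsymbol{\beta})
&= \mathbb{E}\left[(\boldsymbol{x}^T \boldsymbol{\beta})^2\right]
= \sum_{j=1}^p \sum_{k=1}^p \mathbb{E}(x_j x_k) \, \mathbb{E}(\beta_j \beta_k) \\
&= \sum_{j=1}^p \mathbb{E}(x_j^2) \, \mathbb{V}(\beta_j)
= \sum_{j=1}^p \phi_j W \sigma^2
= \sigma^2 W
\end{aligned}
$$

The first equality uses $\mathbb{E}(\boldsymbol{x}^T \boldsymbol{\beta}) = 0$, and the cross terms vanish because $\mathbb{E}(\beta_j \beta_k) = 0$ for $j \neq k$.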

Then, the R2-D2 prior is specified as:

$$
\begin{aligned}
\beta_j &\sim \text{Normal}(0, \phi_j W \sigma^2) \\
\boldsymbol{\phi} &\sim \text{Dirichlet}(\xi_1, \dots, \xi_p) \\
W & = \frac{R^2}{1 - R^2}\\
R^2 &\sim \text{Beta}(a, b) \\
\end{aligned}
$$

Through the transformation $W = R^2/(1-R^2)$, the prior on $R^2$ induces a prior on $W$, which governs the total prior variance of the linear predictor $\boldsymbol{x}^T \boldsymbol{\beta}$.
Combined with the Dirichlet prior on the variance proportions $\boldsymbol{\phi}$, this results in the $R^2$-induced Dirichlet Decomposition (R2-D2) prior.
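
To make the hierarchy concrete, here is a minimal NumPy sketch that draws from the prior as specified above. The function name `sample_r2d2_prior` and its arguments are hypothetical, the second argument of $\text{Normal}$ is treated as a variance, and $\sigma$ is held fixed (an assumption made here only to keep the example short).

```python
import numpy as np

def sample_r2d2_prior(p, a, b, xi, sigma=1.0, size=1, rng=None):
    """Draw (R^2, W, phi, beta) from the R2-D2 hierarchy with sigma held fixed."""
    rng = np.random.default_rng(rng)
    r2 = rng.beta(a, b, size=size)        # R^2 ~ Beta(a, b), shape (size,)
    w = r2 / (1.0 - r2)                   # W = R^2 / (1 - R^2)
    phi = rng.dirichlet(xi, size=size)    # phi ~ Dirichlet(xi), shape (size, p)
    # beta_j ~ Normal(0, phi_j * W * sigma^2); NumPy expects a standard deviation
    sd = sigma * np.sqrt(phi * w[:, None])
    beta = rng.normal(0.0, sd)
    return {"r2": r2, "w": w, "phi": phi, "beta": beta}

draws = sample_r2d2_prior(p=5, a=2.0, b=3.0, xi=np.full(5, 0.5), size=20000, rng=1)

# With sigma = 1, the prior variances phi_j * W add up to W for every draw
total_var = (draws["phi"] * draws["w"][:, None]).sum(axis=1)
print(np.allclose(total_var, draws["w"]))
print(draws["r2"].mean())  # should be close to a / (a + b)
```

The check on `total_var` illustrates the decomposition: $W$ sets the total prior variance of the linear predictor (in units of $\sigma^2$), and $\boldsymbol{\phi}$ only decides how that total is split across coefficients.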

It can be shown that the induced prior on $W$ is a Beta Prime distribution with parameters $a$ and $b$.
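
This follows from a change of variables. As a short derivation sketch, write $R^2 = W/(1 + W)$, so that $\mathrm{d}R^2/\mathrm{d}W = (1 + W)^{-2}$, and let $B(a, b)$ denote the beta function (notation introduced here for the derivation). Substituting into the $\text{Beta}(a, b)$ density gives:

$$
\begin{aligned}
p(W) &= \frac{1}{B(a, b)} \left(\frac{W}{1 + W}\right)^{a - 1} \left(\frac{1}{1 + W}\right)^{b - 1} \frac{1}{(1 + W)^2} \\
&= \frac{W^{a - 1} (1 + W)^{-(a + b)}}{B(a, b)}, \qquad W > 0
\end{aligned}
$$

which is the density of the Beta Prime distribution with parameters $a$ and $b$.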