In [1]:
import numpy as np
import scipy as sp
import scipy.stats
import matplotlib.pyplot as plt
import math as mt
import scipy.special
import seaborn as sns
plt.style.use('fivethirtyeight')
from statsmodels.graphics.tsaplots import plot_acf
import pandas as pd

Define some colors in this cell.
$$
\require{color}
\definecolor{red}{RGB}{240,5,5}
\definecolor{blue}{RGB}{5,5,240}
\definecolor{green}{RGB}{4,240,5}
\definecolor{black}{RGB}{0,0,0}
\definecolor{dsb}{RGB}{72, 61, 139}
\definecolor{Maroon}{RGB}{128,0,0}
$$

# <font face="gotham" color="orange"> Linear Regression Model </font>

## <font face="gotham" color="orange"> Normal-Gamma Conjugacy</font>

The common matrix form of linear regression is
$$
y =    X \beta+u
$$
where $u \sim N( {0}, \sigma^2 I)$.

The covariance matrix of the disturbance term $u$ is
$$
\begin{aligned}
&\operatorname{var}(u) \equiv\left[\begin{array}{cccc}
\operatorname{var}\left(u_{1}\right) & \operatorname{cov}\left(u_{1}, u_{2}\right) & \ldots & \operatorname{cov}\left(u_{1}, u_{N}\right) \\
\operatorname{cov}\left(u_{1}, u_{2}\right) & \operatorname{var}\left(u_{2}\right) & \ldots & . \\
\cdot & \operatorname{cov}\left(u_{2}, u_{3}\right) & \ldots & . \\
\cdot & \cdot & \ldots & \operatorname{cov}\left(u_{N-1}, u_{N}\right) \\
\operatorname{cov}\left(u_{1}, u_{N}\right) & . & \ldots & \operatorname{var}\left(u_{N}\right)
\end{array}\right]=\left[\begin{array}{cccc}
h^{-1} & 0 & \ldots & 0 \\
0 & h^{-1} & \ldots & . \\
. & . & \ldots & . \\
. & . & \cdots & 0 \\
0 & . & . & h^{-1}
\end{array}\right]
\end{aligned}
$$

For easier mathematical manipulation, we usually work with $h = 1/\sigma^2$, which is called the **precision**. The diagonal form of the covariance matrix represents two assumptions: **no serial correlation** (zero off-diagonal elements) and **homoscedasticity** (identical diagonal elements).

Also with the assumption that $X$ is exogenous, we can construct the joint probability density 
$$
P(y, X \mid \beta, h)=P(y \mid X, \beta, h) P(X)
$$

However, because $X$ does not depend on $\beta$ and $h$, we narrow our interest to
$$
P(y \mid X, \beta, h)
$$

Recall that the multivariate normal distribution takes the form

$$
f(X)=(2 \pi)^{-N / 2}|\Sigma|^{-1 / 2} \exp \left(-\frac{1}{2}(X-\mu)^T \Sigma^{-1}(X-\mu)\right) \text { for positive definite } \Sigma
$$

where $|\Sigma|$ is the determinant of the covariance matrix and $\Sigma^{-1}$ is its inverse.

In the linear regression context, the determinant term of the covariance matrix is
$$
|\text{var}(u)|^{-1/2}=\left(\prod_{i=1}^Nh^{-1}\right)^{-1/2} = (h^{-N})^{-1/2}=h^{N/2}
$$

The inverse of the covariance matrix is
$$
\text{var}(u)^{-1} = (h^{-1}I)^{-1}=hI
$$
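A quick numerical sanity check of these two identities can be sketched as follows. The values of `N` and `h` are arbitrary illustrative choices, not taken from the text.

```python
import numpy as np

N, h = 5, 2.5                      # illustrative sample size and precision (assumed values)
cov_u = np.eye(N) / h              # var(u) = h^{-1} I

det_term = np.linalg.det(cov_u) ** (-0.5)   # |var(u)|^{-1/2}
inv_cov = np.linalg.inv(cov_u)              # var(u)^{-1}

# Compare with the closed forms h^{N/2} and h I derived above
print(np.isclose(det_term, h ** (N / 2)))
print(np.allclose(inv_cov, h * np.eye(N)))
```

Because the covariance matrix is a scaled identity, neither the determinant nor the inverse ever needs to be computed numerically in practice; the check only confirms the algebra.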

## <font face="gotham" color="orange"> Likelihood Function </font>

With all the previous preparation, the likelihood function $P(y \mid X, \beta, h)$ simplifies to
$$
P(y \mid  X, \beta, h)=(2 \pi)^{-\frac{N}{2}} h^{\frac{N}{2}} \exp \left[-\frac{h}{2}(y-\mathrm{X} \beta)^T(y-\mathrm{X} \beta)\right]
$$
However, more manipulation is needed to turn it into a form that reveals the conjugate prior.

Using the fact that $y-X\hat{\beta}=\hat{u}$, where $\hat{\beta}$ is the OLS estimate and $\hat{u}$ the OLS residual, we rewrite

$$
(y-\mathrm{X} \beta)^T(y-\mathrm{X} \beta) = \left(\hat{u}-\mathrm{X}\left(\beta-\widehat{\beta}\right)\right)^T\left(\hat{u}-\mathrm{X}\left(\beta-\widehat{\beta}\right)\right)
$$
Expanding the product gives

$$
(y-\mathrm{X} \beta)^T(y-\mathrm{X} \beta) =\hat{u}^T \hat{u}-\hat{u}^T \mathrm{X}\left(\beta-\widehat{\beta}\right)-\left(\hat{u}^T \mathrm{X}\left(\beta-\widehat{\beta}\right)\right)^T+\left(\beta-\widehat{\beta}\right)^T \mathrm{X}^T \mathrm{X}\left(\beta-\widehat{\beta}\right)
$$

Using the fact that $\hat{u}^TX = 0$ (OLS residuals are orthogonal to the regressors), the expression reduces to
$$
(y-\mathrm{X} \beta)^T(y-\mathrm{X} \beta)=\hat{u}^T \hat{u}+\left(\beta-\widehat{\beta}\right)^T \mathrm{X}^T \mathrm{X}\left(\beta-\widehat{\beta}\right)
$$

Plug this back into the likelihood function
$$
P(y \mid  X, \beta, h)=(2 \pi)^{-\frac{N}{2}} h^{\frac{N}{2}} \exp \left[-\frac{h}{2}\left(\hat{u}^T \hat{u}+\left(\beta-\widehat{\beta}\right)^T \mathrm{X}^T \mathrm{X}\left(\beta-\widehat{\beta}\right)\right)\right]
$$

Separate the exponential part
$$
p(y \mid \mathrm{X}, \beta, h)=(2 \pi)^{-\frac{N}{2}} h^{\frac{N}{2}} \exp \left[-\frac{h}{2}\left(\beta-\widehat{\beta}\right)^T\mathrm{X}^T \mathrm{X}\left(\beta-\widehat{\beta}\right)\right] \exp \left[-\frac{h}{2}\hat{u}^T \hat{u}\right]
$$
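The decomposition and the factorised likelihood can be checked numerically. The sketch below uses simulated data (the sizes, coefficients, and evaluation point `beta` are arbitrary assumptions) and compares the log of the factorised expression with `scipy.stats.multivariate_normal`.

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(0)
N, k, h = 50, 3, 4.0                           # illustrative sizes and precision
X = rng.normal(size=(N, k))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=h ** -0.5, size=N)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)   # OLS estimate
u_hat = y - X @ beta_hat                       # OLS residuals
beta = np.array([0.8, -1.5, 0.2])              # arbitrary point at which to evaluate

# Sum-of-squares decomposition
d = beta - beta_hat
lhs = (y - X @ beta) @ (y - X @ beta)
rhs = u_hat @ u_hat + d @ (X.T @ X) @ d

# Log of the factorised likelihood versus the multivariate normal log density
loglik = (-N / 2 * np.log(2 * np.pi) + N / 2 * np.log(h)
          - h / 2 * (d @ (X.T @ X) @ d) - h / 2 * (u_hat @ u_hat))
ref = multivariate_normal(mean=X @ beta, cov=np.eye(N) / h).logpdf(y)

print(np.isclose(lhs, rhs), np.isclose(loglik, ref))
```

Working on the log scale avoids underflow, since the likelihood itself is a product of many small numbers.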

This form of the likelihood function suggests a **natural conjugate prior**, which has the same functional form as the likelihood and yields a posterior within the same class of distributions.

## <font face="gotham" color="orange"> Prior </font>

Our prior should be a joint distribution $P(\beta, h)$. To obtain the Normal-Gamma (NG) form, it proves convenient to factor it as
$$
P(\beta, h)=P(\beta \mid h) P(h)
$$

This is the **Normal-Gamma distribution**: the two factors on the right-hand side are distributed as
$$
\begin{gathered}
\beta \mid h \sim N\left(\mu, h^{-1} V\right) \\
h \sim \operatorname{Gamma}(m, v)
\end{gathered}
$$
The advantage of this distribution is that $\beta$ is supported on the real numbers while $h$ is supported on the positive numbers.

Recall the functional form of the NG distribution.
$$
f(X, h)=(2 \pi)^{-\frac{N}{2}} h^{\frac{N}{2}}|\Sigma|^{-1 / 2} \exp \left(-\frac{h}{2}(X-\mu)^T \Sigma^{-1}(X-\mu)\right) \frac{1}{\left(\frac{2 m}{v}\right)^{v / 2}} \frac{1}{\Gamma\left(\frac{v}{2}\right)} h^{\frac{v-2}{2}} \exp \left[-\frac{h v}{2 m}\right]
$$

Separate it into the priors for $\beta \mid h$ and $h$
$$
\begin{gathered}
P(\beta \mid h)=(2 \pi)^{-\frac{k}{2}} h^{\frac{k}{2}}|V|^{-\frac{1}{2}} \exp \left[-\frac{h}{2}(\beta-\mu)^{\prime} V^{-1}(\beta-\mu)\right] \\
P(h)=\frac{1}{\left(\frac{2 m}{v}\right)^{v / 2}} \frac{1}{\Gamma\left(\frac{v}{2}\right)} h^{\frac{v-2}{2}} \exp \left[-\frac{h v}{2 m}\right]
\end{gathered}
$$
where $k$ is the number of parameters in $\beta$.
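Reading off the density of $h$ above, it is a gamma density with shape $v/2$ and scale $2m/v$, so its mean is $m$. The sketch below checks this against `scipy.stats.gamma` and draws from the joint prior. The hyperparameter values (`m`, `v`, `mu`, `V`) are illustrative assumptions only.

```python
import numpy as np
from scipy.stats import gamma
from scipy.special import gammaln

m, v = 2.0, 6.0            # illustrative hyperparameters of the prior on h
k = 2
mu = np.zeros(k)           # illustrative prior mean of beta
V = np.eye(k)              # illustrative prior scale matrix of beta

def log_p_h(h):
    """Log of P(h) exactly as written in the text."""
    return (-(v / 2) * np.log(2 * m / v) - gammaln(v / 2)
            + (v - 2) / 2 * np.log(h) - h * v / (2 * m))

# The density matches a gamma distribution with shape v/2 and scale 2m/v
h_grid = np.linspace(0.1, 8.0, 50)
density_matches = np.allclose(log_p_h(h_grid),
                              gamma(a=v / 2, scale=2 * m / v).logpdf(h_grid))

# Draw from the joint prior: first h, then beta | h ~ N(mu, V / h)
rng = np.random.default_rng(1)
h_draws = rng.gamma(shape=v / 2, scale=2 * m / v, size=20_000)
z = rng.multivariate_normal(np.zeros(k), V, size=h_draws.size)
beta_draws = mu + z / np.sqrt(h_draws)[:, None]

print(density_matches, h_draws.mean())
```

Sampling $h$ first and then scaling a draw from $N(0, V)$ by $h^{-1/2}$ mirrors the factorisation $P(\beta, h)=P(\beta \mid h) P(h)$.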

## <font face="gotham" color="orange"> Posterior </font>

The posterior is formulated by Bayes' Theorem

$$
P(\beta, h \mid y, X)\propto P(y \mid  X, \beta, h)P(\beta\mid h) P(h)
$$

$$
P(\beta, h \mid Y, \mathrm{X}) \propto \ h^{\frac{k}{2}} \exp \left[-\frac{h}{2}\left[\left(\beta-\widehat{\beta}\right)^T\mathrm{X}^{\prime} \mathrm{X}\left(\beta-\widehat{\beta}\right)+(\beta-\mu)^T V^{-1}(\beta-\mu)\right]\right]  h^{\frac{N+v-2}{2}} \exp \left[-\frac{h}{2}\left(\hat{u}^T \hat{u}+\frac{v}{m}\right)\right]
$$

$$
\begin{align}
P(\beta, h \mid Y, \mathrm{X}) &\propto(2 \pi)^{-\frac{N}{2}} h^{\frac{N}{2}} \exp \left[-\frac{h}{2}\left(\beta-\widehat{\beta}\right)^T\mathrm{X}^T \mathrm{X}\left(\beta-\widehat{\beta}\right)\right] \exp \left[-\frac{h}{2}\hat{u}^T \hat{u}\right]\\
&\times(2 \pi)^{-\frac{k}{2}} h^{\frac{k}{2}}|V|^{-\frac{1}{2}}\exp \left[-\frac{h}{2}(\beta-\mu)^{\prime} V^{-1}(\beta-\mu)\right]\frac{1}{\left(\frac{2 m}{v}\right)^{v / 2}} \frac{1}{\Gamma\left(\frac{v}{2}\right)} h^{\frac{v-2}{2}} \exp \left[-\frac{h v}{2 m}\right]
\end{align}
$$

First drop the constant factors, such as the powers of $2\pi$, $|V|^{-\frac{1}{2}}$ and the gamma function, then combine the powers of $h$

$$
\begin{align}
P(\beta, h \mid Y, \mathrm{X}) &\propto h^{\frac{k}{2}} \exp \left[-\frac{h}{2}\left(\beta-\widehat{\beta}\right)^T\mathrm{X}^T \mathrm{X}\left(\beta-\widehat{\beta}\right)\right] \exp \left[-\frac{h}{2}\hat{u}^T \hat{u}\right]\\
&\times \exp \left[-\frac{h}{2}(\beta-\mu)^{\prime} V^{-1}(\beta-\mu)\right] h^{\frac{N+v-2}{2}} \exp \left[-\frac{h v}{2 m}\right]
\end{align}
$$

Join exponential terms
$$
\begin{align}
P(\beta, h \mid Y, \mathrm{X}) &\propto h^{\frac{k}{2}} \exp \left[-\frac{h}{2}\left(\beta-\widehat{\beta}\right)^T\mathrm{X}^T \mathrm{X}\left(\beta-\widehat{\beta}\right)-\frac{h}{2}(\beta-\mu)^{\prime} V^{-1}(\beta-\mu)\right] h^{\frac{N+v-2}{2}} \exp \left[-\frac{h}{2}\hat{u}^T \hat{u}-\frac{h v}{2 m}\right]\\
&\propto\ h^{\frac{k}{2}} \exp \left[-\frac{h}{2}\left[\left(\beta-\widehat{\beta}\right)^T\mathrm{X}^{\prime} \mathrm{X}\left(\beta-\widehat{\beta}\right)+(\beta-\mu)^T V^{-1}(\beta-\mu)\right]\right]  h^{\frac{N+v-2}{2}} \exp \left[-\frac{h}{2}\left(\hat{u}^T \hat{u}+\frac{v}{m}\right)\right]
\end{align}
$$
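One way to confirm that nothing but constants was dropped is to compare the full log joint density with the log of the final kernel at several points: the difference should not depend on $(\beta, h)$. This is a sketch on simulated data with assumed hyperparameters, not part of the original derivation.

```python
import numpy as np
from scipy.stats import gamma, multivariate_normal

rng = np.random.default_rng(2)
N, k = 30, 2
X = rng.normal(size=(N, k))
y = X @ np.array([1.0, -1.0]) + rng.normal(size=N)
mu, V = np.zeros(k), 2.0 * np.eye(k)   # illustrative prior for beta | h
m, v = 1.0, 4.0                        # illustrative prior for h

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
u_hat = y - X @ beta_hat

def log_joint(beta, h):
    """log[ P(y | X, beta, h) P(beta | h) P(h) ] with every constant kept."""
    return (multivariate_normal(X @ beta, np.eye(N) / h).logpdf(y)
            + multivariate_normal(mu, V / h).logpdf(beta)
            + gamma(a=v / 2, scale=2 * m / v).logpdf(h))

def log_kernel(beta, h):
    """Log of the final proportional expression derived above."""
    d = beta - beta_hat
    quad = d @ (X.T @ X) @ d + (beta - mu) @ np.linalg.solve(V, beta - mu)
    return (k / 2 * np.log(h) - h / 2 * quad
            + (N + v - 2) / 2 * np.log(h) - h / 2 * (u_hat @ u_hat + v / m))

# The gap between the two should be the same constant at every point
points = [(rng.normal(size=k), rng.uniform(0.2, 3.0)) for _ in range(5)]
gaps = np.array([log_joint(b, h) - log_kernel(b, h) for b, h in points])
print(np.allclose(gaps, gaps[0]))
```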

## <font face="gotham" color="orange"> Normal-Inverse Gamma Conjugacy</font>

Recall the multivariate normal density function

$$
f(X)=(2 \pi)^{-\frac{N}{2}}|\Sigma|^{-\frac{1}{2}} \exp \left(-\frac{1}{2}(X-\mu)^T \Sigma^{-1}(X-\mu)\right) \text { for positive definite } \Sigma
$$

We assume the likelihood ${y} \sim {N}\left({ {X}\beta}, \sigma^{2} {I}\right)$ is exactly the same as before; only the notation changes, with $\sigma^2$ replacing $h^{-1}$

$$
p\left({y} \mid {X}, {\beta}, \sigma^{2}\right)=(2 \pi)^{-\frac{N}{2}}\left|\sigma^{2} {I}\right|^{-\frac{1}{2}} \exp \left(-\frac{1}{2}\left({y}-{ {X}\beta}\right)^{T}\left(\sigma^{2} {I}\right)^{-1}\left({y}-{ {X}\beta}\right)\right)
$$

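Using $|cA| = c^N|A|$ for an $N \times N$ matrix $A$ and $(cA)^{-1} = c^{-1}A^{-1}$, we have
$$
\begin{gathered}
\left|\sigma^{2} {I}\right|=\sigma^{2N}\\
\left(\sigma^{2} {I}\right)^{-1}=\sigma^{-2}I
\end{gathered}
$$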

<div style="background-color:LightCoral; color:DarkBlue; padding:30px;">
    The likelihood simplifies to
$$
p\left({y} \mid {X}, {\beta}, \sigma^{2}\right)=\left(2 \pi \sigma^{2}\right)^{-N / 2} \exp \left(-\frac{1}{2 \sigma^{2}}\left({y}- {X}{\beta}\right)^{T}\left( {y}- {X}{\beta}\right)\right)
$$
</div>

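The prior is usually decomposed as $p\left({\beta}, \sigma^{2}\right)=p\left({\beta} \mid \sigma^{2}\right) p\left(\sigma^{2}\right)$, where ${\beta} \mid \sigma^{2} \sim {N}\left({\mu}, \sigma^{2} {\Lambda}^{-1}\right)$ and $\sigma^{2} \sim \operatorname{InvGamma}\left(\alpha, \beta\right)$. Note that the $\beta$ inside $\operatorname{InvGamma}$ is a scalar hyperparameter, not the coefficient vector. The conditional prior density is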

$$
p\left( {\beta} \mid \sigma^{2}\right)=(2 \pi)^{-\frac{k}{2}}\left|\sigma^{2} {\Lambda}^{-1}\right|^{-\frac{1}{2}} \exp \left(-\frac{1}{2}\left({\beta}-{\mu}\right)^T\left(\sigma^2 {\Lambda}^{-1}\right)^{-1}\left({\beta}-{\mu}\right)\right)
$$

Using the determinant rules $|cA| = c^k|A|$ for a $k \times k$ matrix and $|A^{-1}| = |A|^{-1}$,
$$
|\sigma^2\Lambda^{-1}|^{-\frac{1}{2}}=(\sigma^{2k}|\Lambda^{-1}|)^{-\frac{1}{2}}=(\sigma^{2k}|\Lambda|^{-1})^{-\frac{1}{2}}=\sigma^{-k}|\Lambda|^{\frac{1}{2}}
$$
so that
$$
(2 \pi)^{-\frac{k}{2}}\left|\sigma^{2} {\Lambda}^{-1}\right|^{-\frac{1}{2}}=(2\pi\sigma^2)^{-\frac{k}{2}}|\Lambda|^\frac{1}{2}
$$

<div style="background-color:LightCoral; color:DarkBlue; padding:30px;">
The prior $P\left( {\beta} \mid \sigma^{2}\right)$ simplifies to
$$
p\left( {\beta} \mid \sigma^{2}\right)=\left(2 \pi \sigma^{2}\right)^{-\frac{k}{2}}\left|{\Lambda}\right|^{\frac{1}{2}} \exp \left(-\frac{1}{2 \sigma^{2}}\left({\beta}-{\mu}\right)^{T} {\Lambda}\left({\beta}-{\mu}\right)\right) 
$$
</div>

Recall that the inverse gamma density is

$$
f(x ; \alpha, \beta)=\frac{\beta^{\alpha}}{\Gamma(\alpha)}(1 / x)^{\alpha+1} \exp (-\beta / x)
$$

<div style="background-color:LightCoral; color:DarkBlue; padding:30px;">
Following this functional form, the prior $P(\sigma^2)$ is
$$
P\left(\sigma^{2}\right)=\frac{\beta^{\alpha}}{\Gamma\left(\alpha\right)}\left(\sigma^{2}\right)^{-\left(\alpha+1\right)} \exp \left(-\beta / \sigma^{2}\right)
$$
</div>
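The inverse gamma prior can be checked against `scipy.stats.invgamma`, and it connects back to the precision parameterisation: if $h$ follows a gamma distribution with shape $\alpha$ and rate $\beta$, then $\sigma^2 = 1/h$ follows $\operatorname{InvGamma}(\alpha, \beta)$. In this sketch the hyperparameter values are arbitrary, and the name `beta_ig` is a hypothetical label chosen to avoid confusion with the coefficient vector.

```python
import numpy as np
from scipy.stats import invgamma
from scipy.special import gammaln

alpha, beta_ig = 3.0, 2.0   # illustrative shape and scale hyperparameters (assumed)

def log_p_sigma2(s2):
    """Log of P(sigma^2) as written in the text."""
    return (alpha * np.log(beta_ig) - gammaln(alpha)
            - (alpha + 1) * np.log(s2) - beta_ig / s2)

# Compare with scipy's inverse gamma (shape a, scale beta_ig)
s2_grid = np.linspace(0.1, 5.0, 40)
invgamma_matches = np.allclose(
    log_p_sigma2(s2_grid), invgamma(a=alpha, scale=beta_ig).logpdf(s2_grid))

# Draw sigma^2 as the reciprocal of a gamma draw with shape alpha and rate beta_ig
rng = np.random.default_rng(3)
s2_draws = 1.0 / rng.gamma(shape=alpha, scale=1.0 / beta_ig, size=50_000)

print(invgamma_matches, s2_draws.mean(), beta_ig / (alpha - 1))
```

The sample mean should be close to the theoretical mean $\beta/(\alpha-1)$, which exists only for $\alpha > 1$.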

## <font face="gotham" color="orange"> Kernel Decomposition </font>

Moving forward requires a fair amount of linear algebra. The first step is the kernel decomposition

$$
\begin{aligned}
( {y}- {X}  {\beta})^{T}( {y}- {X}  {\beta})=&(\overbrace{ {y}- {X} \hat{ {\beta}}}^{A}+\overbrace{{X} \hat{{\beta}}- {X}  {\beta}}^{B})^{T} (\overbrace{{y}- {X} \hat{ {\beta}}}^{A}+\overbrace{ {X} \hat{ {\beta}}- {X}  {\beta}}^{B}) \\
=& A^{T} A+B^{T} B+ A^{T} B+B^TA\\
=& A^{T} A+B^{T} B+2 A^{T} B \\
&\text{(We use the fact }A^TB=B^TA \text{, because both terms are scalars and a scalar equals its own transpose)}\\
=&A^{T} A+B^{T} B+2\overbrace{( {y}- {X} \hat{ {\beta}})^{T}( {X} \hat{ {\beta}}- {X}  {\beta})}^{0} \\
=&({y}-{X} \hat{{\beta}})^{T}({y}-{X} \hat{{\beta}})+(\hat{\beta}^TX^T-\beta^TX^T)(X\hat{\beta}-X\beta) \\
=&({y}-{X} \hat{{\beta}})^{T}({y}-{X} \hat{{\beta}})+(\hat{{\beta}}-{\beta})^{T} {X}^{T} {X}(\hat{{\beta}}-{\beta}) 
\end{aligned}
$$

<div style="background-color:LightCoral; color:DarkBlue; padding:30px;">
The kernel decomposition result is
$$
( {y}- {X}  {\beta})^{T}( {y}- {X}  {\beta})=({y}-{X} \hat{{\beta}})^{T}({y}-{X} \hat{{\beta}})+(\hat{{\beta}}-{\beta})^{T} {X}^{T} {X}(\hat{{\beta}}-{\beta}) 
$$
</div>

The quadratic form $(\hat{{\beta}}-{\beta})^{T} {X}^{T} {X}(\hat{{\beta}}-{\beta})$ is constructed so that it can be combined with the kernel of the prior $P\left( {\beta} \mid \sigma^{2}\right)$, namely $\left({\beta}-{\mu}\right)^{T} {\Lambda}\left({\beta}-{\mu}\right)$

Joining the exponential terms of the likelihood and the prior amounts to adding the two quadratic forms
$$
(\hat{{\beta}}-{\beta})^{T} {X}^{T} {X}(\hat{{\beta}}-{\beta})+\left({\beta}-{\mu}\right)^{T} {\Lambda}\left({\beta}-{\mu}\right)
$$

Expand both terms. As before, the cross terms $2 \hat{ {\beta}}^T  {X}^{T}  {X}  {\beta}$ and $2  {\mu}^{T}  {\Lambda}  {\beta}$ each combine two equal scalars (this uses the symmetry of $X^TX$ and $\Lambda$).
\begin{align}
(\hat{{\beta}}-{\beta})^{T} {X}^{T} {X}(\hat{{\beta}}-{\beta})+\left({\beta}-{\mu}\right)^{T} {\Lambda}\left({\beta}-{\mu}\right)&= {\beta}^{T}  {X}^{T}  {X}  {\beta}+\hat{ {\beta}}^T  {X}^{T}  {X} \hat{ {\beta}}-2 \hat{ {\beta}}^T  {X}^{T}  {X}  {\beta}+ {\beta}^{T}  {\Lambda}  {\beta}+ {\mu}^{T}  {\Lambda}  {\mu}-2  {\mu}^{T}  {\Lambda}  {\beta}\\
&=\color{red}{\beta}^{T}  {X}^{T}  {X}  {\beta}\color{black} + \color{red}{\beta}^{T}  {\Lambda}  {\beta} \color{black}-\color{blue}2 \hat{ {\beta}}^T  {X}^{T}  {X}  {\beta}-\color{blue}2  {\mu}^{T}  {\Lambda}  {\beta}+\color{black} {\mu}^{T}  {\Lambda}  {\mu}+\hat{ {\beta}}^T  {X}^{T}  {X} \hat{ {\beta}}\\
&=\color{red}\beta^T\underbrace{(X^TX+\Lambda)}_{M}\beta\color{black}-\color{blue}2\underbrace{(\hat{\beta}^TX^TX+\mu^T\Lambda)}_{m^T}\beta+\color{black} {\mu}^{T}  {\Lambda}  {\mu}+\hat{ {\beta}}^T  {X}^{T}  {X} \hat{ {\beta}}\\
&=\underbrace{\color{red}\beta^TM\beta\color{black}-\color{blue}2m^T\beta}_{\text{needs completing the square}}+\color{black} {\mu}^{T}  {\Lambda}  {\mu}+\hat{ {\beta}}^T  {X}^{T}  {X} \hat{ {\beta}}\\
\end{align}

Details of completing the square
$$
\begin{align}
\color{red}\beta^TM\beta\color{black}-\color{blue}2m^T\beta\color{black}+\overbrace{\color{green}m^TM^{-1}m-m^TM^{-1}m}^{0}
&=\overbrace{\color{red}\beta^TM\beta\color{black}-\color{blue}m^T\beta\color{black}-\color{blue}\beta^Tm\color{black}+\color{green}m^TM^{-1}m}^{\text{complete the square}}-\color{green}m^TM^{-1}m\\
&=\color{Maroon}(\beta^TM-m^T)(\beta-M^{-1}m)\color{black}-\color{green}m^TM^{-1}m\\
&=\color{Maroon}(\beta^T-m^TM^{-1})M(\beta-M^{-1}m)\color{black}-\color{green}m^TM^{-1}m\\
&=\color{Maroon}(\beta-M^{-1}m)^TM(\beta-M^{-1}m)\color{black}-\color{green}m^TM^{-1}m
\end{align}
$$

The last step above uses the fact that $(M^{-1})^T=M^{-1}$, which holds because $M = X^TX+\Lambda$ is symmetric.
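The completed square can be verified numerically for any symmetric positive definite $M$. In the sketch below `M`, `m`, and `beta` are random illustrative values, not quantities from a real regression.

```python
import numpy as np

rng = np.random.default_rng(4)
k = 3
A = rng.normal(size=(k, k))
M = A.T @ A + np.eye(k)          # symmetric positive definite, like X'X + Lambda
m = rng.normal(size=k)
beta = rng.normal(size=k)

# Left side: beta' M beta - 2 m' beta
lhs = beta @ M @ beta - 2 * m @ beta

# Right side: (beta - M^{-1} m)' M (beta - M^{-1} m) - m' M^{-1} m
center = np.linalg.solve(M, m)   # M^{-1} m without forming the inverse
rhs = (beta - center) @ M @ (beta - center) - m @ center

print(np.isclose(lhs, rhs))
```

Using `np.linalg.solve` instead of an explicit inverse is a numerical design choice; algebraically it is the same $M^{-1}m$.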

<div style="background-color:LightCoral; color:DarkBlue; padding:30px;">
So this is the new expression we are working on
$$
(\hat{{\beta}}-{\beta})^{T} {X}^{T} {X}(\hat{{\beta}}-{\beta})+\left({\beta}-{\mu}\right)^{T} {\Lambda}\left({\beta}-{\mu}\right)=\color{Maroon}\underbrace{\color{Maroon}(\beta-M^{-1}m)^TM(\beta-M^{-1}m)}_{\text{Posterior normal kernel}}\color{black}-\color{green}m^TM^{-1}m+\color{black} {\mu}^{T}  {\Lambda}  {\mu}+\hat{ {\beta}}^T  {X}^{T}  {X} \hat{ {\beta}}
$$
</div>

So far we have obtained the kernel of the normal part of the posterior. The remaining terms will be absorbed into the inverse-gamma part.

Define the mean in the posterior kernel
$$
\begin{align}
\mu_{N} = M^{-1}m &= M^{-1}(X^TX\hat{\beta}+\Lambda^T\mu)\\
&=M^{-1}(X^TX(X^TX)^{-1}X^Ty+\Lambda^T\mu)\\
&=M^{-1}\underbrace{(X^Ty+\Lambda^T\mu)}_{m}
\end{align}
$$
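The posterior mean $\mu_N = (X^TX+\Lambda)^{-1}(X^Ty+\Lambda\mu)$ blends the OLS estimate and the prior mean. The sketch below, on simulated data with assumed values, illustrates the two limiting cases: a nearly flat prior recovers OLS, and a very tight prior recovers $\mu$. The helper `posterior_mean` is a hypothetical name introduced for illustration.

```python
import numpy as np

rng = np.random.default_rng(5)
N, k = 40, 3
X = rng.normal(size=(N, k))
y = X @ np.array([2.0, 0.0, -1.0]) + rng.normal(size=N)
mu = np.zeros(k)                                 # illustrative prior mean

def posterior_mean(Lam):
    """mu_N = M^{-1} m with M = X'X + Lambda and m = X'y + Lambda mu."""
    M = X.T @ X + Lam
    m = X.T @ y + Lam @ mu
    return np.linalg.solve(M, m)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)     # OLS estimate
mu_weak = posterior_mean(1e-8 * np.eye(k))       # nearly flat prior
mu_strong = posterior_mean(1e8 * np.eye(k))      # very tight prior

print(np.allclose(mu_weak, beta_hat, atol=1e-6))
print(np.allclose(mu_strong, mu, atol=1e-5))
```

With $\mu = 0$ and $\Lambda = \lambda I$, this expression coincides with the ridge regression estimator, which is one way to interpret the role of the prior precision.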

<div style="background-color:LightCoral; color:DarkBlue; padding:30px;">
Substituting $\mu_N$ for $M^{-1}m$, the expression becomes
$$
(\hat{{\beta}}-{\beta})^{T} {X}^{T} {X}(\hat{{\beta}}-{\beta})+\left({\beta}-{\mu}\right)^{T} {\Lambda}\left({\beta}-{\mu}\right)=\color{Maroon}\underbrace{\color{Maroon}(\beta-\mu_N)^TM(\beta-\mu_N)}_{\text{Posterior normal kernel}}\color{black}-\color{green}m^TM^{-1}m+\color{black} {\mu}^{T}  {\Lambda}  {\mu}+\hat{ {\beta}}^T  {X}^{T}  {X} \hat{ {\beta}}
$$
    
</div>

$$
({y}-{X} {\beta})^{T}({y}-{X} {\beta})+\left({\beta}-{\mu}\right)^{T} {\Lambda}\left({\beta}-{\mu}\right)=({y}-{X} \hat{\beta})^{T}({y}-{X} \hat{\beta})+\color{Maroon}\underbrace{\color{Maroon}(\beta-\mu_N)^TM(\beta-\mu_N)}_{\text{Posterior normal kernel}}\color{black}-\color{green}m^TM^{-1}m+\color{black} {\mu}^{T}  {\Lambda}  {\mu}+\hat{ {\beta}}^T  {X}^{T}  {X} \hat{ {\beta}}
$$

Next, simplify each term on the right-hand side, which is


$$
({y}-{X} \hat{\beta})^{T}({y}-{X} \hat{\beta})+\color{Maroon}\underbrace{\color{Maroon}(\beta-\mu_N)^TM(\beta-\mu_N)}_{\text{Posterior normal kernel}}\color{black}-\color{green}m^TM^{-1}m+\color{black} {\mu}^{T}  {\Lambda}  {\mu}+\hat{ {\beta}}^T  {X}^{T}  {X} \hat{ {\beta}}
$$

$$
\begin{aligned}
( {y}- {X} \hat{ {\beta}})^{T}( {y}- {X} \hat{ {\beta}})&= {y}^{T}  {y}+\hat{ {\beta}}^{T}  {X}^{T}  {X} \hat{ {\beta}} -2  {y}^{T}  {X} \hat{ {\beta}} \\
&= {y}^{T}  {y}+\hat{ {\beta}}^{T}  {X}^{T}  {X} \overbrace{(X^TX)^{-1}X^Ty}^{\hat{\beta}} -2  {y}^{T}  {X} \hat{ {\beta}}  \\
&= {y}^{T}  {y}+\hat{ {\beta}}^{T}  X^Ty -2  {y}^{T}  {X} \hat{ {\beta}}  \\
&= {y}^{T}  {y}+(\overbrace{(X^TX)^{-1}X^Ty}^{\hat{\beta}})^{T}  X^Ty -2  {y}^{T}  {X} \hat{ {\beta}}  \\
&= y^Ty+y^TX(X^TX)^{-1}X^Ty - 2y^TX\hat{\beta}\\
&=  {y}^{T}  {y}+{y}^{T}  {X} \hat{ {\beta}}-2{y}^{T}  {X} \hat{ {\beta}}\\
&={y}^{T}  {y}-{y}^{T}  {X} \hat{ {\beta}}\\
\end{aligned}
$$

$$
m^TM^{-1}m = (X^Ty+\Lambda^T\mu)^TM^{-1}(X^Ty+\Lambda^T\mu)
$$
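Both the residual sum of squares simplification and the full sum-of-squares identity can be checked numerically. The data, prior mean `mu`, and prior precision `Lam` below are arbitrary illustrative values.

```python
import numpy as np

rng = np.random.default_rng(6)
N, k = 25, 2
X = rng.normal(size=(N, k))
y = rng.normal(size=N)
mu = np.array([0.5, -0.5])                   # illustrative prior mean
Lam = np.array([[2.0, 0.3], [0.3, 1.0]])     # symmetric positive definite prior precision
beta = rng.normal(size=k)                    # arbitrary evaluation point

XtX = X.T @ X
beta_hat = np.linalg.solve(XtX, X.T @ y)
M = XtX + Lam
m = X.T @ y + Lam @ mu
mu_N = np.linalg.solve(M, m)

# Residual sum of squares: direct form versus y'y - y'X beta_hat
rss = (y - X @ beta_hat) @ (y - X @ beta_hat)
rss_short = y @ y - y @ X @ beta_hat

# Full identity: likelihood kernel plus prior kernel versus the decomposed form
lhs = (y - X @ beta) @ (y - X @ beta) + (beta - mu) @ Lam @ (beta - mu)
rhs = (rss + (beta - mu_N) @ M @ (beta - mu_N)
       - m @ mu_N + mu @ Lam @ mu + beta_hat @ XtX @ beta_hat)

print(np.isclose(rss, rss_short), np.isclose(lhs, rhs))
```

Here `m @ mu_N` equals $m^TM^{-1}m$ because `mu_N` is $M^{-1}m$. Only the maroon quadratic form depends on $\beta$; every other term on the right-hand side is a function of the data and the prior alone.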

The purpose is to decompose the kernel of likelihood into two quadratic forms.
