Utils: https://www.codecogs.com/latex/eqneditor.php


**Problem 4.5:** The denominator of Bayes' rule:  
  $Pr(x_1 \ldots x_I) = \int\prod_{i=1}^{I} Pr(x_i|\theta)Pr(\theta)d\theta$   
is known as the *evidence*. It is a measure of how well the distribution fits *regardless* of the particular values of the parameters. Find an expression for the evidence term for (i) the normal distribution and (ii) the categorical distribution, assuming conjugate priors in each case.


**Normal distribution**  
$Pr(x_1 \ldots x_I) = \int\int\prod_{i=1}^{I} Pr(x_i|\mu,\sigma^2)Pr(\mu,\sigma^2)d\mu d\sigma^2$  
$ = \int\int\prod_{i=1}^{I} Norm_{x_i}[\mu,\sigma^2]NormInvGam_{\mu,\sigma^2}[\alpha,\beta,\gamma,\delta]d\mu d\sigma^2$  

where:
$Pr(\mu,\sigma^2) = NormInvGam_{\mu,\sigma^2}[\alpha,\beta,\gamma,\delta] = \frac{\sqrt{\gamma}}{\sigma \sqrt{2\pi}}\frac{\beta^\alpha}{\Gamma(\alpha)}\left(\frac{1}{\sigma^2}\right)^{\alpha+1}\exp\left[-\frac{2\beta+\gamma(\delta-\mu)^2}{2\sigma^2}\right]$


$Pr(x_1 \ldots x_I) = \int\int\prod_{i=1}^{I} Norm_{x_i}[\mu,\sigma^2]NormInvGam_{\mu,\sigma^2}[\alpha,\beta,\gamma,\delta]d\mu d\sigma^2 = \int\int \kappa NormInvGam_{\mu,\sigma^2}[\tilde{\alpha},\tilde{\beta},\tilde{\gamma},\tilde{\delta}]d\mu d\sigma^2 = \kappa$,  
because the product of the likelihood and the conjugate prior is a constant $\kappa$ times a new normal-inverse-gamma distribution, and the integral of a probability density is one.  
=> $Pr(x_1 \ldots x_I) = \kappa$  

Where:  
$\kappa = \frac{1}{(2\pi)^{I/2}}\frac{\sqrt{\gamma}\beta^\alpha}{\sqrt{\tilde{\gamma}}\tilde{\beta}^{\tilde{\alpha}}}\frac{\Gamma(\tilde{\alpha})}{\Gamma(\alpha)}$  
$\tilde{\alpha} = \alpha + I/2$  
$\tilde{\beta} = \frac{\sum_{i}x_i^2}{2}+\beta+\frac{\gamma\delta^2}{2} - \frac{(\gamma\delta + \sum_{i}x_i)^2}{2(\gamma+I)}$  
$\tilde{\gamma} = \gamma + I$  
$\tilde{\delta} = \frac{\gamma\delta + \sum_{i}x_i}{\gamma+I}$
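
**Categorical distribution**  
Part (ii) follows the same pattern. With a Dirichlet prior $Dir_{\lambda}[\alpha_1 \ldots \alpha_K]$ and $N_k$ observations in category $k$, the product of likelihood and prior is an unnormalized Dirichlet with parameters $N_k+\alpha_k$, so the evidence is the ratio of the two normalizers: $Pr(x_1 \ldots x_I) = \frac{\Gamma(\sum_k \alpha_k)}{\Gamma(I+\sum_k \alpha_k)}\prod_{k=1}^{K}\frac{\Gamma(N_k+\alpha_k)}{\Gamma(\alpha_k)}$. The sketch below evaluates this in log space; the function name and the example counts are illustrative assumptions, not part of the original exercise.

```python
import math

def log_evidence_categorical(counts, alpha):
    """Log evidence of categorical data under a Dirichlet prior.

    counts[k] is the number of observations in category k (N_k) and
    alpha[k] is the matching Dirichlet hyperparameter.
    """
    total_alpha = sum(alpha)
    total_count = sum(counts)
    # Ratio of Dirichlet normalizers, accumulated as log-gamma terms
    log_ev = math.lgamma(total_alpha) - math.lgamma(total_count + total_alpha)
    for n_k, a_k in zip(counts, alpha):
        log_ev += math.lgamma(n_k + a_k) - math.lgamma(a_k)
    return log_ev

# Hypothetical example: 4 observations over 3 categories, uniform Dirichlet prior
example_counts = [3, 1, 0]
example_alpha = [1.0, 1.0, 1.0]
evidence_cat = math.exp(log_evidence_categorical(example_counts, example_alpha))
print('Evidence =', evidence_cat)
```

For this example the evidence is $1/60$, which matches the chain of predictive probabilities $\frac{1}{3}\cdot\frac{2}{4}\cdot\frac{3}{5}\cdot\frac{1}{6}$ for any fixed ordering of the four observations.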




***Problem 4.6:*** The evidence term can be used to compare models. Consider two sets of data $S_1=$ {0.1,-0.5,0.2,0.7} and $S_2=$ {1.1,2.0,1.4,2.3}. Let us pose the question of whether these two data sets came from the same normal distribution or from two different normal distributions.  
Let model $M_1$ denote the case where all of the data comes from the one normal distribution. The evidence for this model is:  

$Pr(S_1\cup S_2 | M_1) = \int \prod_{i\in S_1\cup S_2} Pr(x_i|\theta)Pr(\theta)d\theta$,  
where $\theta =$ {$\mu, \sigma^2$} contains the parameters of this normal distribution. Similarly, we will let $M_2$ denote the case where the two sets of data belong to different normal distributions.  

$Pr(S_1\cup S_2 | M_2) = \int \prod_{i\in S_1} Pr(x_i|\theta_1)Pr(\theta_1)d\theta_1 \int \prod_{i\in S_2} Pr(x_i|\theta_2)Pr(\theta_2)d\theta_2$,  
where $\theta_1 =$ {$\mu_1, \sigma_1^2$}$, \theta_2 =$ {$\mu_2, \sigma_2^2$}

Now it is possible to compare the probability of the data under each of these two models using Bayes' rule  
$Pr(M_1|S_1\cup S_2) = \frac{Pr(S_1 \cup S_2| M_1)Pr(M_1)}{\sum_{n=1}^{2}Pr(S_1 \cup S_2 |M_n)Pr(M_n)}$

In [1]:
import matplotlib.pyplot as plt
import numpy as np
import math
from scipy import stats

In [2]:
S1 = np.array([0.1, -0.5, 0.2, 0.7])
S2 = np.array([1.1, 2.0, 1.4, 2.3])

In [3]:
def evidentNorm(alpha, beta, gamma, delta, x):
    """Evidence of data x under a normal likelihood with a NormInvGam[alpha, beta, gamma, delta] prior."""
    I = x.size
    # Posterior hyperparameters from the conjugate update
    alpha_ = alpha + I / 2
    beta_ = np.sum(x**2) / 2 + beta + gamma * delta**2 / 2 - (gamma * delta + np.sum(x))**2 / (2 * (gamma + I))
    gamma_ = gamma + I

    # kappa is the ratio of the normalizers, including Gamma(alpha_) / Gamma(alpha)
    upVal = math.sqrt(gamma) * pow(beta, alpha) * math.gamma(alpha_)
    lowVal = pow(2 * math.pi, I / 2) * math.sqrt(gamma_) * pow(beta_, alpha_) * math.gamma(alpha)
    k = upVal / lowVal
    return k

In [5]:
Pr_S1 = evidentNorm(1, 1, 1, 0, S1)
Pr_S2 = evidentNorm(1, 1, 1, 0, S2)
Pr_S1S2 = evidentNorm(1, 1, 1, 0, np.concatenate((S1, S2), axis=None))
print('The resulting probabilities are:')
print(f'Pr(S1) = {Pr_S1:.4e}')
print(f'Pr(S2) = {Pr_S2:.4e}')
print(f'Pr(S1 U S2) = {Pr_S1S2:.4e}')

The resulting probabilities are:
Pr(S1) = 8.8110e-03
Pr(S2) = 1.2802e-03
Pr(S1 U S2) = 2.3247e-06


The posterior probability that the two data sets come from the same normal distribution, assuming equal priors $Pr(M_1)=Pr(M_2)=1/2$, is

$Pr(M_1|S_1\cup S_2) = \frac{Pr(S_1 \cup S_2| M_1)Pr(M_1)}{\sum_{n=1}^{2}Pr(S_1 \cup S_2 |M_n)Pr(M_n)}$
$= \frac{Pr(S_1\cup S_2)}{Pr(S_1 \cup S_2) + Pr(S_1)Pr(S_2) }$

In [6]:
result = Pr_S1S2 / (Pr_S1S2 + Pr_S1 * Pr_S2)
print(f'Result = {result:.4f}')

Result = 0.1709


***Problem 5.4:*** The Schur complement identity states that the inverse of a matrix in terms of its sub-blocks is  
$\begin{bmatrix}A&B \\ C&D \end{bmatrix}^{-1}=\begin{bmatrix}(A-BD^{-1}C)^{-1}&-(A-BD^{-1}C)^{-1}BD^{-1} \\ -D^{-1}C(A-BD^{-1}C)^{-1}&D^{-1}+D^{-1}C(A-BD^{-1}C)^{-1}BD^{-1} \end{bmatrix}$  
Show that this relation is true.


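**Solution:** With $S = A-BD^{-1}C$, it suffices to show that the product of the matrix and its claimed inverse is the identity:  

$\begin{bmatrix}A&B \\ C&D \end{bmatrix} \begin{bmatrix}S^{-1}&-S^{-1}BD^{-1} \\ -D^{-1}CS^{-1}&D^{-1}+D^{-1}CS^{-1}BD^{-1} \end{bmatrix}  = \begin{bmatrix}I&0 \\ 0&I \end{bmatrix}$

Multiplying out block by block:  
top left: $AS^{-1} - BD^{-1}CS^{-1} = (A-BD^{-1}C)S^{-1} = I$  
top right: $-AS^{-1}BD^{-1} + BD^{-1} + BD^{-1}CS^{-1}BD^{-1} = BD^{-1} - (A-BD^{-1}C)S^{-1}BD^{-1} = 0$  
bottom left: $CS^{-1} - DD^{-1}CS^{-1} = 0$  
bottom right: $-CS^{-1}BD^{-1} + DD^{-1} + DD^{-1}CS^{-1}BD^{-1} = I$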
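
The identity can also be sanity-checked numerically. The sketch below builds the block inverse from $(A-BD^{-1}C)^{-1}$ and $D^{-1}$ and compares it with a direct inverse; the specific matrices are arbitrary assumptions, chosen only so that $D$ and $A-BD^{-1}C$ are invertible.

```python
import numpy as np

# Hypothetical blocks chosen so that D and A - B D^{-1} C are both invertible
A = np.array([[4.0, 1.0], [2.0, 3.0]])
B = np.array([[1.0, 0.0, 2.0], [0.0, 1.0, 1.0]])
C = np.array([[1.0, 2.0], [0.0, 1.0], [3.0, 0.0]])
D = np.array([[5.0, 1.0, 0.0], [1.0, 4.0, 1.0], [0.0, 1.0, 3.0]])

D_inv = np.linalg.inv(D)
S_inv = np.linalg.inv(A - B @ D_inv @ C)  # inverse of the Schur complement of D

# Full matrix and the block-wise inverse given by the identity
M = np.block([[A, B], [C, D]])
M_inv_blocks = np.block([
    [S_inv, -S_inv @ B @ D_inv],
    [-D_inv @ C @ S_inv, D_inv + D_inv @ C @ S_inv @ B @ D_inv],
])

print(np.allclose(M @ M_inv_blocks, np.eye(5)))
print(np.allclose(M_inv_blocks, np.linalg.inv(M)))
```

A numerical check like this is not a proof, but it catches sign or ordering mistakes in the block formula quickly.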

***Problem 5.5:*** Prove the conditional distribution property for the normal distribution: if

$Pr(x) = Pr\left (\begin{bmatrix}x_1 \\ x_2 \end{bmatrix} \right ) = Norm_x \left[\begin{bmatrix}\mu_1 \\ \mu_2 \end{bmatrix}, \begin{bmatrix} \Sigma_{11}&\Sigma_{12}^T \\ \Sigma_{12}&\Sigma_{22} \end{bmatrix} \right] $  

then  
$Pr(x_1|x_2) = Norm_{x_1}\left[ \mu_1 + \Sigma_{12}^T\Sigma_{22}^{-1}(x_2-\mu_2),\Sigma_{11} - \Sigma_{12}^T \Sigma_{22}^{-1}\Sigma_{12} \right]$
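
As a sanity check on the statement, the conditional mean and covariance can be computed two ways: from the covariance blocks as written above, and from the blocks of the precision matrix $\Lambda$ (the inverse covariance), where completing the square in $x_1$ gives covariance $\Lambda_{11}^{-1}$ and mean $\mu_1 - \Lambda_{11}^{-1}\Lambda_{12}(x_2-\mu_2)$. The sketch below compares the two; the numbers and variable names are illustrative assumptions.

```python
import numpy as np

# Hypothetical joint normal over x = [x1, x2], with x1 scalar and x2 two-dimensional
mu = np.array([1.0, -1.0, 0.5])
Sigma = np.array([[2.0, 0.6, 0.3],
                  [0.6, 1.5, 0.4],
                  [0.3, 0.4, 1.0]])
n1 = 1
mu1, mu2 = mu[:n1], mu[n1:]
Sigma11 = Sigma[:n1, :n1]
Sigma12 = Sigma[n1:, :n1]  # lower-left block, matching the partition in the problem
Sigma22 = Sigma[n1:, n1:]
x2 = np.array([0.0, 1.0])  # assumed observed value of x2

# Conditional mean and covariance from the covariance blocks
gain = np.linalg.solve(Sigma22, Sigma12).T  # Sigma12^T Sigma22^{-1}, using symmetry of Sigma22
cond_mean = mu1 + gain @ (x2 - mu2)
cond_cov = Sigma11 - gain @ Sigma12

# The same quantities from the precision matrix
Lambda = np.linalg.inv(Sigma)
Lambda11 = Lambda[:n1, :n1]
Lambda12 = Lambda[:n1, n1:]
cond_cov_prec = np.linalg.inv(Lambda11)
cond_mean_prec = mu1 - cond_cov_prec @ Lambda12 @ (x2 - mu2)

print(np.allclose(cond_mean, cond_mean_prec), np.allclose(cond_cov, cond_cov_prec))
```

The agreement between the two routes is exactly what the Schur complement identity provides: it expresses $\Lambda_{11}$ and $\Lambda_{12}$ in terms of the covariance blocks.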

**Solution:**  
Apply Schur's complement: