# Econ 240A - Part 2 - Problem Set 3
## William Radaic
## Fall 2025
## This version: 25 Nov 2025

# Problem 2 - Metropolis-Hastings

**2.1.** The Metropolis-Hastings (MH) algorithm is a computationally efficient procedure to approximate posterior distributions when exact computations are intractable. 
In the problem at hand, we have specified prior subjective uncertainty over the parameters $\phi$ and $\sigma^2$, and want to use the data to compute the posterior distribution of these parameters.
To characterize this posterior distribution, we generate a sequence of draws $\theta^{(s)}$ that approximate draws from it; the observed data enter only through the likelihood.
The key reason why the procedure works is that the Markov chain induced by the algorithm has the posterior as its stationary distribution, and the chain converges to it as $s$ grows. Thus, after enough iterations, the draws can be treated as (dependent) draws from the posterior and used to approximate it.
$\theta'$ is always accepted whenever $\alpha \geq 1$, i.e., whenever the posterior ratio for the new parameter, adjusted by the proposal correction term, is at least 1. Loosely, this means that the new parameter $\theta'$ is an improvement (in terms of posterior density) relative to $\theta$. If $\alpha < 1$, the new parameter is accepted with probability $\alpha$, since $P(u \leq \alpha) = \alpha$ for $u \sim$ Uniform(0,1); that is, even when $\theta'$ leads to a reduction in posterior density, it can still be adopted with positive probability. The reasoning behind this decision rule is that it allows the algorithm to explore the whole parameter space and not get stuck at local modes.
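
The accept/reject rule described above can be sketched in a few lines. This is a minimal illustration, not the problem set's implementation: the function names `mh_step` and `run_toy_chain`, the callables `log_post`, `propose`, and `log_g`, and the standard-normal toy target are all assumptions made for the example.

```python
import math
import random

def mh_step(theta, log_post, propose, log_g, rng):
    """One Metropolis-Hastings update (illustrative sketch).

    log_post(theta)     -> log posterior kernel at theta (hypothetical callable)
    propose(theta, rng) -> candidate theta'
    log_g(a, b)         -> log proposal density g(a | b), up to a constant
    """
    theta_new = propose(theta, rng)
    # log of [pi(theta'|Y) / pi(theta|Y)] * [g(theta|theta') / g(theta'|theta)]
    log_alpha = (log_post(theta_new) - log_post(theta)
                 + log_g(theta, theta_new) - log_g(theta_new, theta))
    # Accept with probability min(1, alpha): draw u ~ Uniform(0,1), accept if u < alpha.
    accept_prob = math.exp(min(0.0, log_alpha))
    if rng.random() < accept_prob:
        return theta_new, True
    return theta, False

def run_toy_chain(n_draws=20000, seed=0):
    """Toy target (an assumption for illustration): a standard normal 'posterior'."""
    rng = random.Random(seed)
    log_post = lambda x: -0.5 * x * x
    propose = lambda x, r: x + r.gauss(0.0, 1.0)   # random-walk proposal
    log_g = lambda a, b: -0.5 * (a - b) ** 2       # symmetric, so it cancels in alpha
    x, draws, n_accept = 0.0, [], 0
    for _ in range(n_draws):
        x, accepted = mh_step(x, log_post, propose, log_g, rng)
        draws.append(x)
        n_accept += accepted
    return draws, n_accept / n_draws
```

Working on the log scale avoids overflow when the ratio is very large or very small. In the toy chain, the sample mean and variance of the draws should be close to 0 and 1, the moments of the assumed target.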

**2.2.**

**(a)** 
$\kappa_\phi$ scales the variance of the distribution $\phi'\mid \theta$, from which new values of $\phi$ will be drawn. A higher value of $\kappa_\phi$ corresponds to a more dispersed distribution, which means draws $\phi'$ are less likely to be concentrated near the old parameter $\phi$.
$\kappa_\sigma$ scales both the shape and the rate of the Gamma distribution from which $\sigma^{2\prime} \mid \theta$ is drawn. With shape $\kappa_\sigma \sigma^2$ and rate $\kappa_\sigma$, this proposal has mean $\sigma^2$ and variance $\sigma^2/\kappa_\sigma$, so here the effect is reversed: a higher value of $\kappa_\sigma$ corresponds to a less dispersed distribution, which means draws $\sigma^{2\prime}$ are more concentrated near the old parameter $\sigma^2$. In other words, increasing $\kappa_\phi$ or decreasing $\kappa_\sigma$ leads to a more dispersed search process, with larger proposed jumps across the parameter space. In terms of the Markov chain, the transition kernel then spreads its mass more evenly over the parameter space instead of concentrating it near the current state.
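
One way to check how each tuning parameter affects the spread of the proposals is to simulate them. The sketch below assumes a scalar $\phi$ ($k = 1$) and reads the proposal for $\sigma^{2\prime}$ as a Gamma with shape $\kappa_\sigma \sigma^2$ and rate $\kappa_\sigma$, as in the density written in part (b); the function name `proposal_variances` and the numerical values are illustrative assumptions.

```python
import random
import statistics

def proposal_variances(phi, sigma2, kappa_phi, kappa_sigma, n=20000, seed=0):
    """Sample variances of the two proposal distributions (illustrative sketch)."""
    rng = random.Random(seed)
    # phi' | theta ~ N(phi, kappa_phi): the standard deviation is sqrt(kappa_phi).
    phi_draws = [rng.gauss(phi, kappa_phi ** 0.5) for _ in range(n)]
    # sigma2' | theta ~ Gamma(shape = kappa_sigma * sigma2, rate = kappa_sigma).
    # random.gammavariate takes (shape, scale), so scale = 1 / kappa_sigma.
    s2_draws = [rng.gammavariate(kappa_sigma * sigma2, 1.0 / kappa_sigma)
                for _ in range(n)]
    return statistics.variance(phi_draws), statistics.variance(s2_draws)

# Compare a small and a large value of each tuning parameter.
low = proposal_variances(phi=0.5, sigma2=2.0, kappa_phi=0.1, kappa_sigma=4.0)
high = proposal_variances(phi=0.5, sigma2=2.0, kappa_phi=1.0, kappa_sigma=40.0)
```

Under these assumptions the first variance should be close to $\kappa_\phi$ and the second close to $\sigma^2/\kappa_\sigma$, so raising $\kappa_\phi$ widens the $\phi$ proposal while raising $\kappa_\sigma$ tightens the $\sigma^2$ proposal.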

**(b)** 
Since $\phi'$ and $\sigma^{2\prime}$ are drawn independently, the density $g(\theta' \mid \theta)$ is equal to 
$$ g(\theta' \mid \theta) = g(\phi' \mid \theta) g(\sigma^{2\prime} \mid \theta) $$
where $g(\phi' \mid \theta)$ and $g(\sigma^{2\prime} \mid \theta)$ are the pdfs associated with $\phi' \mid \theta$ and $\sigma^{2\prime} \mid \theta$ respectively.

Since $\phi'\mid \theta \sim \mathcal{N}(\phi, \kappa_\phi I_k)$, we can write for the case of the 4th and 5th draws:
$$g(\phi^{(5)} \mid \theta^{(4)}) = (2 \pi)^{-k / 2} \kappa_\phi^{-k / 2} \exp \left(-\frac{1}{2}(\phi^{(5)}-\phi^{(4)})^{\mathrm{T}} \kappa_\phi^{-1} (\phi^{(5)}-\phi^{(4)})\right).$$

Similarly, for $\sigma^{2\prime} \mid \theta$ in the case at hand we have:
$$
g(\sigma^{2(5)} \mid \theta^{(4)}) 
= \frac{
\kappa_\sigma^{\,\kappa_\sigma \sigma^{2(4)}}
}{
\Gamma(\kappa_\sigma \sigma^{2(4)})
}
\,
(\sigma^{2(5)})^{\kappa_\sigma \sigma^{2(4)} - 1}
\exp\!\left( -\kappa_\sigma \sigma^{2(5)} \right).$$




Multiplying both expressions above gives us the final expression for $g(\theta' \mid \theta)$:
$$g(\theta^{(5)} \mid \theta^{(4)}) = (2 \pi)^{-k / 2} \kappa_\phi^{-k / 2} \exp \left(-\frac{1}{2}(\phi^{(5)}-\phi^{(4)})^{\mathrm{T}} \kappa_\phi^{-1} (\phi^{(5)}-\phi^{(4)})\right) \frac{
\kappa_\sigma^{\,\kappa_\sigma \sigma^{2(4)}}
}{
\Gamma(\kappa_\sigma \sigma^{2(4)})
}
\,
(\sigma^{2(5)})^{\kappa_\sigma \sigma^{2(4)} - 1}
\exp\!\left( -\kappa_\sigma \sigma^{2(5)} \right)$$
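
The product above is easier to evaluate numerically on the log scale. The following is a minimal sketch that transcribes the two factors term by term; the function name `log_g` and its argument names are assumptions for illustration, and $\phi$ is passed as a plain list of length $k$.

```python
import math

def log_g(phi_new, sigma2_new, phi_old, sigma2_old, kappa_phi, kappa_sigma):
    """log g(theta' | theta) for independent Normal and Gamma proposals (sketch)."""
    k = len(phi_new)
    # Normal part: N(phi_old, kappa_phi * I_k) evaluated at phi_new.
    sq_dist = sum((a - b) ** 2 for a, b in zip(phi_new, phi_old))
    log_normal = (-0.5 * k * math.log(2.0 * math.pi * kappa_phi)
                  - 0.5 * sq_dist / kappa_phi)
    # Gamma part: shape = kappa_sigma * sigma2_old, rate = kappa_sigma.
    shape = kappa_sigma * sigma2_old
    log_gamma = (shape * math.log(kappa_sigma) - math.lgamma(shape)
                 + (shape - 1.0) * math.log(sigma2_new)
                 - kappa_sigma * sigma2_new)
    return log_normal + log_gamma

# Example call with made-up values for the 4th and 5th draws.
value = log_g([0.3, -0.1], 1.2, [0.25, 0.0], 1.0, kappa_phi=0.05, kappa_sigma=10.0)
```

Using `math.lgamma` rather than `math.gamma` keeps the computation stable when $\kappa_\sigma \sigma^{2(4)}$ is large.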


**(c)** 
Since the normal pdf is symmetric in $\phi'$ and $\phi$, the terms related to $\phi' \mid \theta$ cancel out. Thus, the correction term is just a ratio of Gamma densities. Writing $\theta = \theta^{(4)}$ and $\theta' = \theta^{(5)}$, after algebra we get:

$$\frac{g(\theta \mid \theta')}{g(\theta' \mid \theta)}
=
\frac{\Gamma\!\big(\kappa_\sigma \sigma^{2(4)}\big)}
     {\Gamma\!\big(\kappa_\sigma \sigma^{2(5)}\big)}
\,
\kappa_\sigma^{\,\kappa_\sigma \big(\sigma^{2(5)} - \sigma^{2(4)}\big)}
\,
\frac{
\big(\sigma^{2(4)}\big)^{\kappa_\sigma \sigma^{2(5)} - 1}
}{
\big(\sigma^{2(5)}\big)^{\kappa_\sigma \sigma^{2(4)} - 1}
}
\,
\exp\!\left[-\kappa_\sigma\big(\sigma^{2(4)} - \sigma^{2(5)}\big)\right].$$
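
As a sanity check on the algebra, the closed form above can be compared with the direct difference of the two Gamma log-densities. This sketch is illustrative only: `log_gamma_pdf` and `log_correction` are hypothetical helper names, and the numerical values are made up.

```python
import math

def log_gamma_pdf(x, shape, rate):
    """Log density of a Gamma(shape, rate) distribution at x."""
    return (shape * math.log(rate) - math.lgamma(shape)
            + (shape - 1.0) * math.log(x) - rate * x)

def log_correction(sigma2_old, sigma2_new, kappa_sigma):
    """log of g(theta | theta') / g(theta' | theta), transcribed term by term."""
    a_old = kappa_sigma * sigma2_old   # shape when proposing from the old draw
    a_new = kappa_sigma * sigma2_new   # shape when proposing from the new draw
    return (math.lgamma(a_old) - math.lgamma(a_new)
            + (a_new - a_old) * math.log(kappa_sigma)
            + (a_new - 1.0) * math.log(sigma2_old)
            - (a_old - 1.0) * math.log(sigma2_new)
            - kappa_sigma * (sigma2_old - sigma2_new))

# Direct computation: reverse move density minus forward move density.
s2_old, s2_new, kappa = 1.0, 1.3, 8.0
direct = (log_gamma_pdf(s2_old, kappa * s2_new, kappa)
          - log_gamma_pdf(s2_new, kappa * s2_old, kappa))
closed_form = log_correction(s2_old, s2_new, kappa)
```

The two numbers should agree up to floating-point error, and the correction is zero on the log scale when the proposed and current values coincide.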

**(d)** 
We start by factoring the posterior ratio as 
$$
\frac{\pi(\theta' \mid Y)}{\pi(\theta \mid Y)}
=
\frac{
p(Y \mid \phi',\sigma'^2)\,p(\phi' \mid \sigma'^2)\,p(\sigma'^2)
}{
p(Y \mid \phi,\sigma^2)\,p(\phi \mid \sigma^2)\,p(\sigma^2)
},$$
where, in both numerator and denominator, the first factor is the likelihood, the second is the (Normal) prior on $\phi$ given $\sigma^2$, and the third is the Inverse-Gamma prior on $\sigma^2$.

Given that these are ratios, the constant terms in each of the pdfs cancel out. Thus, after algebra, we have:
$$
\frac{\pi(\theta' \mid Y)}{\pi(\theta \mid Y)}=
\left(\frac{\sigma'^2}{\sigma^2}\right)^{-\frac{n+K}{2}}
\left(\frac{\sigma^2}{\sigma'^2}\right)^{\alpha+1}
\exp\!\left[
  -\frac{1}{2\sigma'^2}\big(\|Y - X\phi'\|^2 + (\phi' - \bar\phi)'\Lambda(\phi' - \bar\phi)\big)
  +\frac{1}{2\sigma^2}\big(\|Y - X\phi\|^2 + (\phi - \bar\phi)'\Lambda(\phi - \bar\phi)\big)
  -\frac{\beta}{\sigma'^2} + \frac{\beta}{\sigma^2}
\right]
$$
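where $\alpha = \nu_0/2$ and $\beta = \nu_0 s_0^2/2$ are the shape and scale of the Inverse-Gamma prior (this $\alpha$ is not the acceptance ratio from 2.1).
Interpreting the expression above: $\|Y - X\phi'\|^2$ and $\|Y - X\phi\|^2$ come from the likelihood (with no variance-covariance matrix since we assume homoskedasticity); $(\phi' - \bar\phi)'\Lambda(\phi' - \bar\phi)$ and $(\phi - \bar\phi)'\Lambda(\phi - \bar\phi)$ come from the prior on $\phi$, whose variance-covariance matrix in this expression is $\sigma^2\Lambda^{-1}$; the powers of the variance ratio collect the $\sigma^2$ normalizing terms of the three densities; and the $\beta$ fractions come from the Inverse-Gamma prior.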
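
The log of the posterior ratio can be coded directly from the expression above. The sketch below uses plain Python lists for $Y$, $X$, $\phi$, and $\Lambda$ to stay dependency-free; the function names and the arguments `a_shape` and `b_scale` (standing for the Inverse-Gamma $\alpha$ and $\beta$) are assumptions for illustration.

```python
import math

def quad_terms(Y, X, phi, phi_bar, Lam):
    """||Y - X phi||^2 + (phi - phi_bar)' Lam (phi - phi_bar)."""
    resid = [y - sum(x_j * p_j for x_j, p_j in zip(row, phi))
             for y, row in zip(Y, X)]
    ssr = sum(r * r for r in resid)
    d = [p - pb for p, pb in zip(phi, phi_bar)]
    prior_quad = sum(d[i] * Lam[i][j] * d[j]
                     for i in range(len(d)) for j in range(len(d)))
    return ssr + prior_quad

def log_posterior_ratio(Y, X, phi_new, s2_new, phi_old, s2_old,
                        phi_bar, Lam, a_shape, b_scale):
    """log of pi(theta' | Y) / pi(theta | Y), transcribed term by term."""
    n, K = len(Y), len(phi_old)
    # Powers of the variance ratio: -(n + K)/2 from likelihood and Normal prior,
    # -(alpha + 1) from the Inverse-Gamma prior.
    power = -((n + K) / 2.0 + a_shape + 1.0) * math.log(s2_new / s2_old)
    q_new = quad_terms(Y, X, phi_new, phi_bar, Lam)
    q_old = quad_terms(Y, X, phi_old, phi_bar, Lam)
    return (power
            - q_new / (2.0 * s2_new) + q_old / (2.0 * s2_old)
            - b_scale / s2_new + b_scale / s2_old)

# Tiny made-up example with one observation and one regressor.
example = log_posterior_ratio([2.0], [[1.0]], [1.0], 1.0, [0.0], 1.0,
                              [0.0], [[1.0]], a_shape=1.0, b_scale=1.0)
```

In the example, both variances equal 1, so only the quadratic terms matter: they are 2 at the proposed value and 4 at the current value, giving a log ratio of 1.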