# Expectation-Maximization Algorithm

## 1 General EM

EM is a general method for finding maximum likelihood solutions in probabilistic models with latent variables.  

Let $\theta$ denote the model parameters and $X$ the observed data.  

The goal is to find the maximum likelihood estimate (MLE):
$$ \log p(X | \theta)  \rightarrow \max_{\theta} $$

Introduce latent variables $Z$ and an arbitrary distribution $q(Z)$ over them, then derive the evidence lower bound (ELBO):
$$\log p(X | \theta) = \log \sum_Z p(X,Z | \theta) = \log \sum_Z p(X,Z | \theta) \frac {q(Z)} {q(Z)}$$

$$\log p(X | \theta) = \log E_{q(Z)} \biggr[ \frac {p(X,Z | \theta)} {q(Z)} \biggr] =  E_{q(Z)} \biggr[ \log \frac {p(X,Z | \theta)} {q(Z)} \biggr] + R$$


$$R = \log p(X | \theta) - E_{q(Z)} \biggr[ \log \frac {p(X,Z | \theta)} {q(Z)} \biggr] = \log p(X | \theta) - E_{q(Z)} \bigr[ \log p(X,Z | \theta) - \log q(Z) \bigr]$$

$$ R = \log p(X | \theta) - E_{q(Z)} \bigr[ \log p(Z|X,  \theta) + \log p(X| \theta) - \log q(Z) \bigr] = E_{q(Z)} \bigr[ - \log p(Z|X,  \theta) + \log q(Z) \bigr] $$

$$ R = \sum_Z q(Z) \log \frac { q(Z)} {p(Z|X,  \theta)} = KL(q(Z) || p(Z|X, \theta)) \ge 0 $$

$$\log p(X | \theta) = E_{q(Z)} \biggr[ \log \frac {p(X,Z | \theta)} {q(Z)} \biggr] + KL(q(Z) || p(Z|X, \theta)) $$

ELBO: 

$$L(q(Z), \theta) = E_{q(Z)} \biggr[ \log \frac {p(X,Z | \theta)} {q(Z)} \biggr]$$
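
The decomposition of $\log p(X | \theta)$ into the ELBO plus a KL term can be checked numerically. The sketch below assumes a hypothetical toy model in which $Z$ takes three values and the joint probabilities for one fixed observation are made-up numbers; the names `joint`, `elbo`, and `kl_to_posterior` are illustrative and do not come from the notes.

```python
import math

# Hypothetical toy model: joint[z] = p(X = x_obs, Z = z | theta) for one
# fixed observation x_obs. The values are illustrative only.
joint = [0.10, 0.25, 0.05]

def elbo(q, joint):
    """L(q, theta) = E_q[log p(X, Z | theta) - log q(Z)]."""
    return sum(qz * (math.log(pz) - math.log(qz))
               for qz, pz in zip(q, joint) if qz > 0)

def kl_to_posterior(q, joint):
    """KL(q(Z) || p(Z | X, theta)); the posterior is the normalized joint."""
    evidence = sum(joint)
    return sum(qz * math.log(qz / (pz / evidence))
               for qz, pz in zip(q, joint) if qz > 0)

log_evidence = math.log(sum(joint))            # log p(X | theta)
q_arbitrary = [0.5, 0.3, 0.2]                  # any distribution over Z
posterior = [pz / sum(joint) for pz in joint]  # p(Z | X, theta)

# The gap between the log-evidence and the ELBO equals the KL term.
gap = log_evidence - elbo(q_arbitrary, joint)
gap_matches_kl = abs(gap - kl_to_posterior(q_arbitrary, joint)) < 1e-12

# With q equal to the posterior, the KL term vanishes and the bound is tight.
bound_is_tight = abs(log_evidence - elbo(posterior, joint)) < 1e-12
```

Both flags should come out true up to floating-point error, which is what motivates choosing $q$ as the posterior in the E-step.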


**E-step**

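Fix $\theta = \theta_t$ and minimize $KL(q(Z) || p(Z|X, \theta_t))$ over $q$. Since $\log p(X | \theta_t)$ does not depend on $q$, this is equivalent to maximizing the ELBO over $q$.  
The minimum (zero) is attained at
$$q(Z) = p(Z | X, \theta_{t})$$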

**M-step**

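Fix $q$ and maximize the ELBO over $\theta$:
$$L(q, \theta) \rightarrow \max_{\theta}$$

$$ E_{q(Z)} \bigr[ \log p(X,Z | \theta) - \log q(Z) \bigr] \rightarrow \max_{\theta} $$

The term $E_{q(Z)} \log q(Z)$ does not depend on $\theta$, so this is equivalent to
$$ E_{q(Z)} \log p(X,Z | \theta) \rightarrow \max_{\theta} $$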

## 2 Gaussian Mixture Model

Probability density of a single object $x_n$:
$$ p(x_n) = \sum_{k=1}^K \pi_k N(x_n | \mu_k, \Sigma_k)$$
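
As a minimal sketch, the mixture density can be evaluated directly from this formula. The code below assumes the univariate case ($d = 1$), so each $\Sigma_k$ reduces to a scalar variance; the parameter values and the names `normal_pdf` and `mixture_pdf` are illustrative assumptions.

```python
import math

def normal_pdf(x, mu, var):
    """Univariate normal density N(x | mu, var)."""
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def mixture_pdf(x, weights, means, variances):
    """p(x) = sum_k pi_k N(x | mu_k, var_k)."""
    return sum(w * normal_pdf(x, m, v)
               for w, m, v in zip(weights, means, variances))

# Illustrative parameters (not from the text): two components.
weights, means, variances = [0.3, 0.7], [-2.0, 1.0], [1.0, 0.5]
density_at_zero = mixture_pdf(0.0, weights, means, variances)
```

Because the weights sum to one and each component is a proper density, the mixture integrates to one as well.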

$z_k \in \{0,1\}$ - latent variable indicating that the object was generated by component $k$; for object $x_n$ it is written $z_{nk}$  
$\sum_k z_k = 1$   
$\pi_k = P(z_k = 1)$  
$\sum_k \pi_k = 1$  


$$ p(x, z | \theta) = \prod_{n=1}^N \prod_{k=1}^K \bigr[ \pi_k N(x_n | \mu_k, \Sigma_k) \bigr]^{z_{nk}}$$

**E-step**

$$ p(z | x, \theta) = \prod_{n=1}^N \frac {\prod_{k=1}^K \bigr[ \pi_k N(x_n | \mu_k, \Sigma_k) \bigr]^{z_{nk}}} {\sum_{s=1}^K \pi_s N(x_n | \mu_s, \Sigma_s)}$$

$$ \gamma_{nk} = \frac {\pi_k N(x_n | \mu_k, \Sigma_k)} {\sum_{s=1}^K \pi_s N(x_n | \mu_s, \Sigma_s)} = p(z_{nk} = 1 | x_n, \theta)$$

$$ p(z | x, \theta) = \prod_{n=1}^N \prod_{k=1}^K \gamma_{nk}^{z_{nk}} $$
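
The E-step therefore amounts to computing the responsibilities $\gamma_{nk}$. A minimal sketch, again assuming univariate data so that $\Sigma_k$ is a scalar variance; the function names and the sample values are hypothetical.

```python
import math

def normal_pdf(x, mu, var):
    """Univariate normal density N(x | mu, var)."""
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def responsibilities(xs, weights, means, variances):
    """Return gamma[n][k] = pi_k N(x_n | mu_k, var_k) / sum_s pi_s N(x_n | mu_s, var_s)."""
    gamma = []
    for x in xs:
        weighted = [w * normal_pdf(x, m, v)
                    for w, m, v in zip(weights, means, variances)]
        total = sum(weighted)  # mixture density p(x_n)
        gamma.append([p / total for p in weighted])
    return gamma

# Illustrative data: two points near -2 and two points near 1.
xs = [-2.1, -1.9, 0.9, 1.2]
gamma = responsibilities(xs, [0.5, 0.5], [-2.0, 1.0], [1.0, 1.0])
```

Each row of `gamma` sums to one. For points very far from every component the densities can underflow to zero, so a more careful implementation would likely work with log-densities.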

**M-step**

$$E_{p(z|x, \theta)} \log p(x,z| \theta) = E_{p(z|x, \theta)} \sum_{n=1}^N \sum_{k=1}^K z_{nk} \bigr( \log \pi_k + \log N(x_n | \mu_k, \Sigma_k) \bigr) = $$

$$ = \sum_{n=1}^N \sum_{k=1}^K E_{p(z|x, \theta)} [z_{nk}] \bigr( \log \pi_k + \log N(x_n | \mu_k, \Sigma_k) \bigr)  = \sum_{n=1}^N \sum_{k=1}^K \gamma_{nk} \bigr( \log \pi_k + \log N(x_n | \mu_k, \Sigma_k) \bigr) \rightarrow \max_{\pi_k, \mu_k, \Sigma_k}$$


Add a Lagrange multiplier $\lambda$ for the constraint $\sum_k \pi_k = 1$:

$$ F = \sum_{n=1}^N \sum_{k=1}^K \gamma_{nk} \bigr( \log \pi_k + \log N(x_n | \mu_k, \Sigma_k) \bigr)  + \lambda \bigr( \sum_k \pi_k - 1\bigr)$$


$\pi_k$:

$$ \frac {\partial F} {\partial \pi_k} = \sum_{n=1}^N \gamma_{nk} \frac 1 {\pi_k} + \lambda = 0 $$

$$\pi_k = - \frac 1 {\lambda} \sum_{n=1}^N \gamma_{nk} $$

Summing over $k$ and using $\sum_k \pi_k = 1$ and $\sum_k \gamma_{nk} = 1$:

$$ \lambda = - N $$

$$\pi_k = \frac 1 {N} \sum_{n=1}^N \gamma_{nk} $$
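
In code, the weight update is a column average of the responsibility matrix. The sketch below assumes `gamma` is a list of rows, one per object, with made-up values; `update_weights` is a hypothetical helper name.

```python
def update_weights(gamma):
    """pi_k = (1 / N) * sum_n gamma[n][k]."""
    n_objects = len(gamma)
    n_components = len(gamma[0])
    return [sum(row[k] for row in gamma) / n_objects
            for k in range(n_components)]

# Illustrative responsibilities for N = 4 objects and K = 2 components.
gamma = [[0.9, 0.1], [0.8, 0.2], [0.2, 0.8], [0.1, 0.9]]
weights = update_weights(gamma)
```

Since every row of `gamma` sums to one, the resulting weights sum to one without any explicit renormalization.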

$\mu_k$:

$$N(x_n | \mu_k, \Sigma_k) = (2 \pi)^{-d/2} \det(\Sigma_k)^{-1/2} \exp \bigr( - \frac 1 2 (x_n-\mu_k)^T \Sigma_k^{-1} (x_n-\mu_k) \bigr)$$

$$\log N(x_n | \mu_k, \Sigma_k) = - \frac d 2 \log (2 \pi) + \frac 1 2 \log \det(\Sigma_k^{-1}) - \frac 1 2 (x_n-\mu_k)^T \Sigma_k^{-1} (x_n-\mu_k) $$

$$ \frac {\partial F} {\partial \mu_k} = \sum_{n=1}^N \gamma_{nk} \Sigma_k^{-1} (x_n-\mu_k) = 0   $$

$$ \sum_{n=1}^N \gamma_{nk} (x_n-\mu_k) = 0 $$

$$ \mu_k = \frac 1 {\sum_{n=1}^N \gamma_{nk}} \sum_{n=1}^N \gamma_{nk} x_n  = \frac 1 {N \pi_k} \sum_{n=1}^N \gamma_{nk} x_n$$
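
The mean update is a responsibility-weighted average of the data. A minimal sketch for scalar observations (an assumption; for vectors the same formula applies coordinate-wise), with hypothetical names and values:

```python
def update_means(xs, gamma):
    """mu_k = sum_n gamma[n][k] * x_n / sum_n gamma[n][k]."""
    n_components = len(gamma[0])
    means = []
    for k in range(n_components):
        n_k = sum(row[k] for row in gamma)  # effective number of objects in component k
        means.append(sum(row[k] * x for row, x in zip(gamma, xs)) / n_k)
    return means

# Illustrative data and responsibilities.
xs = [0.0, 2.0, 10.0, 14.0]
gamma = [[0.9, 0.1], [0.8, 0.2], [0.2, 0.8], [0.1, 0.9]]
means = update_means(xs, gamma)
```

With hard assignments (every $\gamma_{nk}$ equal to 0 or 1) this reduces to the ordinary per-cluster mean.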


$\Sigma_k$:

Let $\Lambda_k = \Sigma_k^{-1}$ (the precision matrix). Keeping only the terms of $F$ that depend on $\Lambda_k$:

$$ F \sim \sum_{n=1}^N \sum_{k=1}^K \gamma_{nk} \bigr( \frac 1 2 \log \det(\Lambda_k) - \frac 1 2 (x_n-\mu_k)^T \Lambda_k (x_n-\mu_k) \bigr)$$


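For a $d \times d$ matrix $X$, the cofactor (Laplace) expansion along row $i$ is $\det X = \sum_{m=1}^d X_{im} C_{im}$, so

$$ \frac {\partial \det X} {\partial X_{ij}} = \sum_{m=1}^d \biggr( \frac {\partial X_{im}} {\partial X_{ij}} C_{im}+ \frac {\partial C_{im}} {\partial X_{ij}} X_{im} \biggr) = C_{ij}$$

where $C_{ij} = (-1)^{i+j} M_{ij}$ is the cofactor and $M_{ij}$ is the minor (the determinant of $X$ with row $i$ and column $j$ removed). The cofactors $C_{im}$ do not depend on the entries of row $i$, so the second term vanishes.

$$ \frac {\partial \log \det X} {\partial X_{ij}} = \frac 1 {\det X} \frac {\partial \det X} {\partial X_{ij}}  = \frac 1 {\det X} C_{ij} = (X^{-1})^T_{ij}$$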
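
The identity $\partial \log \det X / \partial X_{ij} = (X^{-1})^T_{ij}$ can be sanity-checked with finite differences. The sketch below assumes a hypothetical $2 \times 2$ matrix with positive determinant, chosen to be non-symmetric so that the transpose matters; all names are illustrative.

```python
import math

def det2(m):
    """Determinant of a 2x2 matrix given as nested lists."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def inv2(m):
    """Inverse of a 2x2 matrix via the adjugate formula."""
    d = det2(m)
    return [[m[1][1] / d, -m[0][1] / d],
            [-m[1][0] / d, m[0][0] / d]]

def numeric_grad_logdet(m, eps=1e-6):
    """Central finite differences of log det with respect to each entry."""
    grad = [[0.0, 0.0], [0.0, 0.0]]
    for i in range(2):
        for j in range(2):
            plus = [row[:] for row in m]
            minus = [row[:] for row in m]
            plus[i][j] += eps
            minus[i][j] -= eps
            grad[i][j] = (math.log(det2(plus)) - math.log(det2(minus))) / (2 * eps)
    return grad

x = [[2.0, 0.5], [-0.3, 1.5]]  # illustrative, det = 3.15 > 0
x_inv = inv2(x)
analytic = [[x_inv[j][i] for j in range(2)] for i in range(2)]  # (X^{-1})^T
numeric = numeric_grad_logdet(x)
max_error = max(abs(analytic[i][j] - numeric[i][j])
                for i in range(2) for j in range(2))
```

Here `max_error` should be tiny, limited only by the finite-difference step. For the symmetric $\Lambda_k$ used in the covariance update, the transpose makes no difference.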


$$ \frac {\partial F} {\partial \Lambda_k} = \sum_{n=1}^N \gamma_{nk} \bigr(-\frac 1 2 (x_n-\mu_k) (x_n-\mu_k)^T + \frac 1 2 \Lambda_k^{-1} \bigr) = 0 $$

$$ \Sigma_k = \Lambda_k^{-1} = \frac 1 {\sum_{n=1}^N \gamma_{nk}} \sum_{n=1}^N \gamma_{nk} (x_n-\mu_k) (x_n-\mu_k)^T = \frac 1 {N \pi_k} \sum_{n=1}^N \gamma_{nk} (x_n-\mu_k) (x_n-\mu_k)^T $$
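
Putting the E-step and the three M-step updates together gives the full iteration. The following is a sketch under stated assumptions, not a reference implementation: it handles only univariate data (so $\Sigma_k$ is a scalar variance), uses synthetic data drawn with a fixed seed, and starts from hand-picked initial parameters. The names `em_step`, `log_likelihood`, and `history` are hypothetical.

```python
import math
import random

def normal_pdf(x, mu, var):
    """Univariate normal density N(x | mu, var)."""
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def log_likelihood(xs, weights, means, variances):
    """sum_n log sum_k pi_k N(x_n | mu_k, var_k)."""
    return sum(math.log(sum(w * normal_pdf(x, m, v)
                            for w, m, v in zip(weights, means, variances)))
               for x in xs)

def em_step(xs, weights, means, variances):
    """One EM iteration: responsibilities, then closed-form parameter updates."""
    # E-step: gamma[n][k] = p(z_nk = 1 | x_n, theta)
    gamma = []
    for x in xs:
        weighted = [w * normal_pdf(x, m, v)
                    for w, m, v in zip(weights, means, variances)]
        total = sum(weighted)
        gamma.append([p / total for p in weighted])
    # M-step: pi_k, mu_k, and the scalar analogue of Sigma_k
    new_weights, new_means, new_variances = [], [], []
    for k in range(len(weights)):
        n_k = sum(row[k] for row in gamma)  # equals N * pi_k
        mu_k = sum(row[k] * x for row, x in zip(gamma, xs)) / n_k
        var_k = sum(row[k] * (x - mu_k) ** 2 for row, x in zip(gamma, xs)) / n_k
        new_weights.append(n_k / len(xs))
        new_means.append(mu_k)
        new_variances.append(var_k)
    return new_weights, new_means, new_variances

# Synthetic data (illustrative): two well-separated clusters.
random.seed(0)
xs = ([random.gauss(-2.0, 0.7) for _ in range(150)]
      + [random.gauss(3.0, 1.0) for _ in range(150)])

params = ([0.5, 0.5], [-1.0, 1.0], [1.0, 1.0])  # hand-picked initial guess
history = [log_likelihood(xs, *params)]
for _ in range(30):
    params = em_step(xs, *params)
    history.append(log_likelihood(xs, *params))
```

The values in `history` should be non-decreasing, which reflects the general EM guarantee: each E-step makes the bound tight and each M-step can only raise it. In practice a small floor on the variances is often added to avoid a component collapsing onto a single point; that safeguard is omitted here.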

