# **2 Topic Modeling with EM (Ed Tam)**

**2.1 Derive Log-Likelihood**

Let's write the likelihood for this problem:

$$L(\theta)=P(\bar w=\bar{q}, \bar d | \bar \alpha, \bar \beta)=P(\bar w=\bar{q}, \bar d | \theta),$$

where I use the notation $\theta = (\bar \alpha, \bar \beta)$. Since the probability that word $w_n$
appears in document $d_i$ is $p_{ni}$, the probability that it appears $q(w_n; d_i)$ times is
$p_{ni}^{q(w_n; d_i)}$ (up to a multinomial coefficient that does not depend on $\theta$). The likelihood can then be written as follows:

\begin{eqnarray}
L(\theta)=\prod_{i=1}^M\prod_{n=1}^N P(w_n=q(w_n; d_i)|\theta)=\prod_{i=1}^M\prod_{n=1}^N p_{ni}^{q(w_n; d_i)}=\\
=\prod_{i=1}^M\prod_{n=1}^N (\sum_{k=1}^K\beta_{kn}\alpha_{ik})^{q(w_n; d_i)},
\end{eqnarray}

The log-likelihood is then

\begin{eqnarray}
\log L(\theta)=\log \prod_{i=1}^M\prod_{n=1}^N (\sum_{k=1}^K\beta_{kn}\alpha_{ik})^{q(w_n; d_i)}=\\
=\sum_{i=1}^M\sum_{n=1}^N \log (\sum_{k=1}^K\beta_{kn}\alpha_{ik})^{q(w_n; d_i)}=\sum_{i=1}^M\sum_{n=1}^Nq(w_n; d_i)\log \sum_{k=1}^K\beta_{kn}\alpha_{ik},
\end{eqnarray}

So

$$\log L(\theta)=\sum_{i=1}^M\sum_{n=1}^Nq(w_n; d_i)\log \sum_{k=1}^K\beta_{kn}\alpha_{ik}.$$
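The double sum above translates directly into nested loops. The following is a minimal sketch, not a reference implementation: it assumes the counts are stored as `q[i][n]`, the document-topic weights as `alpha[i][k]`, and the topic-word weights as `beta[k][n]`, and the toy numbers are made up for illustration.

```python
import math

def log_likelihood(q, alpha, beta):
    """log L = sum_i sum_n q[i][n] * log(sum_k beta[k][n] * alpha[i][k])."""
    total = 0.0
    for i, counts in enumerate(q):
        for n, count in enumerate(counts):
            if count == 0:
                continue  # a zero count contributes nothing to the sum
            p_ni = sum(beta[k][n] * alpha[i][k] for k in range(len(beta)))
            total += count * math.log(p_ni)
    return total

# Toy data (made up): M=2 documents, N=3 words, K=2 topics.
q = [[2, 1, 0],
     [0, 1, 3]]
alpha = [[0.9, 0.1],          # each row sums to 1 over topics
         [0.2, 0.8]]
beta = [[0.7, 0.2, 0.1],      # each row sums to 1 over words
        [0.1, 0.3, 0.6]]

ll = log_likelihood(q, alpha, beta)
print(ll)
```

For this toy input the mixture probabilities are $p_{00}=0.64$, $p_{01}=0.21$, $p_{11}=0.28$ and $p_{12}=0.5$, so the result equals $2\log 0.64+\log 0.21+\log 0.28+3\log 0.5$.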

**2.2 E Step**

Using Bayes' rule, we can write

\begin{eqnarray}
p(z_k|w_n,d_i,\alpha^{old},\beta^{old})=\dfrac{p(w_n|z_k,d_i,\alpha^{old},\beta^{old})p(z_k|d_i,\alpha^{old},\beta^{old})}{p(w_n|d_i,\alpha^{old},\beta^{old})},
\end{eqnarray}

Now, since the appearance of the word $w_n$ in the topic $z_k$ is conditionally independent of the document $d_i$ given the topic, we can use

$$p(w_n|z_k,d_i,\alpha^{old},\beta^{old})=p(w_n|z_k,\alpha^{old},\beta^{old})=\beta^{old}_{kn}.$$

Moreover, by definition $p(z_k|d_i,\alpha^{old},\beta^{old})=\alpha^{old}_{ik}$, and

\begin{eqnarray}
p(w_n|d_i,\alpha^{old},\beta^{old})=\sum_{l=1}^{K}p(w_n|z_l,\alpha^{old},\beta^{old})p(z_l|d_i,\alpha^{old},\beta^{old})=\\
=\sum_{l=1}^{K}\beta^{old}_{ln}\alpha^{old}_{il},
\end{eqnarray}

and thus

$$p(z_k|w_n,d_i,\alpha^{old},\beta^{old})=\dfrac{\beta^{old}_{kn}\alpha^{old}_{ik}}{\sum_{l=1}^{K}\beta^{old}_{ln}\alpha^{old}_{il}}.$$
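The posterior above can be computed for every (document, word) pair by normalising the products of the old parameters over the topics. This is a small sketch under the storage convention of Section 2.1 (`alpha[i][k]` for document $i$ and topic $k$, `beta[k][n]` for topic $k$ and word $n$); the function name `e_step` and the toy values are assumptions made for illustration, and the normaliser is assumed to be strictly positive.

```python
def e_step(alpha, beta):
    """Return gamma[i][n][k] = beta[k][n]*alpha[i][k] / sum_l beta[l][n]*alpha[i][l]."""
    M, K, N = len(alpha), len(beta), len(beta[0])
    gamma = []
    for i in range(M):
        doc = []
        for n in range(N):
            joint = [beta[k][n] * alpha[i][k] for k in range(K)]
            norm = sum(joint)  # p(w_n | d_i) under the old parameters
            doc.append([j / norm for j in joint])
        gamma.append(doc)
    return gamma

# Toy parameters (made up): M=2 documents, K=2 topics, N=3 words.
alpha_old = [[0.9, 0.1],
             [0.2, 0.8]]
beta_old = [[0.7, 0.2, 0.1],
            [0.1, 0.3, 0.6]]

gamma = e_step(alpha_old, beta_old)
print(gamma[0][0])  # posterior over topics for word 0 in document 0
```

Each `gamma[i][n]` is a distribution over the $K$ topics, so it sums to 1. For word 0 in document 0 the topic-0 entry is $0.63/0.64$.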

**2.3 Find ELBO for M-Step**

\begin{eqnarray}
\log L(\theta)=\sum_{i=1}^M\sum_{n=1}^Nq(w_n; d_i)\log \sum_{k=1}^K\beta_{kn}\alpha_{ik}.
\end{eqnarray}

We can write $\beta_{kn}\alpha_{ik}$ as

$$\beta_{kn}\alpha_{ik}=\beta_{kn}\alpha_{ik}\dfrac{p(z_k|w_n,d_i,\alpha^{old},\beta^{old})}{p(z_k|w_n,d_i,\alpha^{old},\beta^{old})}$$

Then

\begin{eqnarray}
\log L(\theta)=\sum_{i=1}^M\sum_{n=1}^Nq(w_n; d_i)\log \sum_{k=1}^Kp(z_k|w_n,d_i,\alpha^{old},\beta^{old})\dfrac{\beta_{kn}\alpha_{ik}}{p(z_k|w_n,d_i,\alpha^{old},\beta^{old})}.
\end{eqnarray}

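For each pair $(i, n)$ the weights $p(z_k|w_n,d_i,\alpha^{old},\beta^{old})$ are nonnegative and sum to 1 over $k$, so the inner sum $\sum_k$ is an expectation. We can therefore apply Jensen's inequality (since $\log$ is concave, i.e. $-\log$ is convex). Using
the notation $p(z_k|w_n,d_i,\alpha^{old},\beta^{old})=\gamma_{ink}$, we have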

\begin{eqnarray}
\sum_{i=1}^M\sum_{n=1}^Nq(w_n; d_i)\log \sum_{k=1}^Kp(z_k|w_n,d_i,\alpha^{old},\beta^{old})\dfrac{\beta_{kn}\alpha_{ik}}{p(z_k|w_n,d_i,\alpha^{old},\beta^{old})}\ge\\
\sum_{i=1}^M\sum_{n=1}^Nq(w_n; d_i)\sum_{k=1}^K\gamma_{ink}\log \dfrac{\beta_{kn}\alpha_{ik}}{\gamma_{ink}},
\end{eqnarray}

The right-hand side is the ELBO, which we denote by $A(\alpha,\beta)$:

\begin{eqnarray}
A(\alpha,\beta)=\sum_{i=1}^M\sum_{n=1}^N\sum_{k=1}^Kq(w_n; d_i)\gamma_{ink}\log \dfrac{\beta_{kn}\alpha_{ik}}{\gamma_{ink}}.
\end{eqnarray}
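A quick numerical check can make the bound concrete: $A(\alpha,\beta)$ should never exceed $\log L(\alpha,\beta)$, and the two should coincide when $(\alpha,\beta)=(\alpha^{old},\beta^{old})$. The sketch below assumes the storage convention `q[i][n]`, `alpha[i][k]`, `beta[k][n]`, `gamma[i][n][k]`; the helper names and all numbers are made up for illustration.

```python
import math

def responsibilities(alpha, beta):
    """gamma[i][n][k]: beta[k][n] * alpha[i][k], normalised over k."""
    gamma = []
    for a_i in alpha:
        doc = []
        for n in range(len(beta[0])):
            joint = [b_k[n] * a_ik for b_k, a_ik in zip(beta, a_i)]
            doc.append([j / sum(joint) for j in joint])
        gamma.append(doc)
    return gamma

def log_likelihood(q, alpha, beta):
    """sum_{i,n} q[i][n] * log(sum_k beta[k][n] * alpha[i][k])."""
    return sum(c * math.log(sum(b_k[n] * a_ik for b_k, a_ik in zip(beta, a_i)))
               for q_i, a_i in zip(q, alpha)
               for n, c in enumerate(q_i) if c > 0)

def elbo(q, gamma, alpha, beta):
    """sum_{i,n,k} q[i][n] * gamma[i][n][k] * log(beta[k][n]*alpha[i][k] / gamma[i][n][k])."""
    total = 0.0
    for i, q_i in enumerate(q):
        for n, c in enumerate(q_i):
            for k, g in enumerate(gamma[i][n]):
                if c > 0 and g > 0:  # zero-weight terms contribute nothing
                    total += c * g * math.log(beta[k][n] * alpha[i][k] / g)
    return total

q = [[2, 1, 0], [0, 1, 3]]
alpha_old = [[0.9, 0.1], [0.2, 0.8]]
beta_old = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]]
gamma = responsibilities(alpha_old, beta_old)

# Arbitrary other parameter values at which to compare bound and objective.
alpha_new = [[0.5, 0.5], [0.4, 0.6]]
beta_new = [[0.3, 0.3, 0.4], [0.2, 0.5, 0.3]]

gap_at_old = log_likelihood(q, alpha_old, beta_old) - elbo(q, gamma, alpha_old, beta_old)
gap_at_new = log_likelihood(q, alpha_new, beta_new) - elbo(q, gamma, alpha_new, beta_new)
print(gap_at_old, gap_at_new)
```

The first gap should be zero up to rounding (the bound is tight at the old parameters), and the second should be nonnegative.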

**2.4  M-Step**

**$\alpha^{new}$**

To find the values of $\alpha$ and $\beta$ that maximize the ELBO subject to the constraints
 $\sum_{k=1}^K\alpha^{new}_{ik}=1$ and $\sum_{n=1}^N\beta^{new}_{kn}=1$, let's write
 the Lagrangian with Lagrange multipliers:

 $$L(\alpha,\beta)=A(\alpha,\beta)+\gamma_1(1-\sum_{k=1}^K\alpha_{ik})+\gamma_2(1-\sum_{n=1}^N\beta_{kn}),$$

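 where the superscript ${new}$ is omitted for brevity. There is one multiplier $\gamma_1$ for each document $i$ and one multiplier $\gamma_2$ for each topic $k$; their indices are suppressed as well, and they should not be confused with the responsibilities $\gamma_{ink}$. Setting the derivative with respect to $\alpha_{ik}$ to zero gives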

\begin{eqnarray}
\dfrac{\partial L(\alpha,\beta)}{\partial \alpha_{ik}}=\dfrac{\partial A(\alpha,\beta)}{\partial \alpha_{ik}}-\gamma_1=\dfrac{\partial}{\partial\alpha_{ik}}\sum_{i'=1}^M\sum_{n=1}^N\sum_{k'=1}^Kq(w_n; d_{i'})\gamma_{i'nk'}\log \dfrac{\beta_{k'n}\alpha_{i'k'}}{\gamma_{i'nk'}}-\gamma_1=\\
=\dfrac{\partial}{\partial\alpha_{ik}}\sum_{n=1}^Nq(w_n; d_i)\gamma_{ink}\left(\log(\beta_{kn})+\log(\alpha_{ik})-\log(\gamma_{ink})\right)-\gamma_1=\\
=\sum_{n=1}^Nq(w_n; d_i)\gamma_{ink}\dfrac{1}{\alpha_{ik}}-\gamma_1=0.
\end{eqnarray}

Thus,

$$\alpha_{ik}=\dfrac{\sum_{n=1}^Nq(w_n; d_i)\gamma_{ink}}{\gamma_1}.$$

Using the constraint $\sum_{k=1}^K\alpha_{ik}=1$:


$$\sum_{k=1}^K\dfrac{\sum_{n=1}^Nq(w_n; d_i)\gamma_{ink}}{\gamma_1}=1,$$

$$\sum_{n=1}^Nq(w_n; d_i)\sum_{k=1}^K\gamma_{ink}=\gamma_1,$$

and since $\gamma_{ink}=p(z_k|w_n,d_i,\alpha^{old},\beta^{old})$ is a probability distribution over topics, it sums to 1 over $k$,
so we have

$$\gamma_1=\sum_{n=1}^Nq(w_n; d_i),$$

and

$$\alpha_{ik}^{new}=\dfrac{\sum_{n=1}^Nq(w_n; d_i)\gamma_{ink}}{\sum_{n=1}^Nq(w_n; d_i)},$$

where

$$\gamma_{ink}=\dfrac{\beta^{old}_{kn}\alpha^{old}_{ik}}{\sum_{l=1}^{K}\beta^{old}_{ln}\alpha^{old}_{il}}.$$
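In code, the update for $\alpha^{new}_{ik}$ is a count-weighted average of the responsibilities within each document. The sketch below assumes `q[i][n]` holds the counts and `gamma[i][n][k]` the responsibilities, and that every document contains at least one word; the function name `update_alpha` and the hand-set values are made up for illustration.

```python
def update_alpha(q, gamma):
    """alpha[i][k] = sum_n q[i][n] * gamma[i][n][k] / sum_n q[i][n]."""
    alpha = []
    for q_i, gamma_i in zip(q, gamma):
        doc_len = sum(q_i)  # the multiplier gamma_1 for document i (assumed > 0)
        K = len(gamma_i[0])
        alpha.append([
            sum(c * g[k] for c, g in zip(q_i, gamma_i)) / doc_len
            for k in range(K)
        ])
    return alpha

# Toy inputs (made up): M=2 documents, N=3 words, K=2 topics.
q = [[2, 1, 0],
     [0, 1, 3]]
gamma = [
    [[1.0, 0.0], [0.5, 0.5], [0.2, 0.8]],  # document 0: one row per word
    [[0.3, 0.7], [0.0, 1.0], [0.5, 0.5]],  # document 1
]

alpha_new = update_alpha(q, gamma)
print(alpha_new)
```

Because each `gamma[i][n]` sums to 1 over topics, each row of the result sums to 1 as well, so the constraint holds automatically. For document 1 the update gives $(1.5/4,\ 2.5/4)$.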


**$\beta^{new}$**

The ELBO keeps the same form under the substitutions $\alpha \leftrightarrow \beta$ and $i \leftrightarrow n$,
so we can skip the derivation for $\beta_{kn}^{new}$ and write the stationarity condition
by analogy with the one for $\alpha_{ik}^{new}$:

$$\beta_{kn}=\dfrac{\sum_{i=1}^Mq(w_n; d_i)\gamma_{ink}}{\gamma_2}.$$

To find $\gamma_2$, let's use the constraint $\sum_{n=1}^N\beta_{kn}=1$:

$$\sum_{n=1}^N\dfrac{\sum_{i=1}^Mq(w_n; d_i)\gamma_{ink}}{\gamma_2}=1,$$

$$\sum_{n=1}^N\sum_{i=1}^Mq(w_n; d_i)\gamma_{ink}=\gamma_2,$$

Unlike the $\alpha$ case, this sum does not simplify further, because $\gamma_{ink}$ sums to 1 over $k$, not over $n$ or $i$.
The final expression for $\beta_{kn}^{new}$ is therefore

$$\beta_{kn}^{new}=\dfrac{\sum_{i=1}^Mq(w_n; d_i)\gamma_{ink}}{\sum_{n'=1}^N\sum_{i=1}^Mq(w_{n'}; d_i)\gamma_{in'k}}.$$
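Putting the E step and both M-step updates together gives a complete EM iteration. The following is a minimal sketch rather than a tuned implementation: it assumes the storage convention `q[i][n]`, `alpha[i][k]`, `beta[k][n]`, `gamma[i][n][k]`, strictly positive initial parameters, and made-up toy counts in which every word occurs at least once. It also records the log-likelihood after each iteration, which EM should never decrease.

```python
import math

def e_step(alpha, beta):
    """Responsibilities gamma[i][n][k] from the current parameters."""
    gamma = []
    for a_i in alpha:
        doc = []
        for n in range(len(beta[0])):
            joint = [b_k[n] * a_ik for b_k, a_ik in zip(beta, a_i)]
            norm = sum(joint)
            doc.append([j / norm for j in joint])
        gamma.append(doc)
    return gamma

def m_step(q, gamma):
    """Closed-form updates: alpha per document, beta per topic."""
    M, N, K = len(q), len(q[0]), len(gamma[0][0])
    alpha = [[sum(q[i][n] * gamma[i][n][k] for n in range(N)) / sum(q[i])
              for k in range(K)] for i in range(M)]
    beta = []
    for k in range(K):
        num = [sum(q[i][n] * gamma[i][n][k] for i in range(M)) for n in range(N)]
        total = sum(num)  # the multiplier gamma_2 for topic k
        beta.append([x / total for x in num])
    return alpha, beta

def log_likelihood(q, alpha, beta):
    return sum(c * math.log(sum(b_k[n] * a_ik for b_k, a_ik in zip(beta, a_i)))
               for q_i, a_i in zip(q, alpha)
               for n, c in enumerate(q_i) if c > 0)

# Toy counts (made up): M=3 documents, N=3 words; K=2 topics.
q = [[4, 2, 0],
     [0, 1, 5],
     [2, 2, 2]]
alpha = [[0.6, 0.4], [0.3, 0.7], [0.5, 0.5]]
beta = [[0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]

history = [log_likelihood(q, alpha, beta)]
for _ in range(10):
    gamma = e_step(alpha, beta)
    alpha, beta = m_step(q, gamma)
    history.append(log_likelihood(q, alpha, beta))

print(history[0], history[-1])
```

The denominator of the $\beta$ update is computed once per topic as `total`, which is the double sum over words and documents from the formula above. A fixed iteration count is used here only to keep the sketch short; a convergence test on `history` would be a natural alternative.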




