Given the model setup:

A paper's true value:

$$y^{(pr)} \sim \mathcal{N}(\mu_p, \sigma_p^2)$$

A reviewer's bias:

$$z^{(pr)} \sim \mathcal{N}(\nu_r, \tau_r^2)$$

A reviewer's score for a given paper:

$$x^{(pr)} | y^{(pr)}, z^{(pr)} \sim \mathcal{N}(y^{(pr)} + z^{(pr)}, \sigma^2 )$$

**Independence**: The variables $y^{(pr)}$ and $z^{(pr)}$ are independent; the variables $(x, y, z)$ for different paper-reviewer pairs are also jointly independent.

# (a) E-step

#### (i)

From the definition of the problem, the score can be written as

$$x^{(pr)} = y^{(pr)} + z^{(pr)} + \epsilon^{(pr)} \text{,  where  } \epsilon^{(pr)} \sim \mathcal{N}(0, \sigma^2)$$

with $\epsilon^{(pr)}$ independent of $y^{(pr)}$ and $z^{(pr)}$. So $x^{(pr)}$ is [a sum of independent normal random variables](https://en.wikipedia.org/wiki/Sum_of_normally_distributed_random_variables) and is itself normally distributed:

$$x^{(pr)} \sim \mathcal{N}(\mu_p + \nu_r, \sigma_p^2 + \tau_r^2 + \sigma^2 )$$

For the joint distribution $p(y^{(pr)}, z^{(pr)}, x^{(pr)})$, the mean vector ($m_{pr}$, written with $m$ because $\mu$ is already in use) and the covariance matrix ($\Sigma_{pr}$) are

\begin{align*}
m_{pr} &= [\mu_p, \nu_r, \mu_p + \nu_r]^T
\\
\Sigma_{pr} &= 
\begin{bmatrix}
\sigma_p^2 & 0        & \sigma_p^2 \\ 
0          & \tau_r^2 & \tau_r^2  \\ 
\sigma_p^2 & \tau_r^2 & \sigma_p^2 + \tau_r^2 + \sigma^2
\end{bmatrix}
\end{align*}

Note: $\mathrm{cov}(y^{(pr)}, x^{(pr)})$ and $\mathrm{cov}(z^{(pr)}, x^{(pr)})$ follow from this fact: if random variables $A$ and $B$ are independent, then $\mathrm{cov}(A, A+B) = \mathrm{cov}(A, A) = \sigma_A^2$. For example, take $A = y^{(pr)}$ and $B = z^{(pr)} + \epsilon^{(pr)}$.

Proof:

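\begin{align*}
\mathrm{cov}(A, A+B) 
&= E[(A - E[A])(A+B - E[A+B])] \\
&= E[(A - E[A])(A - E[A] + B - E[B])] \\
&= E[(A - E[A])(A - E[A])] + E[(A - E[A])(B - E[B])] \\
&= E[(A - E[A])(A - E[A])] + \mathrm{cov}(A, B) \\
&= E[(A - E[A])(A - E[A])] \qquad \text{(independence gives } \mathrm{cov}(A, B) = 0 \text{)} \\
&= \sigma_A^2
\end{align*}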
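The entries of $m_{pr}$ and $\Sigma_{pr}$ can be sanity-checked by simulation. The following is a minimal sketch using only the Python standard library; the function names (`simulate_pair`, `sample_mean`, `sample_cov`) and the parameter values are illustrative assumptions, not part of the problem.

```python
import math
import random


def simulate_pair(mu_p, nu_r, sigma_p2, tau_r2, sigma2, n, seed=0):
    """Draw n samples of (y, z, x) for one paper-reviewer pair."""
    rng = random.Random(seed)
    ys, zs, xs = [], [], []
    for _ in range(n):
        y = rng.gauss(mu_p, math.sqrt(sigma_p2))   # paper's true value
        z = rng.gauss(nu_r, math.sqrt(tau_r2))     # reviewer's bias
        x = rng.gauss(y + z, math.sqrt(sigma2))    # observed score
        ys.append(y)
        zs.append(z)
        xs.append(x)
    return ys, zs, xs


def sample_mean(a):
    return sum(a) / len(a)


def sample_cov(a, b):
    """Unbiased sample covariance of two equally long lists."""
    ma, mb = sample_mean(a), sample_mean(b)
    return sum((u - ma) * (v - mb) for u, v in zip(a, b)) / (len(a) - 1)
```

With enough samples, `sample_cov(ys, xs)` should approach $\sigma_p^2$, `sample_cov(zs, xs)` should approach $\tau_r^2$, and `sample_cov(xs, xs)` should approach $\sigma_p^2 + \tau_r^2 + \sigma^2$.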

Therefore, following a trivariate normal distribution,

\begin{align*}
p(x^{(pr)}, y^{(pr)}, z^{(pr)}; \mu_p, \nu_r, \sigma_p^2, \tau_r^2)
&= \frac{1}{(2\pi)^{3/2} \left|\Sigma_{pr}\right|^{1/2}} \exp \bigg(-\frac{1}{2}(a^{(pr)} - m_{pr})^T \Sigma_{pr}^{-1} (a^{(pr)} - m_{pr}) \bigg)
\end{align*}

where $a^{(pr)} = [y^{(pr)}, z^{(pr)}, x^{(pr)}]^T$.

#### (ii)

We use the conditional distribution of a multivariate normal, as given in the lecture notes on factor analysis, http://cs229.stanford.edu/notes/cs229-notes9.pdf.

Let $x_1$ and $x_2$ be two jointly normal random variables with marginals

\begin{align*}
x_1 &\sim \mathcal{N}(\mu_1, \Sigma_{11}) \\
x_2 &\sim \mathcal{N}(\mu_2, \Sigma_{22})
\end{align*}

Then, let $x$ be a new multivariate random variable after stacking $x_1$ and $x_2$, 

\begin{align*}
x &= [x_1, x_2]^T \sim \mathcal{N}([\mu_1, \mu_2]^T, \Sigma)
\\
\Sigma &= 
\begin{bmatrix}
 \Sigma_{11} & \Sigma_{12} \\ 
 \Sigma_{21} & \Sigma_{22} 
\end{bmatrix}
\end{align*}

Note that $x_1$, $x_2$, $\mu_1$, $\mu_2$, and $\Sigma_{ij}$ can be vectors or submatrices, not just scalars.

Then for conditional random variable, $x_1|x_2 \sim \mathcal{N}(\mu_{1|2}, \Sigma_{1|2})$,

\begin{align*}
\mu_{1|2}    &= \mu_1 + \Sigma_{12}\Sigma_{22}^{-1}(x_2 - \mu_2) \\
\Sigma_{1|2} &= \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}
\end{align*}

Now we derive the expression for $Q_{pr}(y^{(pr)}, z^{(pr)}) = p(y^{(pr)}, z^{(pr)} | x^{(pr)})$

First, figure out all the corresponding variables

\begin{align*}
\mu_1 &= [\mu_p, \nu_r]^T
\\
\Sigma_{12} &= [\sigma_p^2, \tau_r^2]^T \\
\Sigma_{22}^{-1} &= \frac{1}{\sigma_p^2 + \tau_r^2 + \sigma^2} \\
x_2 &= x^{(pr)} \\
\mu_2 &= \mu_p + \nu_r \\
\Sigma_{11} &= \begin{bmatrix}
\sigma_p^2 & 0        \\ 
0          & \tau_r^2 \\ 
\end{bmatrix} \\
\Sigma_{21} &= [\sigma_p^2, \tau_r^2]
\end{align*}

So

\begin{align*}
\mu_{1|2} &=
\begin{bmatrix}
\mu_p \\
\nu_r \\
\end{bmatrix} + \frac{x^{(pr)} - \mu_p - \nu_r}{\sigma_p^2 + \tau_r^2 + \sigma^2} 
\begin{bmatrix}
\sigma_p^2 \\
\tau_r^2 \\
\end{bmatrix}
\end{align*}

\begin{align*}
\Sigma_{1|2} &= \begin{bmatrix}
\sigma_p^2 & 0        \\ 
0          & \tau_r^2 \\ 
\end{bmatrix} - \begin{bmatrix}
\sigma_p^2 \\
\tau_r^2 \\
\end{bmatrix} \frac{1}{\sigma_p^2 + \tau_r^2 + \sigma^2} [\sigma_p^2, \tau_r^2] 
\\
&= \begin{bmatrix}
\sigma_p^2 & 0        \\ 
0          & \tau_r^2 \\ 
\end{bmatrix} - \frac{1}{\sigma_p^2 + \tau_r^2 + \sigma^2} \begin{bmatrix}
\sigma_p^4          & \sigma_p^2 \tau_r^2 \\
\tau_r^2 \sigma_p^2 & \tau_r^4            \\
\end{bmatrix} 
\end{align*}

\begin{align*}
Q_{pr}(y^{(pr)}, z^{(pr)})
&= p(y^{(pr)}, z^{(pr)} | x^{(pr)}) \\
&= \frac{1}{2\pi\left|\Sigma_{1|2}\right|^{1/2}} \exp{\bigg(-\frac{1}{2} \bigg(
\begin{bmatrix}
y^{(pr)} \\
z^{(pr)} \\
\end{bmatrix}
 - \mu_{1|2}\bigg)^T\Sigma_{1|2}^{-1} \bigg(
\begin{bmatrix}
y^{(pr)} \\
z^{(pr)} \\
\end{bmatrix}
 - \mu_{1|2} \bigg) 
\bigg)} \\
\end{align*}

$Q_{pr}$ follows a bivariate normal distribution.
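The E-step therefore only needs the posterior mean and covariance of $(y^{(pr)}, z^{(pr)})$ for each pair. A minimal sketch of that computation, applying $\mu_{1|2} = \mu_1 + \Sigma_{12}\Sigma_{22}^{-1}(x_2 - \mu_2)$ and $\Sigma_{1|2} = \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}$ with scalar $\Sigma_{22}$; the function name `posterior_yz` and its argument names are assumptions made for illustration.

```python
def posterior_yz(x, mu_p, nu_r, sigma_p2, tau_r2, sigma2):
    """Mean and covariance of (y, z) given the observed score x."""
    s = sigma_p2 + tau_r2 + sigma2   # Sigma_22: marginal variance of x
    resid = x - mu_p - nu_r          # x_2 - mu_2
    mean = (
        mu_p + sigma_p2 * resid / s,
        nu_r + tau_r2 * resid / s,
    )
    cov = (
        (sigma_p2 - sigma_p2 ** 2 / s, -sigma_p2 * tau_r2 / s),
        (-sigma_p2 * tau_r2 / s, tau_r2 - tau_r2 ** 2 / s),
    )
    return mean, cov
```

For example, with $\mu_p = 5$, $\nu_r = 0$, $\sigma_p^2 = \tau_r^2 = 1$, $\sigma^2 = 2$ and an observed score $x = 9$, the residual of 4 is shared in proportion to the variances, giving a posterior mean of $(6, 1)$.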

*Warning*: math is tricky here. If you read all the way to here, thank you! Please let me know if my derivation is wrong.

# (b)

At the E-step, we calculated

\begin{align*}
w_{(y^{(pr)}, z^{(pr)})}= Q_{pr}(y^{(pr)}, z^{(pr)})
\end{align*}

based on the equation from **(a)(ii)**.

Then the lower bound for log likelihood:

\begin{align*}
l(\mu_p, \nu_r, \sigma_p^2, \tau_r^2)  
&= 
\sum_{p=1}^{P} \sum_{r=1}^{R} \sum_{(y,z)}Q_{pr}(y^{(pr)}, z^{(pr)})\log \frac{p(y^{(pr)}, z^{(pr)}, x^{(pr)})}{Q_{pr}(y^{(pr)}, z^{(pr)})}
\\ 
&=
\sum_{p=1}^{P} \sum_{r=1}^{R} \sum_{(y,z)} w_{(y^{(pr)}, z^{(pr)})} \log \frac{p(y^{(pr)}, z^{(pr)}, x^{(pr)})}{w_{(y^{(pr)}, z^{(pr)})}}
\\
&=
\sum_{p=1}^{P} \sum_{r=1}^{R} \sum_{(y,z)} w_{(y^{(pr)}, z^{(pr)})} \log \frac{\frac{1}{(2\pi)^{3/2} \left|\Sigma_{pr}\right|^{1/2}} \exp \bigg(-\frac{1}{2}(a^{(pr)} - m_{pr})^T \Sigma_{pr}^{-1} (a^{(pr)} - m_{pr}) \bigg)}{w_{(y^{(pr)}, z^{(pr)})}} \\
&= \sum_{p=1}^{P} \sum_{r=1}^{R} \sum_{(y,z)} w_{(y^{(pr)}, z^{(pr)})}\bigg (\log \frac{1}{(2\pi)^{3/2} \left|\Sigma_{pr}\right|^{1/2}} - \frac{1}{2}(a^{(pr)} - m_{pr})^T \Sigma_{pr}^{-1} (a^{(pr)} - m_{pr}) - \log w_{(y^{(pr)}, z^{(pr)})} \bigg ) \\
\end{align*}


Note: $(y, z)$ are treated as a single variable to make it easy to track the summation as suggested in the problem description.

For reference, I listed all the involved parts in the above equation,

\begin{align*}
a^{(pr)} &= [y^{(pr)}, z^{(pr)}, x^{(pr)}]^T
\\
m_{pr} &= [\mu_p, \nu_r, \mu_p + \nu_r]^T
\\
\Sigma_{pr} &=
\begin{bmatrix}
\sigma_p^2 & 0        & \sigma_p^2 \\ 
0          & \tau_r^2 & \tau_r^2  \\ 
\sigma_p^2 & \tau_r^2 & \sigma_p^2 + \tau_r^2 + \sigma^2
\end{bmatrix}
\\
\left| \Sigma_{pr} \right| &= \sigma_p^2 \tau_r^2 \sigma^2 
\\
C &=
\begin{bmatrix}
\tau_r^2 (\sigma_p^2 + \sigma^2) & \sigma_p^2 \tau_r^2              & -\sigma_p^2 \tau_r^2 \\ 
\sigma_p^2 \tau_r^2              & \sigma_p^2 (\tau_r^2 + \sigma^2) & -\sigma_p^2 \tau_r^2 \\ 
- \sigma_p^2 \tau_r^2            & - \sigma_p^2 \tau_r^2            & \sigma_p^2 \tau_r^2 \\
\end{bmatrix}
\\
\Sigma_{pr}^{-1} = \frac{1}{\left|\Sigma_{pr}\right|}C &= 
\begin{bmatrix}
\frac{1}{\sigma^2} + \frac{1}{\sigma_p^2} & \frac{1}{\sigma^2}                      & -\frac{1}{\sigma^2} \\ 
\frac{1}{\sigma^2}                        & \frac{1}{\sigma^2} + \frac{1}{\tau_r^2} & -\frac{1}{\sigma^2} \\ 
- \frac{1}{\sigma^2}                      & - \frac{1}{\sigma^2}                    & \frac{1}{\sigma^2} \\
\end{bmatrix}
\end{align*}

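where $C$ is the cofactor matrix of $\Sigma_{pr}$. Because $\Sigma_{pr}$ is symmetric, $C$ is symmetric too and equals the adjugate.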
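The determinant and inverse are easy to get wrong by hand, so a numerical check is worthwhile. The sketch below builds $\Sigma_{pr}$ and a precision matrix read off from the quadratic form of the factorized density $p(y)\,p(z)\,p(x \mid y, z)$, namely $\frac{(y-\mu_p)^2}{\sigma_p^2} + \frac{(z-\nu_r)^2}{\tau_r^2} + \frac{(x-y-z)^2}{\sigma^2}$, then multiplies the two. The helper names (`joint_covariance`, `joint_precision`, `matmul`, `det3`) are assumptions introduced for this check.

```python
def joint_covariance(sigma_p2, tau_r2, sigma2):
    """Covariance of (y, z, x) for one paper-reviewer pair."""
    return [
        [sigma_p2, 0.0, sigma_p2],
        [0.0, tau_r2, tau_r2],
        [sigma_p2, tau_r2, sigma_p2 + tau_r2 + sigma2],
    ]


def joint_precision(sigma_p2, tau_r2, sigma2):
    """Precision matrix read off from the quadratic form of p(y) p(z) p(x | y, z)."""
    c = 1.0 / sigma2
    return [
        [c + 1.0 / sigma_p2, c, -c],
        [c, c + 1.0 / tau_r2, -c],
        [-c, -c, c],
    ]


def matmul(a, b):
    """Product of two square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))
```

If `matmul(joint_covariance(...), joint_precision(...))` returns the identity, the precision matrix is the inverse of $\Sigma_{pr}$, and `det3` should return $\sigma_p^2 \tau_r^2 \sigma^2$.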

##### To update the true value of the $i$th paper, $\mu_i$, maximize the likelihood with respect to $\mu_i$.

Warning: Hairy math. Please let me know if I am wrong, or if there is an easier way to do so.

\begin{align*}
\frac{\partial l}{\partial {\mu_i}} 
&= \frac{\partial}{\partial \mu_i} \sum_{p=1}^{P} \sum_{r=1}^{R} \sum_{(y,z)} w_{(y^{(pr)}, z^{(pr)})}\bigg (\log \frac{1}{(2\pi)^{3/2} \left|\Sigma_{pr}\right|^{1/2}} - \frac{1}{2}(a^{(pr)} - m_{pr})^T \Sigma_{pr}^{-1} (a^{(pr)} - m_{pr}) - \log w_{(y^{(pr)}, z^{(pr)})} \bigg ) \\
&= \sum_{r=1}^{R} \sum_{(y,z)} w_{(y^{(ir)}, z^{(ir)})} \bigg(\frac{\partial m_{ir}}{\partial \mu_i}\bigg)^T \Sigma_{ir}^{-1}(a^{(ir)} - m_{ir})
\\
&= \sum_{r=1}^{R} \sum_{(y,z)} w_{(y^{(ir)}, z^{(ir)})} [1, 0, 1]\Sigma_{ir}^{-1}(a^{(ir)} - m_{ir})
\\
&= \sum_{r=1}^{R} \sum_{(y,z)} w_{(y^{(ir)}, z^{(ir)})} \bigg[\frac{1}{\sigma_i^2}, 0, 0\bigg](a^{(ir)} - m_{ir})
\\
&= \sum_{r=1}^{R} \sum_{(y,z)} w_{(y^{(ir)}, z^{(ir)})} \bigg[\frac{1}{\sigma_i^2}, 0, 0\bigg]
\begin{bmatrix}
y^{(ir)} - \mu_i \\
z^{(ir)} - \nu_r \\
x^{(ir)} - \mu_i - \nu_r
\end{bmatrix}
\\
&= \frac{1}{\sigma_i^2} \sum_{r=1}^{R} \sum_{(y,z)} w_{(y^{(ir)}, z^{(ir)})} (y^{(ir)} - \mu_i)
\end{align*}

The fourth line uses $[1, 0, 1]\Sigma_{ir}^{-1} = [\frac{1}{\sigma_i^2}, 0, 0]$, which can be checked by multiplying both sides by $\Sigma_{ir}$: the first row of $\Sigma_{ir}$ is $[\sigma_i^2, 0, \sigma_i^2]$, so $[\frac{1}{\sigma_i^2}, 0, 0]\Sigma_{ir} = [1, 0, 1]$.

Setting it to zero, we get

\begin{align*}
\sum_{r=1}^{R} \sum_{(y,z)} w_{(y^{(ir)}, z^{(ir)})} y^{(ir)}
&= \mu_i \sum_{r=1}^{R} \sum_{(y,z)} w_{(y^{(ir)}, z^{(ir)})}
\\
\mu_i &= \frac{\sum_{r=1}^{R} \sum_{(y,z)} w_{(y^{(ir)}, z^{(ir)})} y^{(ir)}}{\sum_{r=1}^{R} \sum_{(y,z)} w_{(y^{(ir)}, z^{(ir)})}}
\end{align*}

##### To update the variance of the $i$th paper's value, i.e. $\sigma_i^2$

\begin{align*}
\frac{\partial l}{\partial {\sigma_i^2}} 
&= \frac{\partial}{\partial \sigma_i^2} \sum_{p=1}^{P} \sum_{r=1}^{R} \sum_{(y,z)} w_{(y^{(pr)}, z^{(pr)})}\bigg (\log \frac{1}{(2\pi)^{3/2} \left|\Sigma_{pr}\right|^{1/2}} - \frac{1}{2}(a^{(pr)} - m_{pr})^T \Sigma_{pr}^{-1} (a^{(pr)} - m_{pr}) - \log w_{(y^{(pr)}, z^{(pr)})} \bigg ) \\
&= \frac{\partial}{\partial \sigma_i^2} \sum_{r=1}^{R} \sum_{(y,z)} w_{(y^{(ir)}, z^{(ir)})} \bigg( -\frac{1}{2} \log\left|\Sigma_{ir}\right| - \frac{1}{2}(a^{(ir)} - m_{ir})^T \Sigma_{ir}^{-1} (a^{(ir)} - m_{ir}) \bigg) 
\\
&= - \frac{1}{2} \sum_{r=1}^{R} \sum_{(y,z)} w_{(y^{(ir)}, z^{(ir)})} \bigg( \frac{1}{\left|\Sigma_{ir}\right|}\frac{\partial \left|\Sigma_{ir}\right|}{\partial \sigma_i^2}  + (a^{(ir)} - m_{ir})^T \frac{\partial \Sigma_{ir}^{-1}}{\partial \sigma_i^2} (a^{(ir)} - m_{ir}) \bigg) 
\\
&= - \frac{1}{2} \sum_{r=1}^{R} \sum_{(y,z)} w_{(y^{(ir)}, z^{(ir)})} \bigg( \frac{1}{\sigma_i^2}  + \begin{bmatrix}
y^{(ir)} - \mu_i \\
z^{(ir)} - \nu_r \\
x^{(ir)} - \mu_i - \nu_r \\
\end{bmatrix}^T
\begin{bmatrix}
-\frac{1}{\sigma_i^4} & 0 & 0 \\
0                     & 0 & 0 \\
0                     & 0 & 0 \\
\end{bmatrix}
\begin{bmatrix}
y^{(ir)} - \mu_i \\
z^{(ir)} - \nu_r \\
x^{(ir)} - \mu_i - \nu_r \\
\end{bmatrix} \bigg)
\\
&= - \frac{1}{2} \sum_{r=1}^{R} \sum_{(y,z)} w_{(y^{(ir)}, z^{(ir)})} 
\bigg( \frac{1}{\sigma_i^2} - \frac{1}{\sigma_i^4}(y^{(ir)} - \mu_i)^2 \bigg)
\end{align*}

Setting it to zero, we get



\begin{align*}
\sigma_i^2 = \frac{\sum_{r=1}^{R} \sum_{(y,z)} w_{(y^{(ir)}, z^{(ir)})} (y^{(ir)} - \mu_i)^2}{\sum_{r=1}^{R} \sum_{(y,z)} w_{(y^{(ir)}, z^{(ir)})}}
\end{align*}

##### To update the bias of the $j$th reviewer, $\nu_j$

\begin{align*}
\frac{\partial l}{\partial {\nu_j}} 
&= \frac{\partial}{\partial \nu_j} \sum_{p=1}^{P} \sum_{r=1}^{R} \sum_{(y,z)} w_{(y^{(pr)}, z^{(pr)})}\bigg (\log \frac{1}{(2\pi)^{3/2} \left|\Sigma_{pr}\right|^{1/2}} - \frac{1}{2}(a^{(pr)} - m_{pr})^T \Sigma_{pr}^{-1} (a^{(pr)} - m_{pr}) - \log w_{(y^{(pr)}, z^{(pr)})} \bigg ) \\
&= \sum_{p=1}^{P} \sum_{(y,z)} w_{(y^{(pj)}, z^{(pj)})} \bigg(\frac{\partial m_{pj}}{\partial \nu_j}\bigg)^T \Sigma_{pj}^{-1}(a^{(pj)} - m_{pj})
\\
&= \sum_{p=1}^{P} \sum_{(y,z)} w_{(y^{(pj)}, z^{(pj)})} [0, 1, 1]\Sigma_{pj}^{-1}(a^{(pj)} - m_{pj})
\\
&= \sum_{p=1}^{P} \sum_{(y,z)} w_{(y^{(pj)}, z^{(pj)})} \bigg[0, \frac{1}{\tau_j^2}, 0\bigg](a^{(pj)} - m_{pj})
\\
&= \sum_{p=1}^{P} \sum_{(y,z)} w_{(y^{(pj)}, z^{(pj)})} \bigg[0, \frac{1}{\tau_j^2}, 0\bigg]
\begin{bmatrix}
y^{(pj)} - \mu_p \\
z^{(pj)} - \nu_j \\
x^{(pj)} - \mu_p - \nu_j \\
\end{bmatrix}
\\
&= \frac{1}{\tau_j^2} \sum_{p=1}^{P} \sum_{(y,z)} w_{(y^{(pj)}, z^{(pj)})} (z^{(pj)} - \nu_j)
\end{align*}

The fourth line uses $[0, 1, 1]\Sigma_{pj}^{-1} = [0, \frac{1}{\tau_j^2}, 0]$, which can be checked by multiplying both sides by $\Sigma_{pj}$: the second row of $\Sigma_{pj}$ is $[0, \tau_j^2, \tau_j^2]$, so $[0, \frac{1}{\tau_j^2}, 0]\Sigma_{pj} = [0, 1, 1]$.

Setting it to zero, we get

\begin{align*}
\sum_{p=1}^{P} \sum_{(y,z)} w_{(y^{(pj)}, z^{(pj)})} z^{(pj)} &= 
\nu_j \sum_{p=1}^{P} \sum_{(y,z)} w_{(y^{(pj)}, z^{(pj)})}
\\
\nu_j &= \frac{\sum_{p=1}^{P} \sum_{(y,z)} w_{(y^{(pj)}, z^{(pj)})} z^{(pj)}}{\sum_{p=1}^{P} \sum_{(y,z)} w_{(y^{(pj)}, z^{(pj)})}}
\end{align*}

##### To update the variance of the $j$th reviewer's bias, i.e. $\tau_j^2$

\begin{align*}
\frac{\partial l}{\partial {\tau_j^2}} 
&= \frac{\partial}{\partial \tau_j^2} \sum_{p=1}^{P} \sum_{r=1}^{R} \sum_{(y,z)} w_{(y^{(pr)}, z^{(pr)})}\bigg (\log \frac{1}{(2\pi)^{3/2} \left|\Sigma_{pr}\right|^{1/2}} - \frac{1}{2}(a^{(pr)} - m_{pr})^T \Sigma_{pr}^{-1} (a^{(pr)} - m_{pr}) - \log w_{(y^{(pr)}, z^{(pr)})} \bigg ) \\
&= \frac{\partial}{\partial \tau_j^2} \sum_{p=1}^{P} \sum_{(y,z)} w_{(y^{(pj)}, z^{(pj)})} \bigg( -\frac{1}{2} \log\left|\Sigma_{pj}\right| - \frac{1}{2}(a^{(pj)} - m_{pj})^T \Sigma_{pj}^{-1} (a^{(pj)} - m_{pj}) \bigg) 
\\
&= - \frac{1}{2} \sum_{p=1}^{P} \sum_{(y,z)} w_{(y^{(pj)}, z^{(pj)})} \bigg( \frac{1}{\left|\Sigma_{pj}\right|}\frac{\partial \left|\Sigma_{pj}\right|}{\partial \tau_j^2}  + (a^{(pj)} - m_{pj})^T \frac{\partial \Sigma_{pj}^{-1}}{\partial \tau_j^2} (a^{(pj)} - m_{pj}) \bigg) 
\\
&= - \frac{1}{2} \sum_{p=1}^{P} \sum_{(y,z)} w_{(y^{(pj)}, z^{(pj)})} \bigg( \frac{1}{\tau_j^2}  + \begin{bmatrix}
y^{(pj)} - \mu_p \\
z^{(pj)} - \nu_j \\
x^{(pj)} - \mu_p - \nu_j \\
\end{bmatrix}^T
\begin{bmatrix}
0 & 0 & 0 \\
0 & -\frac{1}{\tau_j^4} & 0 \\
0 & 0 & 0 \\
\end{bmatrix}
\begin{bmatrix}
y^{(pj)} - \mu_p \\
z^{(pj)} - \nu_j \\
x^{(pj)} - \mu_p - \nu_j \\
\end{bmatrix} \bigg)
\\
&= - \frac{1}{2} \sum_{p=1}^{P} \sum_{(y,z)} w_{(y^{(pj)}, z^{(pj)})} 
\bigg( \frac{1}{\tau_j^2} - \frac{1}{\tau_j^4}(z^{(pj)} - \nu_j)^2 \bigg)
\end{align*}

Setting it to zero, we get



\begin{align*}
\tau_j^2 = \frac{\sum_{p=1}^{P} \sum_{(y,z)} w_{(y^{(pj)}, z^{(pj)})} (z^{(pj)} - \nu_j)^2}{\sum_{p=1}^{P} \sum_{(y,z)} w_{(y^{(pj)}, z^{(pj)})}}
\end{align*}

### To summarize the updates

for each paper $i$, and each reviewer $j$,

\begin{align*}
\mu_i &= \frac{\sum_{r=1}^{R} \sum_{(y,z)} w_{(y^{(ir)}, z^{(ir)})} y^{(ir)}}{\sum_{r=1}^{R} \sum_{(y,z)} w_{(y^{(ir)}, z^{(ir)})}}
\\
\sigma_i^2 &= \frac{\sum_{r=1}^{R} \sum_{(y,z)} w_{(y^{(ir)}, z^{(ir)})} (y^{(ir)} - \mu_i)^2}{\sum_{r=1}^{R} \sum_{(y,z)} w_{(y^{(ir)}, z^{(ir)})}}
\\
\nu_j &= \frac{\sum_{p=1}^{P} \sum_{(y,z)} w_{(y^{(pj)}, z^{(pj)})} z^{(pj)}}{\sum_{p=1}^{P} \sum_{(y,z)} w_{(y^{(pj)}, z^{(pj)})}}
\\
\tau_j^2 &= \frac{\sum_{p=1}^{P} \sum_{(y,z)} w_{(y^{(pj)}, z^{(pj)})} (z^{(pj)} - \nu_j)^2}{\sum_{p=1}^{P} \sum_{(y,z)} w_{(y^{(pj)}, z^{(pj)})}}
\end{align*}

The derivations for $\mu_i$ and $\nu_j$ are analogous, as are those for $\sigma_i^2$ and $\tau_j^2$: given one derivation, the other follows by symmetry.
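One EM iteration can be sketched in code under the assumption that the weights for each pair sum to one, so that every weighted sum over $(y, z)$ is a posterior expectation: $E[y \mid x]$ is the posterior mean and $E[(y - \mu)^2 \mid x] = \mathrm{Var}(y \mid x) + (E[y \mid x] - \mu)^2$. The function names, the nested-list layout `x[p][r]`, and treating $\sigma^2$ as a known constant are assumptions of this sketch.

```python
import math


def e_step(x, mu, nu, sigma_p2, tau_r2, sigma2):
    """Posterior means and variances of y and z for every (paper, reviewer) pair."""
    P, R = len(mu), len(nu)
    Ey = [[0.0] * R for _ in range(P)]
    Ez = [[0.0] * R for _ in range(P)]
    Vy = [[0.0] * R for _ in range(P)]
    Vz = [[0.0] * R for _ in range(P)]
    for p in range(P):
        for r in range(R):
            s = sigma_p2[p] + tau_r2[r] + sigma2
            resid = x[p][r] - mu[p] - nu[r]
            Ey[p][r] = mu[p] + sigma_p2[p] * resid / s
            Ez[p][r] = nu[r] + tau_r2[r] * resid / s
            Vy[p][r] = sigma_p2[p] - sigma_p2[p] ** 2 / s
            Vz[p][r] = tau_r2[r] - tau_r2[r] ** 2 / s
    return Ey, Ez, Vy, Vz


def m_step(Ey, Ez, Vy, Vz):
    """Closed-form updates: averages of posterior moments over reviewers or papers."""
    P, R = len(Ey), len(Ey[0])
    mu = [sum(Ey[p]) / R for p in range(P)]
    nu = [sum(Ez[p][r] for p in range(P)) / P for r in range(R)]
    sigma_p2 = [sum(Vy[p][r] + (Ey[p][r] - mu[p]) ** 2 for r in range(R)) / R
                for p in range(P)]
    tau_r2 = [sum(Vz[p][r] + (Ez[p][r] - nu[r]) ** 2 for p in range(P)) / P
              for r in range(R)]
    return mu, nu, sigma_p2, tau_r2


def log_likelihood(x, mu, nu, sigma_p2, tau_r2, sigma2):
    """Marginal log likelihood: x^(pr) ~ N(mu_p + nu_r, sigma_p^2 + tau_r^2 + sigma^2)."""
    total = 0.0
    for p in range(len(mu)):
        for r in range(len(nu)):
            s = sigma_p2[p] + tau_r2[r] + sigma2
            total += -0.5 * (math.log(2 * math.pi * s)
                             + (x[p][r] - mu[p] - nu[r]) ** 2 / s)
    return total
```

Alternating `e_step` and `m_step` should never decrease `log_likelihood`, which makes a convenient check on the update formulas.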

#### Interpretation of the updates:

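Yes, $\sum_{(y,z)} w_{(y^{(ir)}, z^{(ir)})} = 1$: $Q_{ir}$ is a normalized density over $(y^{(ir)}, z^{(ir)})$, and the sum stands for an integral because $y$ and $z$ are continuous. Each weighted sum $\sum_{(y,z)} w_{(y^{(ir)}, z^{(ir)})} f(y^{(ir)}, z^{(ir)})$ is therefore the posterior expectation of $f$ given $x^{(ir)}$, and the denominators $\sum_{r=1}^{R} \sum_{(y,z)} w_{(y^{(ir)}, z^{(ir)})}$ and $\sum_{p=1}^{P} \sum_{(y,z)} w_{(y^{(pj)}, z^{(pj)})}$ equal $R$ and $P$.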

*Remark*: another way to write the lower bound of the log likelihood is to split the joint probability, which results in a product of two normal densities. The math may look hairier. In contrast to the mixture of Gaussians application in http://cs229.stanford.edu/notes/cs229-notes8.pdf, the prior here ($p(y^{(pr)}, z^{(pr)})$) follows a bivariate normal distribution instead of a simpler categorical distribution (the generalization of the Bernoulli distribution to multiple classes).

\begin{align*}
l(\mu_p, \nu_r, \sigma_p^2, \tau_r^2)  
&= 
\sum_{p=1}^{P} \sum_{r=1}^{R} \sum_{y,z}Q_{pr}(y^{(pr)}, z^{(pr)})\log \frac{p(y^{(pr)}, z^{(pr)}, x^{(pr)})}{Q_{pr}(y^{(pr)}, z^{(pr)})}
\\ 
&=
\sum_{p=1}^{P} \sum_{r=1}^{R} \sum_{y,z} w_{(y^{(pr)}, z^{(pr)})} \log \frac{p(x^{(pr)}|y^{(pr)}, z^{(pr)}) p(y^{(pr)}, z^{(pr)})}{w_{(y^{(pr)}, z^{(pr)})}}
\\
&=
\sum_{p=1}^{P} \sum_{r=1}^{R} \sum_{y,z} w_{(y^{(pr)}, z^{(pr)})} \bigg (\log p(x^{(pr)}|y^{(pr)}, z^{(pr)}) + \log p(y^{(pr)}, z^{(pr)}) - \log w_{(y^{(pr)}, z^{(pr)})} \bigg)
\end{align*}
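In the split form, $\log p(x^{(pr)} \mid y^{(pr)}, z^{(pr)})$ does not depend on $\mu_p$, and by independence $\log p(y^{(pr)}, z^{(pr)}) = \log p(y^{(pr)}) + \log p(z^{(pr)})$, so only the $\log p(y^{(pr)})$ term matters when maximizing over $\mu_p$. The sketch below evaluates that term for one paper from posterior means and variances of $y$ (hypothetical inputs named `post_means` and `post_vars`) and returns its maximizer; both function names are assumptions.

```python
import math


def expected_log_prior_y(mu, sigma_p2, post_means, post_vars):
    """Sum over reviewers of E_Q[log N(y; mu, sigma_p2)].

    Uses E_Q[(y - mu)^2] = var + (mean - mu)^2 for each reviewer's posterior.
    """
    total = 0.0
    for m, v in zip(post_means, post_vars):
        total += (-0.5 * math.log(2 * math.pi * sigma_p2)
                  - (v + (m - mu) ** 2) / (2 * sigma_p2))
    return total


def argmax_mu(post_means):
    """Maximizer of expected_log_prior_y in mu: the average posterior mean."""
    return sum(post_means) / len(post_means)
```

Evaluating `expected_log_prior_y` slightly to either side of `argmax_mu(post_means)` should give a smaller value, which is a quick way to confirm the stationary point is a maximum.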