In [2]:
import numpy as np

Let's start by loading the generated data from the other notebook. We will also transform $x$ to homogeneous coordinates so that the matrix $A$ will incorporate the bias coefficients.

In [10]:
x = np.load('./simulation/x.npy')
z_true = np.load('./simulation/z.npy')

print('K =',np.unique(z_true).size)

m = x.shape[0]  # number of observed variables (rows)
N = x.shape[1]  # number of samples (columns)

# add a dummy row of ones (homogeneous coordinates)
x = np.concatenate((np.ones((1, N)), x))

print(x[:,:5])

K = 3
[[ 1.          1.          1.          1.          1.        ]
 [ 0.          1.34968621  2.08295411 -0.70582717 -1.70715511]
 [ 0.         -0.55905218  1.30886104  1.44194285  0.07453323]]
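
To see why the dummy row lets a single matrix carry the bias, here is a minimal sketch on made-up data. The names `W`, `b`, `pts`, `A_h` and `pts_h` are hypothetical and are not used elsewhere in this notebook; the bias is placed in the first column of `A_h` to match the row of ones added on top of `x`.

```python
import numpy as np

rng = np.random.default_rng(0)
m_demo, n_demo = 2, 5                     # hypothetical sizes
W = rng.normal(size=(m_demo, m_demo))     # linear part of the map
b = rng.normal(size=(m_demo, 1))          # bias vector, stored as a column
pts = rng.normal(size=(m_demo, n_demo))   # points stored as columns

# Put the bias in the first column and a row of ones on top of the data.
A_h = np.hstack((b, W))                          # shape (m, m+1)
pts_h = np.vstack((np.ones((1, n_demo)), pts))   # shape (m+1, n)

affine = W @ pts + b         # explicit affine map
homogeneous = A_h @ pts_h    # same map as a single matrix product
print(np.allclose(affine, homogeneous))
```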


### Posterior distribution

We want to sample the posterior distribution of the parameters and the latent variables, $P(A_k,Q_k,\pi_k,z_{1:T} \mid x)$. We will use a Gibbs sampler, so we need the full conditional of each block of variables given all the others (and, as always, the observed data).

#### $P(A_k,Q_k|all)$

Once a sequence $z_{1:T}$ is given, the likelihood of the data grouped by latent state is Gaussian with dynamical parameters $A_k \in \mathbb{R}^{m\times (m+1)}$, $Q_k \in \mathbb{R}^{m \times m}$ ($A_k$ has an added column to account for the bias). For each $k$ we therefore place a _matrix-normal inverse-Wishart_ prior on these parameters:

$$
A_k,Q_k \sim MNIW(M_k,\Lambda_k;\Psi_k,\nu_k)
$$

which combines a matrix normal distribution on $A_k$ (given $Q_k$) with an inverse-Wishart distribution on $Q_k$. This is the conjugate prior for a multivariate normal likelihood with unknown mean and covariance. Up to normalizing constants that do not depend on $A$ or $Q$:

$$
MNIW(A,Q;M,\Lambda;\Psi,\nu) \propto \frac{\exp\left(-\frac{1}{2}Tr[(A-M)^T Q^{-1}(A-M)\Lambda^{-1}]\right)}{|Q|^{(m+1)/2}|\Lambda|^{m/2}} \times \frac{|\Psi|^{\nu/2}}{|Q|^{(m+\nu+1)/2}} \exp\left(-\frac{1}{2} Tr [\Psi Q^{-1}]\right)
$$

Because of this prior, when we group the data by the assigned dynamic $k$, the posterior of each of the $K$ regressions is again a _matrix-normal inverse-Wishart_ distribution, with updated parameters:

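$$
\begin{aligned}
A_k,Q_k \mid all &\sim MNIW(M_k',\Lambda_k';\Psi_k',\nu_k') \\
\Lambda_k' &= (\Lambda_k^{-1}+ X^{(k)}X^{(k)T})^{-1}\\
M_k' &= \hat{A}_k X^{(k)}X^{(k)T} \Lambda_k' + M_k \Lambda_k^{-1} \Lambda_k' \\
\Psi_k' &= \Psi_k + M_k\Lambda_k^{-1}M_k^T + Y^{(k)}Y^{(k)T} - M_k'(\Lambda_k')^{-1}(M_k')^T\\
\nu_k' &= \nu_k + N_k
\end{aligned}
$$

where $\hat{A}_k$ denotes the least-squares estimate, so that $\hat{A}_k X^{(k)}X^{(k)T} = Y^{(k)}X^{(k)T}$, and $N_k$ is the number of observations assigned to dynamic $k$.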
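
The update can be sketched in a few lines of NumPy. This is only an illustration under the standard conjugate result, i.e. assuming that $\Psi_k'$ subtracts $M_k'(\Lambda_k')^{-1}(M_k')^T$ and that $\nu_k' = \nu_k + N_k$ with $N_k$ the number of columns of $X^{(k)}$. The function name `mniw_posterior` and the synthetic data (`A_true`, `X_demo`, `Y_demo`) are hypothetical and not part of the notebook.

```python
import numpy as np

def mniw_posterior(X, Y, M, Lambda, Psi, nu):
    """Posterior MNIW parameters for one dynamic (assumed standard conjugate update).

    X: (m+1, N_k) regressors in homogeneous coordinates
    Y: (m, N_k) responses
    """
    Lambda_inv = np.linalg.inv(Lambda)
    S_xx = X @ X.T + Lambda_inv             # equals the inverse of Lambda'
    S_yx = Y @ X.T + M @ Lambda_inv
    S_yy = Y @ Y.T + M @ Lambda_inv @ M.T
    Lambda_post = np.linalg.inv(S_xx)
    M_post = S_yx @ Lambda_post
    Psi_post = Psi + S_yy - M_post @ S_xx @ M_post.T
    Psi_post = 0.5 * (Psi_post + Psi_post.T)   # remove round-off asymmetry
    nu_post = nu + X.shape[1]
    return M_post, Lambda_post, Psi_post, nu_post

# Synthetic check: with many points the posterior mean should approach A_true.
rng = np.random.default_rng(0)
m_demo, n_demo = 2, 500
A_true = np.array([[0.5, 0.9, -0.2],
                   [-1.0, 0.1, 0.8]])
X_demo = np.vstack((np.ones((1, n_demo)), rng.normal(size=(m_demo, n_demo))))
Y_demo = A_true @ X_demo + 0.1 * rng.normal(size=(m_demo, n_demo))

M_post, Lambda_post, Psi_post, nu_post = mniw_posterior(
    X_demo, Y_demo,
    M=np.zeros((m_demo, m_demo + 1)), Lambda=np.eye(m_demo + 1),
    Psi=np.eye(m_demo), nu=m_demo)
print(np.round(M_post, 2))
```

Working with the sufficient statistics `S_xx`, `S_yx`, `S_yy` avoids forming $\hat{A}_k$ explicitly, which would fail when a dynamic has fewer than $m+1$ assigned points.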

In [17]:
from scipy.stats import matrix_normal, invwishart

# since P(A,Q) = P(A|Q)P(Q)
# we sample Q first and then sample A given Q


# ============= INV WISHART

# degrees of freedom (nu >= m keeps the distribution proper)
nu = m
# scale matrix, must be symmetric and positive definite
psi = np.eye(m)

Q = invwishart.rvs(nu,psi)

# ============ MATRIX NORMAL
# mean of the distribution
M = np.zeros((m,m+1))
# Q is the covariance among the rows
# (the i-th row of A holds the coefficients for the regression of the i-th variable)
# and Lambda is the covariance among the columns
# (the i-th column of A holds the coefficients of the i-th input in the m regressions)
Lambda = np.eye(m+1)
A = matrix_normal.rvs(M,Q,Lambda)
A

array([[ 7.18134523,  1.5032771 ,  1.58492524],
       [-0.31379839,  0.65111482,  0.3735719 ]])

#### $P(\pi_k|all)$

Again, given $z_{1:T}$, we can count the transitions from state $k$ to each state $l$ and collect them in a count vector $n_k$, which determines $P(\pi_k|z_{1:T})$ (the transitions are Markovian, so no other dependence is needed). Since we put a Dirichlet prior with parameter $\alpha_k$ on each row $\pi_k$ of the transition matrix, the posterior is again a Dirichlet distribution:

$$
\pi_k | all \sim Dir(\alpha_k + n_k)
$$
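
As a small illustration, the counting and the row-wise Dirichlet draw could look like the sketch below. The helper names `transition_counts` and `sample_transition_matrix`, the toy sequence `z_demo` and the flat prior `alpha` are assumptions made for the example.

```python
import numpy as np

def transition_counts(z, K):
    """counts[k, l] = number of transitions from state k to state l in z."""
    counts = np.zeros((K, K), dtype=int)
    np.add.at(counts, (z[:-1], z[1:]), 1)   # accumulates repeated index pairs
    return counts

def sample_transition_matrix(counts, alpha, rng):
    """Draw each row pi_k from Dir(alpha_k + n_k)."""
    K = counts.shape[0]
    return np.vstack([rng.dirichlet(alpha[k] + counts[k]) for k in range(K)])

rng = np.random.default_rng(0)
z_demo = np.array([0, 0, 1, 2, 2, 2, 0, 1])   # toy state sequence
counts = transition_counts(z_demo, K=3)
alpha = np.ones((3, 3))                        # flat Dirichlet prior on every row
Pi = sample_transition_matrix(counts, alpha, rng)
print(counts)
print(Pi.sum(axis=1))
```

`np.add.at` is used instead of fancy-index assignment because the same $(k,l)$ pair can occur several times in the sequence.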



### Gibbs sampler

The Gibbs sampler goes as follows:

##### Initialize

Sample $z_{1:T}$ uniformly, and compute:

1. $n_k$ (a vector that counts the transitions from $k$ to each state $l$, including $k$ itself)
2. $X^{(k)},Y^{(k)}$ (the subsets of data that obey the $k$-th dynamic under the sequence $z_t$)
3. $\{ M_k ',\Lambda_k ',\Psi_k ',\nu_k ' \}$ the updated parameters of the posterior distribution for $A_k,Q_k$ given observed data and a given sequence $z_{1:T}$

then draw the transition matrix $\Pi$ (with rows $\pi_k$) and $A_k, Q_k$ from their priors.
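
The initialization of $z_{1:T}$ and the grouping into $X^{(k)},Y^{(k)}$ might be sketched as follows. The sketch assumes, since the text does not spell it out, that the regressors are the homogeneous states $[1; x_{t-1}]$, the responses are $x_t$ without the dummy row, and the pair $(x_{t-1}, x_t)$ belongs to dynamic $z_t$. The helper `group_by_state` and the random stand-in data `x_demo` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
K_demo, m_demo, T_demo = 3, 2, 200   # hypothetical sizes

# Stand-in for the homogeneous data matrix (row of ones on top).
x_demo = np.vstack((np.ones((1, T_demo)), rng.normal(size=(m_demo, T_demo))))

# Uniform initialization of the latent sequence.
z_init = rng.integers(0, K_demo, size=T_demo)

def group_by_state(x_h, z, K):
    """Split (regressor, response) pairs by the state assigned to the response."""
    regressors = x_h[:, :-1]   # [1; x_{t-1}], shape (m+1, T-1)
    responses = x_h[1:, 1:]    # x_t without the dummy row, shape (m, T-1)
    labels = z[1:]             # state governing the step t-1 -> t
    return [(regressors[:, labels == k], responses[:, labels == k])
            for k in range(K)]

groups = group_by_state(x_demo, z_init, K_demo)
for k, (X_k, Y_k) in enumerate(groups):
    print(k, X_k.shape, Y_k.shape)
```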

##### Iterations

1. $\forall k$ sample $\pi_k \sim Dir(\alpha_k + n_k)$
2. $\forall k$ sample $A_k,Q_k \sim MNIW(M_k',\Lambda_k',\Psi_k ',\nu_k')$
3. $\forall t = 2,\dots,T$ sample a new state $z_t$ from $P(z_t = k) = r_{tk}/\sum_l r_{tl}$, where $r_{tk}$ is the unnormalized conditional weight of state $k$ at time $t$
4. Recompute $n_k,X^{(k)},Y^{(k)},M_k',\Lambda_k',\Psi_k',\nu_k'$
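
The state update in step 3 could be sketched as a single-site Gibbs sweep. The weights $r_{tk}$ are not defined in the text above, so the form used here is an assumption: $r_{tk} \propto \pi_{z_{t-1},k}\, \mathcal{N}(x_t; A_k [1; x_{t-1}], Q_k)\, \pi_{k,z_{t+1}}$, with the last factor dropped at $t=T$. The function `resample_states` and all the demo quantities are hypothetical.

```python
import numpy as np
from scipy.stats import multivariate_normal

def resample_states(x_h, z, Pi, A, Q, rng):
    """One sweep over t = 2..T (indices 1..T-1), resampling each z_t in turn.

    x_h: (m+1, T) homogeneous data, Pi: (K, K), A: (K, m, m+1), Q: (K, m, m)
    """
    K, T = Pi.shape[0], x_h.shape[1]
    z = z.copy()
    for t in range(1, T):
        means = A @ x_h[:, t - 1]          # (K, m): prediction of every dynamic
        log_r = np.log(Pi[z[t - 1]])       # transition into the candidate state
        log_r = log_r + np.array([
            multivariate_normal.logpdf(x_h[1:, t], means[k], Q[k])
            for k in range(K)])
        if t + 1 < T:
            log_r = log_r + np.log(Pi[:, z[t + 1]])   # transition out of it
        r = np.exp(log_r - log_r.max())    # subtract the max for stability
        z[t] = rng.choice(K, p=r / r.sum())
    return z

# Toy check with two well-separated dynamics (pure bias, no dependence on x_{t-1}).
rng = np.random.default_rng(0)
T_demo = 100
bias = np.array([[-10.0, -10.0], [10.0, 10.0]])
z_true_demo = rng.integers(0, 2, size=T_demo)
obs = bias[z_true_demo].T + rng.normal(size=(2, T_demo))
x_h_demo = np.vstack((np.ones((1, T_demo)), obs))

A_demo = np.zeros((2, 2, 3))
A_demo[:, :, 0] = bias                     # bias sits in the first column
Q_demo = np.stack([np.eye(2), np.eye(2)])
Pi_demo = np.array([[0.9, 0.1], [0.1, 0.9]])

z_start = np.zeros(T_demo, dtype=int)
z_new = resample_states(x_h_demo, z_start, Pi_demo, A_demo, Q_demo, rng)
print((z_new[1:] == z_true_demo[1:]).mean())
```

Working in log space avoids underflow when the Gaussian likelihoods of the different dynamics differ by many orders of magnitude.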