In [4]:
import numpy as np
import matplotlib.pyplot as plt

To sample the posterior distribution, we alternate between sampling the discrete variable $z_t$ and sampling the conditional for the parameters $A_k,Q_k$.

Suppose we are given $\Pi$ and $z_t$. Then we have a conditional posterior distribution on $A_k$ and $Q_k$: we can group the data by the state active at time $t$, which gives $K$ separate Bayesian multivariate linear regressions.

# Bayesian multivariate linear regression

Start from the general case: we have

$$
Y = AX + E
$$

where $Y$ is an $m \times n$ matrix of $n$ observations of $m$ dependent variables, $X$ is $k \times n$ ($k-1$ explanatory variables plus a dummy variable for the bias), and $E$ is $m \times n$ with each column drawn independently from a multivariate Gaussian: $E_i \sim \mathcal{N}(0,Q)$. The likelihood is Gaussian in $Y-AX$:

$$
P(Y|X,A,Q) \propto |Q|^{-n/2} \exp{\{-\frac{1}{2}Tr[(Y-AX)^T Q^{-1}(Y-AX)]\}}
$$

Now we want a conjugate prior for both $A$ and $Q$, so that the posterior has the same functional form as the prior. This can be done by imposing an inverse Wishart prior on the covariance matrix $Q$ and, conditional on $Q$, a matrix normal prior on $A$, which is a multivariate normal distribution on the vectorized matrix (a "flattened" vector version of the matrix $A$):

$$
P(A,Q) = P(A|Q)P(Q) = \mathcal{MN}(M,\Lambda,Q) \times \mathcal{W}^{-1}(\Psi,\nu) \\
\mathcal{W}^{-1}(\Psi,\nu) \propto |\Psi|^{\nu/2}|Q|^{-(\nu+m+1)/2}\exp{\{-\frac{1}{2}Tr [ \Psi Q^{-1} ] \}} \\
\mathcal{MN}(M,\Lambda,Q) \propto |Q|^{-k/2}|\Lambda|^{-m/2} \exp{\{-\frac{1}{2}Tr[(A-M)^T Q^{-1}(A-M)\Lambda^{-1}]\}}
$$

It can be shown that for this likelihood, the posterior distribution is again a matrix normal times an inverse Wishart, with updated parameters:

$$
M' = \hat{A} XX^T \Lambda' + M \Lambda^{-1} \Lambda' \\
\Lambda' = (\Lambda^{-1}+ XX^T)^{-1}\\
\Psi' = \Psi + M\Lambda^{-1}M^T + YY^T - M'(\Lambda')^{-1}M'^T\\
\nu'= \nu + n
$$

where $\hat{A}$ is the OLS solution:

$$
\hat{A} = YX^T(XX^T)^{-1}
$$
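
The update above can be sketched in a few lines of NumPy. This is a minimal illustration under the conventions of this section ($Y = AX + E$, $X$ of shape $k \times n$, row covariance $Q$, column covariance $\Lambda$); the function name `mniw_posterior` and the toy data are hypothetical, and the $\Psi'$ line uses the standard completed-square form $\Psi + YY^T + M\Lambda^{-1}M^T - M'(\Lambda')^{-1}M'^T$.

```python
import numpy as np

def mniw_posterior(X, Y, M, Lambda, Psi, nu):
    """Posterior matrix-normal inverse-Wishart parameters for Y = A X + E.

    X is (k, n), Y is (m, n), M is (m, k), Lambda is (k, k), Psi is (m, m).
    """
    Lambda_inv = np.linalg.inv(Lambda)
    Lambda_new_inv = Lambda_inv + X @ X.T
    Lambda_new = np.linalg.inv(Lambda_new_inv)
    # Y X^T equals A_hat X X^T, so this matches the expression for M'
    M_new = (M @ Lambda_inv + Y @ X.T) @ Lambda_new
    Psi_new = (Psi + Y @ Y.T + M @ Lambda_inv @ M.T
               - M_new @ Lambda_new_inv @ M_new.T)
    nu_new = nu + X.shape[1]
    return M_new, Lambda_new, Psi_new, nu_new

# Toy data (hypothetical): m = 2 outputs, k = 3 regressors including the dummy.
rng = np.random.default_rng(0)
m, k, n = 2, 3, 5000
A_true = rng.normal(size=(m, k))
X = np.vstack([rng.normal(size=(k - 1, n)), np.ones((1, n))])
Y = A_true @ X + 0.1 * rng.normal(size=(m, n))

M_new, Lambda_new, Psi_new, nu_new = mniw_posterior(
    X, Y, np.zeros((m, k)), np.eye(k), np.eye(m), m + 1)
A_ols = Y @ X.T @ np.linalg.inv(X @ X.T)
```

With this much data and a weak prior, `M_new` should sit very close to the OLS solution `A_ols`, which is a quick sanity check on the formulas.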

## SLDS

In the case of the SLDS, we have $K$ separate linear regressions. With $m$ dynamical variables and $n$ transitions assigned to a given state, $Y=X_{t}$ is $m \times n$, $X=X_{t-1}$ is $(m+1) \times n$ to account for the bias, and the coefficient matrix is then $m \times (m+1)$.

The SLDS is as such:

$$
X_t = A_{z_t} X_{t-1} + \epsilon_{z_t}
$$

So, if we define the set of time steps at which the $k$-th dynamics was used:

$$
t_k = \{ t > 1 | z_t = k \}, \;\;\; |t_k| = T_k
$$

we can group data as 

$$
X^{(k)} = \{ x_{t-1} | t \in t_k \} \\
Y^{(k)} = \{ x_t | t \in t_k \}
$$

and use those to update the parameters $\{M'_k,\Lambda_k',\Psi_k',\nu_k' \}$ for each of the posterior distributions for $A_k,Q_k$.
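
The grouping can be sketched as follows. The helper name `group_by_state` and the toy arrays are hypothetical; the code assumes 0-based time indices, a trajectory `x` of shape `(m, T)`, and integer state labels `z` of length `T`.

```python
import numpy as np

def group_by_state(x, z, k):
    """Return (X_k, Y_k) for state k: predecessors with a bias row, and targets."""
    t_k = np.flatnonzero(z == k)
    t_k = t_k[t_k > 0]                       # the first point has no predecessor
    X_k = np.vstack([x[:, t_k - 1], np.ones((1, t_k.size))])
    Y_k = x[:, t_k]
    return X_k, Y_k

# Toy trajectory (hypothetical): m = 2 variables, T = 6 time steps.
x = np.arange(12, dtype=float).reshape(2, 6)
z = np.array([0, 1, 1, 0, 1, 0])

X_1, Y_1 = group_by_state(x, z, 1)   # uses t = 1, 2, 4
X_0, Y_0 = group_by_state(x, z, 0)   # t = 0 is dropped, leaving t = 3, 5
```

Each column of `X_1` is the point preceding the matching column of `Y_1`, with a trailing 1 for the bias.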

In [1]:
from scipy.stats import matrix_normal
# the matrix_normal.rvs() accepts:
# mean : what we called M
# rowcov : the (m x m) row covariance matrix, what we called Q
# colcov : the (m+1) x (m+1) columns covariance matrix, what we called lambda

from scipy.stats import invwishart
# the invwishart.rvs() accepts
# df: what we called nu
# scale: what we called psi, same shape as Q
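
As a small sketch of how these two calls combine, one joint draw of $(A, Q)$ can be obtained by first sampling $Q$ from the inverse Wishart and then sampling $A$ given that $Q$. The wrapper name `sample_mniw` is hypothetical, and a legacy `np.random.RandomState` is used as `random_state` only to make the example reproducible.

```python
import numpy as np
from scipy.stats import invwishart, matrix_normal

def sample_mniw(M, Lambda, Psi, nu, random_state=None):
    """Draw Q ~ IW(Psi, nu), then A | Q ~ MN(M, rowcov=Q, colcov=Lambda)."""
    Q = invwishart.rvs(df=nu, scale=Psi, random_state=random_state)
    A = matrix_normal.rvs(mean=M, rowcov=Q, colcov=Lambda,
                          random_state=random_state)
    return A, Q

# Prior-like parameters (assumed for illustration): m = 2 dynamical variables.
m = 2
rs = np.random.RandomState(0)
A, Q = sample_mniw(np.zeros((m, m + 1)), np.eye(m + 1), np.eye(m), m + 1,
                   random_state=rs)
```

The order matters: $A$ is drawn conditionally on the sampled $Q$, matching the factorization $P(A,Q) = P(A|Q)P(Q)$.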


## Hidden state

What about the conditional on $z_t$ and $\Pi$? Given $z_{t-1}$, the next state is the outcome of the categorical distribution $\Pi_{z_{t-1}}$ (the $z_{t-1}$ row of the transition matrix). 

### $\Pi$
By assigning a Dirichlet prior on each row of the transition matrix:
$$
\Pi_{k} \sim Dir(\alpha_k)
$$ 

given $z_t$ (that is, given a sequence of observed transitions), the posterior distributions are again Dirichlet (since the Dirichlet is the conjugate prior of the categorical distribution), with updated parameters:

$$
\Pi_{k} | all \sim Dir(\alpha_k + n_k)
$$

where $n_k$ is a vector whose $l$-th entry is the number of observed transitions from state $k$ to state $l$ (self-transitions included).
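
A minimal sketch of this update, assuming integer state labels in `0..K-1`. The helper names are hypothetical, and NumPy's `Generator.dirichlet` is used here in place of `scipy.stats.dirichlet` purely for brevity.

```python
import numpy as np

def count_transitions(z, K):
    """n[k, l] = number of transitions from state k to state l in the sequence z."""
    n = np.zeros((K, K), dtype=np.int64)
    np.add.at(n, (z[:-1], z[1:]), 1)   # unbuffered add, so repeated pairs accumulate
    return n

def sample_transition_matrix(z, K, alpha, rng):
    """Draw each row Pi_k from Dir(alpha_k + n_k)."""
    n = count_transitions(z, K)
    return np.vstack([rng.dirichlet(alpha[k] + n[k]) for k in range(K)])

# Toy sequence (hypothetical) with K = 3 states.
z = np.array([0, 0, 1, 2, 1, 1, 0])
n = count_transitions(z, 3)
Pi = sample_transition_matrix(z, 3, np.ones((3, 3)), np.random.default_rng(0))
```

Each row of `Pi` is a probability vector, so the rows sum to one.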

### $z_t$

Given $z_{t-1}$, the process is similar to a mixture model with a latent variable; when calculating the probability of a particular value of $z_t$, we have:

$$
P(z_t,\Pi_{z_{t-1}},all) = P(z_t,\Pi_{z_{t-1}}|all)P(all)
$$

now, let $z_{t_k}$ denote the event $z_t = k$, and $z_{t_{-k}}$ all the other realizations. We can write

$$
P(z_t,\Pi_{z_{t-1}}|all)P(all) = P(z_{t_k}|z_{t_{-k}},\Pi_{z_{t-1}},all)P(z_{t_{-k}},\Pi_{z_{t-1}}|all)P(all)
$$
so that
$$
P(z_{t_k}|z_{t_{-k}},\Pi_{z_{t-1}},all) = \frac{P(z_t,\Pi_{z_{t-1}}|all)}{P(z_{t_{-k}},\Pi_{z_{t-1}}|all)}
$$

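Since this conditional is proportional to the transition probability $\Pi_{z_{t-1},k}$ times the Gaussian likelihood of $x_t$ under the $k$-th dynamics, that leads to

$$
P(z_t = k) = \frac{r_{tk}}{\sum_j r_{tj}}
$$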

where, with $x_{t-1}$ augmented by a constant 1 to account for the bias,

$$
r_{tk} = \Pi_{z_{t-1},k} |Q_k|^{-1/2} \exp \{ -\frac{1}{2} (x_t - A_k x_{t-1})^T Q_k^{-1} (x_t - A_k x_{t-1}) \}
$$
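
The weights $r_{tk}$ can underflow when the residuals are large, so a practical sketch works in log space and normalizes after subtracting the maximum. The function name `sample_states` and the toy parameters are hypothetical; the code follows the formula above, conditioning each $z_t$ on the freshly sampled $z_{t-1}$, and assumes `A` has shape `(K, m, m+1)` with the bias in the last column.

```python
import numpy as np

def sample_states(x, A, Q, Pi, z0, rng):
    """Sample z_1..z_{T-1} sequentially from P(z_t = k) proportional to r_tk."""
    K = Pi.shape[0]
    m, T = x.shape
    z = np.empty(T, dtype=np.int64)
    z[0] = z0
    Q_inv = np.linalg.inv(Q)               # (K, m, m)
    log_det = np.linalg.slogdet(Q)[1]      # (K,) log-determinants of Q_k
    for t in range(1, T):
        x_prev = np.append(x[:, t - 1], 1.0)       # augment with the bias term
        resid = x[:, t] - A @ x_prev               # (K, m) residual per state
        maha = np.einsum('ki,kij,kj->k', resid, Q_inv, resid)
        log_r = np.log(Pi[z[t - 1]]) - 0.5 * log_det - 0.5 * maha
        r = np.exp(log_r - log_r.max())            # stabilized weights
        z[t] = rng.choice(K, p=r / r.sum())
    return z

# Toy check (hypothetical): two regimes that pin x to +5 or -5 with tiny noise.
z_true = np.array([0, 0, 1, 1, 0, 1])
x = np.where(z_true == 0, 5.0, -5.0).reshape(1, -1)
A = np.array([[[0.0, 5.0]], [[0.0, -5.0]]])
Q = np.full((2, 1, 1), 0.01)
Pi = np.full((2, 2), 0.5)
z = sample_states(x, A, Q, Pi, z0=0, rng=np.random.default_rng(0))
```

In this toy case the regimes are so well separated that the wrong state has essentially zero weight, so the sampled sequence recovers `z_true`.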

In [2]:
from scipy.stats import dirichlet
# dirichlet.rvs() accepts
# alpha: vector of the same dimensionality of the distribution

# Sampling

To generate samples, we start by initializing the sequence $z_t$ randomly, and $\forall k$ sample $\Pi_k, A_k, Q_k$ from the priors. Then loop:

1. $\forall k$, compute $\{ n_k, X^{(k)}, Y^{(k)}, M'_k, \Lambda'_k,\Psi'_k,\nu'_k  \}$
2. $\forall k$, sample $\Pi_k \sim Dir(\alpha_k + n_k)$
3. $\forall k$, sample $A_k,Q_k \sim MNIW(M'_k, \Lambda'_k,\Psi'_k,\nu'_k)$
4. $\forall t=2,\dots,T$, in increasing order of $t$, sample $z_t$ from $P(z_t = k) = r_{tk}/\sum_j r_{tj}$

The sampler alternates sampling the latent variable and sampling the posterior distribution of the parameters.

In [143]:
# load data

X_t = np.load('./simulation/x.npy')
Z_true = np.load('./simulation/z.npy')

m = X_t.shape[0]
T = X_t.shape[1]

# set the number of states; we choose the true number
# of latent states, which we know from the simulation

K = len(np.unique(Z_true))

In [144]:
# number of samples
N_samples = 2000

# initialize samples arrays
A = np.empty( (N_samples,K,m,m+1) )
Q = np.empty( (N_samples,K,m,m)   )

Z = np.empty( (N_samples,T), dtype=np.int64 )  # integer state labels
Pi = np.empty( (N_samples,K,K) )

# prior parameters

M = np.zeros( (K,m,m+1) )

# identity matrices
Lambda = np.tile(np.eye(m+1),(K,1,1))
Psi = np.tile(np.eye(m),(K,1,1))

nu = np.ones( (K) ) * (m+1)
# uniform distribution for each k
alpha = np.ones((K,K))


# posterior parameters
M_new = np.empty_like(M)
Lambda_new = np.empty_like(Lambda)
Psi_new = np.empty_like(Psi)
nu_new = np.empty_like(nu)
alpha_new = np.empty_like(alpha)

# transition counts; int64 because int8 would overflow above 127
n = np.empty((K,K),dtype=np.int64)



In [145]:
# initialize first sample

Z[0] = np.random.randint(0,K,T)

for k in range(K):

    Q[0][k] = invwishart.rvs(df=nu[k],scale=Psi[k])
    A[0][k] = matrix_normal.rvs(mean=M[k],rowcov=Q[0][k], colcov=Lambda[k])
    Pi[0][k] = dirichlet.rvs(alpha=alpha[k])[0]  # rvs returns shape (1, K)

In [158]:
# sampling

for i in range(1,2):

    for k in range(K):
        
        # find the indices of z_t = k
        t_k = np.argwhere(Z[i-1] == k).reshape(-1)
        # indices immediately after, removing out of boundaries
        t_k_next = t_k + 1
        t_k_next = t_k_next[t_k_next < T]
        
        # get X, but exclude first point
        # also add dummy variable
        X = X_t[:,t_k[t_k>0]-1]
        X = np.concatenate((X,np.ones(X.shape[1]).reshape(1,-1)),axis=0)
        
        # get Y
        Y = X_t[:,t_k[t_k>0]]
        
        # calculate n_k
        for l in range(K):
            n[k,l] = (Z[i-1][t_k_next] == l).sum()
        
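        # compute new posterior parameters (matrix normal inverse Wishart update)
        Lambda_inv = np.linalg.inv(Lambda[k])
        Lambda_new_inv = Lambda_inv + X @ X.T
        Lambda_new[k] = np.linalg.inv(Lambda_new_inv)
        M_new[k] = (M[k] @ Lambda_inv + Y @ X.T) @ Lambda_new[k]
        Psi_new[k] = (Psi[k] + Y @ Y.T + M[k] @ Lambda_inv @ M[k].T
                      - M_new[k] @ Lambda_new_inv @ M_new[k].T)
        nu_new[k] = nu[k] + X.shape[1]

        # Dirichlet posterior parameters for row k of Pi
        alpha_new[k] = alpha[k] + n[k]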

