## Example: Properties of Hidden Markov Models

Hidden Markov models (HMMs) are Markov models whose states $s\in\mathcal{S}$ are hidden (unobservable) but which produce observable outcomes $o\in\mathcal{O}$. At each time $t$, the hidden state $s_{t}$ emits an observable symbol $o_{t}$ with emission probability:

$$
\begin{equation*}
P(Y = o_{t}\,|\,X = s_{t})
\end{equation*}
$$

where $Y$ is the observable outcome and $X$ is the hidden state. As with the transition probabilities, the emission probabilities out of each hidden state must sum to one:

$$
\begin{equation*}
\sum_{o\in\mathcal{O}} P(Y = o\,|\,X = s) = 1\qquad\forall{s\in\mathcal{S}}
\end{equation*}
$$
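In code, this constraint says that every row of an emission matrix (rows are hidden states, columns are observable symbols, the same convention used for `EPM` later in this example) is a probability vector. The following is a minimal check under that row convention. The helper name `is_row_stochastic` is hypothetical.

```julia
# Hypothetical helper: true if all entries are nonnegative and each row sums to one
function is_row_stochastic(M::AbstractMatrix{<:Real}; atol::Real = 1e-10)::Bool
    all(>=(0), M) || return false
    row_sums = sum(M; dims = 2)            # n×1 matrix of row sums
    return all(r -> abs(r - 1) <= atol, row_sums)
end

E = [0.8 0.1 0.1;
     0.1 0.1 0.8]

println(is_row_stochastic(E))            # true
println(is_row_stochastic([0.5 0.6]))    # false: the row sums to 1.1
```

The tolerance `atol` absorbs floating-point round-off in the row sums.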

The emission probabilities play a central role in HMMs: given the hidden state at each time step, they determine the likelihood of the observed sequence of symbols. To explore these ideas, let's construct a two-state, three-output hidden Markov model:

<div>
    <center>
        <img src="figs/Fig-HMM-Schematic-23.svg" width="380"/>
    </center>
</div>
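To make the likelihood statement concrete: if the hidden path $s_{1},\dots,s_{T}$ were known, the emissions would be conditionally independent given the states. The likelihood of the observed symbols would then be the product of the emission probabilities along the path. The following is a minimal sketch that assumes states and symbols are encoded as integers indexing the rows and columns of an emission matrix `E`. The function name is hypothetical. When the hidden states are unknown, one would instead sum over all paths, for example with the forward algorithm.

```julia
# Hypothetical helper: P(o₁,…,o_T | s₁,…,s_T) = ∏ₜ E[sₜ, oₜ]
function path_emission_likelihood(E::AbstractMatrix{<:Real},
                                  states::AbstractVector{<:Integer},
                                  obs::AbstractVector{<:Integer})
    length(states) == length(obs) ||
        throw(ArgumentError("states and obs must have the same length"))
    # emissions are conditionally independent given the hidden states
    return prod(E[s, o] for (s, o) in zip(states, obs))
end

# rows: hidden states, columns: observable symbols
E = [0.8 0.1 0.1;
     0.1 0.1 0.8]

# hidden path 1, 1, 2 emitting symbols 1, 3, 3: 0.8 × 0.1 × 0.8
println(path_emission_likelihood(E, [1, 1, 2], [1, 3, 3]))
```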

#### Learning objectives
The objective of this example is to familiarize students with hidden Markov models (HMMs) and their components. In particular, it covers the transition matrix $\mathbf{P}$ and its stationary distribution $\pi$, the emission probability matrix $\mathbf{E}$, and sampling from the stationary distribution using a categorical distribution.

## Setup
Let's load the packages required for this example by calling the `include(...)` function on our initialization file `Include.jl`:

```julia
include("Include.jl");
```

[32m[1m    Updating[22m[39m git-repo `https://github.com/varnerlab/VLDecisionsPackage.jl.git`
[32m[1m   Resolving[22m[39m package versions...
[32m[1m  No Changes[22m[39m to `~/Desktop/julia_work/CHEME-5760-Examples-F23/Project.toml`
[32m[1m  No Changes[22m[39m to `~/Desktop/julia_work/CHEME-5760-Examples-F23/Manifest.toml`
[32m[1m  Activating[22m[39m project at `~/Desktop/julia_work/CHEME-5760-Examples-F23`
[32m[1m    Updating[22m[39m registry at `~/.julia/registries/General.toml`
[32m[1m    Updating[22m[39m git-repo `https://github.com/varnerlab/VLDecisionsPackage.jl.git`
[32m[1m  No Changes[22m[39m to `~/Desktop/julia_work/CHEME-5760-Examples-F23/Project.toml`
[32m[1m  No Changes[22m[39m to `~/Desktop/julia_work/CHEME-5760-Examples-F23/Manifest.toml`


```julia
iterate(P::Array{Float64,2}, counter::Int; 
        maxcount::Int = 100, ϵ::Float64 = 0.1) -> Array{Float64,2}
```
> Recursively computes powers of `P` until its rows approximate the stationary distribution. Computation stops when ‖P_new - P‖ ≤ ϵ or when `counter` reaches `maxcount`.

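```julia
using LinearAlgebra # for norm (may already be loaded by Include.jl)

function iterate(P::Array{Float64,2}, counter::Int; maxcount::Int = 100, ϵ::Float64 = 0.1)::Array{Float64,2}

    # base case: max number of iterations reached, return the current matrix
    if (counter == maxcount)
        return P
    end

    # raise the current matrix to the power counter + 1
    P_new = P^(counter+1)

    # stop if the change between successive matrices is below the threshold
    if (norm(P_new - P) <= ϵ)
        return P_new
    end

    # we have NOT hit the error target or the max iterations: recurse
    return iterate(P_new, counter + 1; maxcount = maxcount, ϵ = ϵ)
end;
```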

#### Constants 

```julia
number_of_hidden_states = 2;
number_of_observable_states = 3;
number_of_samples = 10000;
number_of_simulation_steps = 480;
```

### 1. Set up the transition matrix $\mathbf{P}$ and the stationary distribution $\pi$

```julia
# row i holds the probabilities of moving from hidden state i to each hidden state
P = [
    0.9 0.1;
    0.6 0.4;
];
```
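Because this chain has only two states, $\pi$ also has a closed form that we can use to check the numerical result. Write $\alpha = p_{12} = 0.1$ and $\beta = p_{21} = 0.6$. At stationarity, the probability flow from state 1 to state 2 must balance the flow back, $\pi_{1}\alpha = \pi_{2}\beta$. Combining this with $\pi_{1}+\pi_{2}=1$ gives:

$$
\begin{equation*}
\pi = \left(\frac{\beta}{\alpha+\beta},\;\frac{\alpha}{\alpha+\beta}\right) = \left(\frac{0.6}{0.7},\;\frac{0.1}{0.7}\right) = \left(\frac{6}{7},\;\frac{1}{7}\right) \approx (0.857143,\;0.142857)
\end{equation*}
$$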

We'll compute the stationary distribution $\pi$ using the recursive `iterate(...)` method. Each call raises the matrix it receives to the power `counter + 1` and passes the result to the next call. Starting from `counter = 1`, the computed powers of $\mathbf{P}$ are therefore $\mathbf{P}^{2}, \mathbf{P}^{6}, \mathbf{P}^{24},\dots$. For an ergodic chain like this one, every row of these powers converges to $\pi$. The recursion stops when one of two conditions is met:

* The base case: when `counter == maxcount`, the recursion stops and the current matrix is returned.
* Convergence: when the norm of the difference between successive matrices, $\lVert\mathbf{P}_{\text{new}}-\mathbf{P}\rVert$, is at most the threshold `ϵ`, the new matrix is returned.

```julia
π̄ = iterate(P, 1, ϵ = 0.000001)
```

```
2×2 Matrix{Float64}:
 0.857143  0.142857
 0.857143  0.142857
```
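As a cross-check on the recursive result, $\pi$ can also be obtained directly. We solve the linear system $\pi\mathbf{P} = \pi$ together with the normalization $\sum_{s}\pi_{s} = 1$. This is a swapped-in technique, not the method used in this example. The following is a minimal sketch using only the `LinearAlgebra` standard library. The function name `stationary_distribution` is hypothetical.

```julia
using LinearAlgebra

# Hypothetical helper: solve π(P - I) = 0 subject to sum(π) = 1.
# Transposing gives (Pᵀ - I)πᵀ = 0; we stack the normalization row underneath
# and solve the consistent, overdetermined system by least squares (QR).
function stationary_distribution(P::AbstractMatrix{<:Real})::Vector{Float64}
    n = size(P, 1)
    A = vcat(Matrix{Float64}(P') - I, ones(1, n))
    b = vcat(zeros(n), 1.0)
    return A \ b
end

P = [0.9 0.1;
     0.6 0.4]
π_direct = stationary_distribution(P)
println(round.(π_direct, digits = 6))   # [0.857143, 0.142857]
```

Both rows of the matrix returned by `iterate(...)` should match this vector.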

Each row of `π̄` approximates the stationary distribution $\pi$. Let's use the first row to create a [categorical distribution](https://en.wikipedia.org/wiki/Categorical_distribution) with the [Distributions.jl](https://github.com/JuliaStats/Distributions.jl) package and save it in the variable `d`:

```julia
d = Categorical(π̄[1,:]);
```
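Conceptually, drawing from a categorical distribution with probabilities $p_{1},\dots,p_{K}$ amounts to drawing $u\sim\mathcal{U}(0,1)$ and returning the first index whose cumulative probability exceeds $u$. The following is a minimal, standard-library-only sketch of that inverse-CDF idea. The helper `sample_categorical` is hypothetical, and `Distributions.jl` may implement `rand(::Categorical)` differently.

```julia
using Random

# Hypothetical helper: inverse-CDF sampling from a categorical distribution
function sample_categorical(rng::AbstractRNG, p::AbstractVector{<:Real})::Int
    u = rand(rng)                  # u ~ Uniform[0, 1)
    c = cumsum(p)                  # cumulative probabilities
    k = findfirst(x -> u < x, c)   # first index whose CDF exceeds u
    # guard against round-off when sum(p) is slightly below one
    return k === nothing ? length(p) : k
end

rng = MersenneTwister(1234)
p = [6/7, 1/7]                     # stationary distribution of the example chain
draws = [sample_categorical(rng, p) for _ in 1:100_000]
println(count(==(1), draws) / length(draws))   # should be close to 6/7 ≈ 0.857
```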

### 2. Set up the emission probability matrix $\mathbf{E}$

```julia
# rows: hidden states, columns: observable symbols (each row sums to one)
EPM = [
    0.8 0.1 0.1 ;
    0.1 0.1 0.8 ;
];
```

Populate the `emission_probability_dict`, which holds the [categorical distribution](https://en.wikipedia.org/wiki/Categorical_distribution) modeling the emission probability for each hidden state $s\in\mathcal{S}$:

```julia
emission_probability_dict = Dict{Int,Categorical}()
for i ∈ 1:number_of_hidden_states
    emission_probability_dict[i] = Categorical(EPM[i,:])
end
```

### 3. Simulate the output from the HMM

```julia
simulation_dict = Dict{Int,Int}()
for i ∈ 1:number_of_simulation_steps

    # sample the hidden state from the stationary distribution d
    # (each step is an independent draw; transitions via P are not simulated)
    hidden_state = rand(d);

    # grab the emission distribution for this hidden state
    epd = emission_probability_dict[hidden_state];

    # roll for a random output symbol
    simulation_dict[i] = rand(epd);
end
```
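The loop above draws each hidden state independently from `d`, so consecutive hidden states are uncorrelated. The long-run symbol frequencies match those of the chain at stationarity, but the temporal structure of an HMM is not reproduced. The following is a minimal sketch of the alternative, which propagates the hidden state with $\mathbf{P}$ at every step. It assumes the chain starts in state 1 (an arbitrary choice) and uses the same matrices as this example. A small inverse-CDF sampler stands in for `Distributions.jl`, and the function names are hypothetical.

```julia
using Random

# Hypothetical stand-in for rand(Categorical(p)): inverse-CDF draw of an index
function draw_index(rng::AbstractRNG, p::AbstractVector{<:Real})::Int
    u = rand(rng)
    k = findfirst(x -> u < x, cumsum(p))
    return k === nothing ? length(p) : k
end

# Hypothetical helper: simulate T steps of an HMM with transition matrix P,
# emission matrix E, and initial hidden state s0
function simulate_hmm(rng::AbstractRNG, P::AbstractMatrix{<:Real},
                      E::AbstractMatrix{<:Real}, T::Int; s0::Int = 1)
    states = Vector{Int}(undef, T)
    observations = Vector{Int}(undef, T)
    s = s0
    for t in 1:T
        states[t] = s
        observations[t] = draw_index(rng, E[s, :])   # emit from the current state
        s = draw_index(rng, P[s, :])                 # move the hidden chain one step
    end
    return states, observations
end

P = [0.9 0.1; 0.6 0.4]
E = [0.8 0.1 0.1; 0.1 0.1 0.8]
states, observations = simulate_hmm(MersenneTwister(42), P, E, 480)
```

This chain mixes quickly because the second eigenvalue of $\mathbf{P}$ is 0.3, so its long-run symbol frequencies should agree with the independent-draw version. The difference shows up in runs, for example stretches of symbol 1 while the chain dwells in state 1.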

```julia
simulation_dict
```

```
Dict{Int64, Int64} with 480 entries:
  56  => 3
  35  => 1
  425 => 1
  429 => 1
  60  => 2
  220 => 1
  308 => 1
  67  => 1
  215 => 1
  73  => 1
  319 => 3
  251 => 1
  115 => 1
  112 => 1
  185 => 3
  348 => 1
  420 => 1
  404 => 1
  365 => 1
  417 => 1
  333 => 1
  86  => 1
  168 => 1
  364 => 1
  207 => 2
  ⋮   => ⋮
```

```julia
# estimate the probability of observing test_value from the simulated outputs
test_value = 2;
N₊ = count(value -> value == test_value, values(simulation_dict));
probability = N₊/number_of_simulation_steps;
println("We observe $(test_value) with probability = $(probability)")
```

```
We observe 2 with probability = 0.10625
```
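The estimate above can be compared with the exact long-run probability of each symbol. At stationarity, the probability of observing symbol $o$ is $\sum_{s}\pi_{s}\,P(Y = o\,|\,X = s)$, that is, $\mathbf{E}^{\top}\pi$. The following is a minimal sketch with $\pi = (6/7,\,1/7)$ and the emission matrix from Step 2. The variable names are illustrative.

```julia
π_stat = [6/7, 1/7]              # stationary distribution of the hidden chain
E = [0.8 0.1 0.1;
     0.1 0.1 0.8]                # emission matrix (rows = hidden states)

# marginal symbol probabilities: p(o) = Σₛ π(s) E[s, o], computed as Eᵀπ
p_obs = E' * π_stat
println(round.(p_obs, digits = 4))   # [0.7, 0.1, 0.2]
```

Both rows of the emission matrix assign probability 0.1 to symbol 2, so its long-run probability is 0.1 regardless of the hidden state. The simulated estimate of 0.10625 is consistent with that value.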
