# Example: Properties of Markov Models and Stationary Distributions
In this example, you will explore the fundamental properties of Markov models by constructing a discrete-time Markov chain, computing its stationary distribution, and validating the theoretical predictions through simulation.

> __Learning Objectives:__
>
> After completing this activity, students will be able to:
> * **Construct and validate transition matrices:** Build a three-state Markov chain transition matrix and verify that it satisfies the row-stochastic property, where each row sums to one and represents valid transition probabilities.
> * **Compute stationary distributions through iteration:** Implement iterative methods to compute the stationary distribution π by raising the transition matrix to successive powers until convergence and verify the rank-one property of the resulting matrix.
> * **Simulate Markov chain dynamics and validate convergence:** Generate state sequences by sampling from categorical distributions based on transition probabilities and demonstrate that the empirical state frequencies converge to the theoretical stationary distribution as the number of samples increases.

This is an essential foundation for understanding stochastic processes in machine learning and data science. So, let's get started!
___

## Setup, Data, and Prerequisites
First, we set up the computational environment by including the `Include.jl` file and loading any needed resources.

> The [`include(...)` command](https://docs.julialang.org/en/v1/base/base/#include) evaluates the contents of the input source file, `Include.jl`, in the notebook's global scope. The `Include.jl` file sets paths, loads required external packages, etc. For additional information on functions and types used in this material, see the [Julia programming language documentation](https://docs.julialang.org/en/v1/). 

Let's set up our code environment:

In [3]:
include(joinpath(@__DIR__, "Include.jl")); # include the Include.jl file

In addition to standard Julia libraries, we'll also use [the `VLDataScienceMachineLearningPackage.jl` package](https://github.com/varnerlab/VLDataScienceMachineLearningPackage.jl). Check out [the documentation](https://varnerlab.github.io/VLDataScienceMachineLearningPackage.jl/dev/) for more information on the functions, types, and data used in this material.

### Implementation
Before we start this example, let's set up the `iterate(...)` method and specify some constants. We'll use the `iterate(...)` method to compute the stationary distribution $\pi$.
```julia
iterate(P::Array{Float64,2}; 
        maxcount::Int = 100, ϵ::Float64 = 0.1) -> Array{Float64,2}
```
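> Iteratively computes a stationary distribution. Starting from a uniform distribution $\pi$, the update $\pi^{\prime} = \pi\cdot\mathbf{P}$ is repeated until $||\pi^{\prime} - \pi||_{1} \leq \epsilon$ or the maximum number of iterations `maxcount` is reached. The result is returned as a $1\times{N}$ matrix.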

In [6]:
function iterate(P::Array{Float64,2}; 
        maxcount::Int = 100, ϵ::Float64 = 0.1)::Array{Float64,2}

    # initialize -
    counter = 1; # initialize the iteration counter
    is_ok_to_stop = false; # flag for while loop
    N = size(P,1); # number of rows in P matrix
    πₒ = ones(Float64, N) ./ N; # initial uniform distribution
    π = reshape(πₒ, 1, N); # initialize π (make it a 1 x N matrix)
    
    # main loop - iterate until the difference ||π′ - π||₁ <= ϵ -or- we run out of iterations
    while (is_ok_to_stop == false)

        π′ = π * P; # update π
        has_converged = norm((π′ - π),1) <= ϵ; # 1-norm of the change in π
        if (has_converged || counter >= maxcount)
            is_ok_to_stop = true; # set the flag to exit the while loop

            # warn the user only if we stopped without converging
            if (has_converged == false)
                @warn "Maximum iteration count reached before convergence."
            end
        end
        π = reshape(π′, 1, N); # update π (make sure it's a 1 x N matrix)
        counter += 1; # update the counter
    end

    # return -
    return π;
end;

#### Constants
In the simulations below, we'll need some constant values that we set here. In particular, we set a value for the `number_of_hidden_states` variable, the `number_of_simulation_steps` variable (the number of steps we take in a Markov chain), and the `number_of_samples` variable:

In [8]:
number_of_hidden_states = 3; # how many states do we have?
number_of_simulation_steps = 10000; # number of simulation steps
number_of_samples = 10000; # number of samples

___

<div>
    <center>
        <img src="figs/Fig-ThreeState-MM-Schematic.svg" width="580"/>
    </center>
</div>

## Task 1: Set up the transition matrix for a three-state Markov model
In this task, we'll set up the transition matrix $\mathbf{P}$ for a three-state [Markov chain model](https://en.wikipedia.org/wiki/Markov_chain). In this example, we have three states $\mathcal{S}=\left\{1,2,3\right\}$, where the probability of moving from state $i$ to state $j$ in the next time step, denoted as $p_{ij}$, is an element of the matrix $\mathbf{P} \in \mathbb{R}^{3\times{3}}$.

In [12]:
P = [
    0.05 0.95 0.0 ; # moves for state 1
    0.6 0.2 0.2 ; # moves for state 2
    0.0 0.3 0.7 ; # moves for state 3
];

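The rank of the transition matrix $\mathbf{P}$ is the number of linearly independent rows (or columns). We record it here so that we can compare it with the rank-one limit of $\mathbf{P}^{k}$ in Task 2. Note that full rank by itself does not guarantee a unique stationary distribution: what matters is that every state can be reached from every other state (irreducibility). That holds for this chain because states 1 and 2 communicate directly, as do states 2 and 3.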

In [61]:
rank(P)

3
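
Reachability can also be checked directly rather than inferred from the rank. The sketch below looks for a power of the transition matrix whose entries are all strictly positive; if one exists, every state can reach every other state and the chain is non-periodic. The helper name `first_positive_power`, the bound `kmax`, and the variable `P_example` (a copy of the matrix above) are hypothetical names introduced for illustration; they are not part of the course package.

```julia
# Sketch: find the smallest k ≤ kmax such that every entry of P^k is strictly positive.
# Returns `nothing` if no such power is found within the bound.
function first_positive_power(P::Matrix{Float64}; kmax::Int = 20)
    Pk = copy(P)
    for k in 1:kmax
        all(Pk .> 0.0) && return k   # every state reaches every state in exactly k steps
        Pk = Pk * P                  # move on to the next power
    end
    return nothing
end

# copy of the three-state transition matrix from Task 1
P_example = [
    0.05 0.95 0.0 ;
    0.6  0.2  0.2 ;
    0.0  0.3  0.7 ;
];

k = first_positive_power(P_example)
println("smallest k with all entries of P^k positive: ", k)
```

For this matrix the zeros at positions (1,3) and (3,1) disappear after one multiplication, so the search stops at `k = 2`. By contrast, the identity matrix has full rank but never passes this check.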

### Check: Do the rows of the transition matrix $\mathbf{P}$ sum to `1`?
We know that the rows of the transition matrix $\mathbf{P}$ must sum to `1`. That is, if we are in state $s_{i}\in\mathcal{S}$ at time $t$, then at time $t+1$ we must be in some state $s_{j}\in\mathcal{S}$. 

> __Check:__ Let's verify that the transition matrix $\mathbf{P}$ meets this criterion using the [@assert macro](https://docs.julialang.org/en/v1/base/base/#Base.@assert) by iterating over the rows of the transition matrix $\mathbf{P}$ and checking the sum of each row. If any row does not meet this criterion, an [AssertionError](https://docs.julialang.org/en/v1/base/base/#Core.AssertionError) will be thrown.

So what do we see?

In [59]:
for i ∈ 1:number_of_hidden_states
    @assert sum(P[i,:]) ≈ 1.0 # approximate comparison avoids floating-point round-off failures
end
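
The loop above checks only the row sums. A slightly more defensive variant, sketched here, also checks that the matrix is square and that no entry is negative, and it compares the sums using a tolerance. The function name `is_row_stochastic` and the keyword `atol` are illustrative assumptions, not functions from the course package.

```julia
# Sketch: true when P is square, every entry is non-negative,
# and every row sums to one within the absolute tolerance atol.
function is_row_stochastic(P::Matrix{Float64}; atol::Float64 = 1e-12)
    size(P, 1) == size(P, 2) || return false          # must be square
    all(P .>= 0.0) || return false                    # probabilities cannot be negative
    return all(isapprox.(sum(P, dims = 2), 1.0; atol = atol))
end

# copy of the three-state transition matrix from Task 1
P_example = [
    0.05 0.95 0.0 ;
    0.6  0.2  0.2 ;
    0.0  0.3  0.7 ;
];

println(is_row_stochastic(P_example))             # valid transition matrix
println(is_row_stochastic([0.5 0.6 ; 0.5 0.5]))   # first row sums to 1.1
```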

___

## Task 2: Compute the stationary distribution $\pi$
In this task, we'll compute the stationary distribution $\pi$ for our example [Markov chain](https://en.wikipedia.org/wiki/Markov_chain).
For an irreducible, non-periodic Markov chain with a finite state space $\mathcal{S}$ and a time-invariant state transition matrix $\mathbf{P}$, the state vector at time $j$, denoted by $\mathbf{\pi}_{j}$, has the property:
$$
\begin{equation*}
\sum_{s\in\mathcal{S}}\pi_{sj} = 1\qquad\forall{j}
\end{equation*}
$$
where $\pi_{sj}\geq{0},\forall{s}\in\mathcal{S}$ is the probability of being in state $s$ at time $j$. The state of the Markov chain at time step $n$, denoted by $\mathbf{\pi}_{n}$, is given by:
$$
\begin{equation*}
\mathbf{\pi}_{n} = \mathbf{\pi}_{0}\cdot\left(\mathbf{P}\right)^n
\end{equation*}
$$
where $\mathbf{\pi}_{n}$ is the state vector at time step $n$ and $\left(\mathbf{P}\right)^{n}$ is the transition matrix raised to the $n$-th power. Finally, a unique stationary distribution $\bar{\pi}$ exists, and $\mathbf{P}^{k}$ converges to a rank-one matrix in which each row is the stationary distribution $\bar{\pi}$:
$$
\begin{equation*}
\lim_{k\rightarrow\infty} \mathbf{P}^{k} = \mathbf{1}^{\top}\otimes\bar{\pi}
\end{equation*} 
$$
where $\mathbf{1}$ is a column vector of all ones and the operator $\otimes$ denotes a __Left Matrix Vector Product operation__, so the limit stacks $|\mathcal{S}|$ copies of the row vector $\bar{\pi}$.

<div>
    <center>
        <img src="figs/Fig-Matrix-Vector-Left-bA-product-NeedToRedrawThis.png" width="580"/>
    </center>
</div>

### Implementation details
We'll compute the stationary distribution $\bar{\pi}$ using the iterative `iterate(...)` method, implementing the pseudocode from the lecture notes.

> __What is going to happen?__ 
>
> We'll iterate until we hit one of two possible conditions:
> * The `counter` reaches `maxcount`; at this point, the iteration stops with a warning, and the current estimate of $\bar{\pi}$ is returned (but it may not have converged).
> * The difference between subsequent estimates of $\bar{\pi}$, measured in the 1-norm, drops to or below the specified threshold $\epsilon$. In this case, the returned $\bar{\pi}$ has converged to within the tolerance $\epsilon$.

We'll save the stationary distribution in the $\bar{\pi}$ variable.

In [67]:
π̄ = iterate(P, ϵ = 0.00000000001, maxcount=10000)

1×3 Matrix{Float64}:
 0.274809  0.435115  0.290076
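
As a sanity check on the numerical result, the stationary distribution of this small chain can also be worked out by hand from the balance condition $\bar{\pi} = \bar{\pi}\cdot\mathbf{P}$ together with normalization. This derivation is added for illustration and uses only the entries of $\mathbf{P}$ defined in Task 1. The first and third columns of $\mathbf{P}$ give two balance equations, and the normalization condition fixes the scale:
$$
\begin{equation*}
\begin{aligned}
\bar{\pi}_{1} &= 0.05\,\bar{\pi}_{1} + 0.6\,\bar{\pi}_{2} &&\Rightarrow\quad \bar{\pi}_{1} = \tfrac{12}{19}\,\bar{\pi}_{2}\\
\bar{\pi}_{3} &= 0.2\,\bar{\pi}_{2} + 0.7\,\bar{\pi}_{3} &&\Rightarrow\quad \bar{\pi}_{3} = \tfrac{2}{3}\,\bar{\pi}_{2}\\
1 &= \bar{\pi}_{1} + \bar{\pi}_{2} + \bar{\pi}_{3} = \tfrac{131}{57}\,\bar{\pi}_{2} &&\Rightarrow\quad \bar{\pi}_{2} = \tfrac{57}{131}
\end{aligned}
\end{equation*}
$$
so that $\bar{\pi} = \left(\tfrac{36}{131},\tfrac{57}{131},\tfrac{38}{131}\right)\approx\left(0.274809, 0.435115, 0.290076\right)$, which agrees with the values returned by `iterate(...)`.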

__So how did this work?__ We compute the stationary distribution $\bar{\pi}$ by directly iterating the expression:
$$
\pi_{n} = \pi_{\circ}\cdot\mathbf{P}^{n}\quad\,n=1,2,\dots
$$
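For a non-periodic Markov chain, $\pi_{n}\rightarrow\bar{\pi}$ as $n\rightarrow\infty$ (i.e., as we perform more iterations), so the difference between subsequent iterations eventually becomes small, $||\pi_{n+1}-\pi_{n}||<\epsilon$. The cell below evaluates $\pi_{n}$ directly from matrix powers, starting with all probability in state 3, and prints every fourth distribution.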

In [71]:
let

    π₁ = [0.0,0.0,1.0]; # initial state = state 3
    direct_state_distribution = Dict{Int,Array{Float64,1}}();
    direct_state_distribution[1] = π₁;
    for n = 2:number_of_simulation_steps
        πᵢ = transpose(π₁)*(P)^(n-1)
        direct_state_distribution[n] =  transpose(πᵢ)
    end
    foreach(i-> println(direct_state_distribution[i]), 1:4:100);
end

[0.0, 0.0, 1.0]
[0.24255, 0.37215, 0.3853]
[0.27130496343749994, 0.4258096678124999, 0.30288536874999994]
[0.2744871227606776, 0.43370691281157997, 0.2918059644277421]
[0.27479269989839583, 0.43489621133423917, 0.29031108876736444]
[0.2748117983316515, 0.43507978814277093, 0.2901084135255771]
[0.27481039464700924, 0.4351088475155027, 0.2900807578374874]
[0.2748094861454378, 0.43511356150168073, 0.2900769523528806]
[0.2748092332242822, 0.4351143437351992, 0.29007642304051756]
[0.27480917541599736, 0.43511447616954035, 0.29007634841446117]
[0.2748091633052802, 0.43511449897708304, 0.2900763377176356]
[0.2748091608854912, 0.43511450296045384, 0.2900763361540536]
[0.27480916041564357, 0.43511450366400156, 0.2900763359203533]
[0.2748091603260757, 0.4351145037893567, 0.2900763358845661]
[0.27480916030920954, 0.4351145038118426, 0.2900763358789462]
[0.27480916030606006, 0.4351145038158966, 0.29007633587804144]
[0.2748091603054754, 0.4351145038166302, 0.29007633587789233]
[0.2748091603053673, 

__Quick and dirty implementation:__ If we want a quick and dirty implementation, we can just raise the transition matrix $\mathbf{P}$ to a large power using the [matrix power operator `^`](https://docs.julialang.org/en/v1/base/math/#Base.:^-Tuple{Number,%20Number}) and then extract any row of the resulting matrix as the stationary distribution $\bar{\pi}$.

In [81]:
test = P^(2000) # raise P to a large power

3×3 Matrix{Float64}:
 0.274809  0.435115  0.290076
 0.274809  0.435115  0.290076
 0.274809  0.435115  0.290076

In [83]:
rank(test)

1
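
Iteration and matrix powers are not the only route to $\bar{\pi}$. As a cross-check, the sketch below uses a different technique: it solves the balance condition $\bar{\pi}\left(\mathbf{P}-\mathbf{I}\right)=0$ together with the normalization $\sum_{s}\bar{\pi}_{s}=1$ as a single overdetermined linear system, using Julia's least-squares backslash solve. The function name `stationary_by_linear_solve` and the variable `P_example` are hypothetical names introduced here.

```julia
using LinearAlgebra

# Sketch: stack the transposed balance equations on top of the normalization
# constraint and solve the resulting (N+1) × N system in the least-squares sense.
function stationary_by_linear_solve(P::Matrix{Float64})
    N = size(P, 1)
    A = [transpose(P) - I ; ones(1, N)]   # balance equations, then sum(π̄) = 1
    b = [zeros(N) ; 1.0]                  # right-hand side
    return A \ b                          # length-N vector holding π̄
end

# copy of the three-state transition matrix from Task 1
P_example = [
    0.05 0.95 0.0 ;
    0.6  0.2  0.2 ;
    0.0  0.3  0.7 ;
];

π_solve = stationary_by_linear_solve(P_example)
println(π_solve)
```

Because the system is consistent for an irreducible chain, the least-squares solution satisfies all equations, and the result should match any row of the large matrix power shown above up to round-off.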

### Check: Is the rank condition on the stationary distribution $\bar{\pi}$ correct?
Once we reach the stationary distribution, the rank of the stationary distribution matrix should be equal to `1`. Let's check whether this condition is true using the [@assert macro](https://docs.julialang.org/en/v1/base/base/#Base.@assert). 
> __Check:__ If we do not meet this criterion, an [AssertionError](https://docs.julialang.org/en/v1/base/base/#Core.AssertionError) will be thrown, and we should try raising $\mathbf{P}$ to a larger power. We'll compute the rank using the [rank function](https://docs.julialang.org/en/v1/stdlib/LinearAlgebra/#LinearAlgebra.rank), which is exported by the [Julia LinearAlgebra package](https://docs.julialang.org/en/v1/stdlib/LinearAlgebra/).

What do we see?

In [85]:
@assert rank(test) == 1
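
The numerical rank depends on a tolerance that decides which singular values count as zero, so it can be useful to pair the rank check with a more direct measure of how far the rows of a matrix power are from agreeing. The sketch below computes the largest absolute difference between any row and the first row. The helper name `row_spread` and the variable `P_example` are hypothetical names introduced for illustration.

```julia
# Sketch: largest absolute difference between any row of M and its first row.
# A value near zero means every row holds (numerically) the same distribution.
row_spread(M::Matrix{Float64}) = maximum(abs.(M .- M[1:1, :]))

# copy of the three-state transition matrix from Task 1
P_example = [
    0.05 0.95 0.0 ;
    0.6  0.2  0.2 ;
    0.0  0.3  0.7 ;
];

println("row spread of P^5    : ", row_spread(P_example^5))
println("row spread of P^2000 : ", row_spread(P_example^2000))
```

A spread that is still visibly above round-off suggests the power is too small for the rows to be read off as $\bar{\pi}$.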

___

## Task 3: Compute states using a categorical distribution
In this task, we compute the sequence of states for the three-state [Markov chain model](https://en.wikipedia.org/wiki/Markov_chain). We can obtain the dynamics predicted by the [Markov model](https://en.wikipedia.org/wiki/Markov_model) (i.e., the sequence of states and state transitions) by sampling the transition probability matrix $\mathbf{P}$ directly. 

Now that we are sure that the transition matrix $\mathbf{P}$ is properly formulated, let's populate the `hidden_state_probability_dictionary`, which holds the [categorical distribution](https://en.wikipedia.org/wiki/Categorical_distribution) modeling the transition probability for each hidden state $s\in\mathcal{S}$ (i.e., the probability that we transition from state $i$ to state $j$ in the next time step).

In [87]:
hidden_state_probability_dictionary = Dict{Int,Categorical}();
for i ∈ 1:number_of_hidden_states
    hidden_state_probability_dictionary[i] = Categorical(P[i,:])
end
hidden_state_probability_dictionary

Dict{Int64, Categorical{P} where P<:Real} with 3 entries:
  2 => Categorical{Float64, Vector{Float64}}(support=Base.OneTo(3), p=[0.6, 0.2…
  3 => Categorical{Float64, Vector{Float64}}(support=Base.OneTo(3), p=[0.0, 0.3…
  1 => Categorical{Float64, Vector{Float64}}(support=Base.OneTo(3), p=[0.05, 0.…

Now, we generate `number_of_simulation_steps` worth of dynamic data by sampling the `hidden_state_probability_dictionary`. We store these simulation results in the `hidden_simulation_dict` dictionary, where the `key` holds the time index and the `value` is the system's state (i.e., $s_{i}\in\mathcal{S}$).

We start by specifying an initial state `sᵢ = 1`. At each iteration of the loop, we retrieve the [categorical distribution](https://en.wikipedia.org/wiki/Categorical_distribution) corresponding to the current state (i.e., the row in the transition matrix $\mathbf{P}$ corresponding to state $s_{i}$). We generate the state at the next step by drawing a sample using the `rand(...)` function.

In [97]:
hidden_simulation_dict = let

    # initialize -
    hidden_simulation_dict = Dict{Int,Int}();
    sᵢ = 1; # hardcode the start state: we could draw from the stationary distribution
    hidden_simulation_dict[1] = sᵢ;

    for i ∈ 2:number_of_simulation_steps

        # get the categorical distribution for sᵢ 
        sᵢ = hidden_state_probability_dictionary[sᵢ] |> d -> rand(d);
    
        # capture -
        hidden_simulation_dict[i] = sᵢ
    end
    foreach(i -> println("Soln: (t=$(i),s=$(hidden_simulation_dict[i]))"), 1:1:100) # print first few steps

    hidden_simulation_dict; # return
end;

Soln: (t=1,s=1)
Soln: (t=2,s=2)
Soln: (t=3,s=2)
Soln: (t=4,s=1)
Soln: (t=5,s=2)
Soln: (t=6,s=1)
Soln: (t=7,s=2)
Soln: (t=8,s=1)
Soln: (t=9,s=2)
Soln: (t=10,s=1)
Soln: (t=11,s=2)
Soln: (t=12,s=1)
Soln: (t=13,s=2)
Soln: (t=14,s=1)
Soln: (t=15,s=2)
Soln: (t=16,s=1)
Soln: (t=17,s=1)
Soln: (t=18,s=2)
Soln: (t=19,s=2)
Soln: (t=20,s=1)
Soln: (t=21,s=2)
Soln: (t=22,s=2)
Soln: (t=23,s=1)
Soln: (t=24,s=1)
Soln: (t=25,s=2)
Soln: (t=26,s=1)
Soln: (t=27,s=2)
Soln: (t=28,s=3)
Soln: (t=29,s=3)
Soln: (t=30,s=3)
Soln: (t=31,s=3)
Soln: (t=32,s=2)
Soln: (t=33,s=1)
Soln: (t=34,s=2)
Soln: (t=35,s=1)
Soln: (t=36,s=2)
Soln: (t=37,s=1)
Soln: (t=38,s=2)
Soln: (t=39,s=1)
Soln: (t=40,s=2)
Soln: (t=41,s=1)
Soln: (t=42,s=1)
Soln: (t=43,s=2)
Soln: (t=44,s=2)
Soln: (t=45,s=1)
Soln: (t=46,s=2)
Soln: (t=47,s=1)
Soln: (t=48,s=2)
Soln: (t=49,s=1)
Soln: (t=50,s=2)
Soln: (t=51,s=1)
Soln: (t=52,s=2)
Soln: (t=53,s=1)
Soln: (t=54,s=2)
Soln: (t=55,s=1)
Soln: (t=56,s=2)
Soln: (t=57,s=1)
Soln: (t=58,s=2)
Soln: (t=59,s=1)
Soln: 
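
To make the sampling step less of a black box, the sketch below reproduces the same idea using only the Julia standard library: each draw from a categorical distribution is done by inverse-CDF sampling, i.e., by comparing a uniform random number against the running cumulative probabilities of the current row of $\mathbf{P}$. The names `sample_categorical`, `simulate_chain`, and `P_example` are hypothetical, and this is not how `Distributions.jl` is necessarily implemented internally.

```julia
using Random

# Sketch: draw one index from the probability vector p by inverse-CDF sampling.
function sample_categorical(rng::AbstractRNG, p::Vector{Float64})
    u = rand(rng)          # uniform draw in [0, 1)
    c = 0.0                # running cumulative probability
    for (j, pj) in enumerate(p)
        c += pj
        u < c && return j  # first index whose cumulative probability exceeds u
    end
    return length(p)       # guard against round-off when sum(p) is slightly below 1
end

# Sketch: simulate T steps of the chain with transition matrix P from a given start state.
function simulate_chain(rng::AbstractRNG, P::Matrix{Float64}, start_state::Int, T::Int)
    states = Vector{Int}(undef, T)
    states[1] = start_state
    for t in 2:T
        states[t] = sample_categorical(rng, P[states[t-1], :])  # sample the current row
    end
    return states
end

# copy of the three-state transition matrix from Task 1
P_example = [
    0.05 0.95 0.0 ;
    0.6  0.2  0.2 ;
    0.0  0.3  0.7 ;
];

states = simulate_chain(MersenneTwister(1), P_example, 1, 1000)
println(states[1:10])
```

Because $p_{13}=p_{31}=0$, a correct sampler never produces a direct jump between states 1 and 3, which gives a simple property to test against.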

### Check: Do we recover the stationary distribution $\bar{\pi}$?
Just like any dynamic system (e.g., concentration balances in a steady-state reactor, constant flow of liquid through a pipe, or the volume of water in a sink with the faucet on and the drain partially closed), if we wait long enough, a Markov model will approach its stationary distribution $\bar{\pi}$.

We've generated `number_of_simulation_steps` worth of data from our Markov model. Let's check if the distribution of states from our simulation matches the stationary distribution $\bar{\pi}$ that we computed in Task 2.

> __What should we expect to see?__ As the number of simulation steps increases, the frequency distribution of states observed in our simulation should approach the stationary distribution $\bar{\pi}$. 

So what do we see?

In [99]:
let

    # initialize -
    df = DataFrame();
   
    # count the number of times we visit each state
    S = [hidden_simulation_dict[i] for i ∈ 1:number_of_simulation_steps];
    NS₁ = findall(x-> x == 1, S) |> length;
    NS₂ = findall(x-> x == 2, S) |> length;
    NS₃ = findall(x-> x == 3, S) |> length;

    # compute the frequency of each state
    PS1 = NS₁/number_of_simulation_steps;
    PS2 = NS₂/number_of_simulation_steps;
    PS3 = NS₃/number_of_simulation_steps;

    # Let's make a table -
    # state 1
    row_df = (
        state = 1,
        count = NS₁,
        total = number_of_simulation_steps,
        frequency = PS1,
        probability = π̄[1,1]
    );
    push!(df, row_df);

    # state 2:
    row_df = (
        state = 2,
        count = NS₂,
        total = number_of_simulation_steps,
        frequency = PS2,
        probability = π̄[1,2]
    );
    push!(df, row_df);

    # state 3:
    row_df = (
        state = 3,
        count = NS₃,
        total = number_of_simulation_steps,
        frequency = PS3,
        probability = π̄[1,3]
    );
    push!(df, row_df);


    pretty_table(
        df;
        backend = :text,
        table_format = TextTableFormat(borders = text_table_borders__compact)
    );

end

 ------- ------- ------- ----------- -------------
 [1m state [0m [1m count [0m [1m total [0m [1m frequency [0m [1m probability [0m
 [90m Int64 [0m [90m Int64 [0m [90m Int64 [0m [90m   Float64 [0m [90m     Float64 [0m
 ------- ------- ------- ----------- -------------
      1    2760   10000       0.276      0.274809
      2    4319   10000      0.4319      0.435115
      3    2921   10000      0.2921      0.290076
 ------- ------- ------- ----------- -------------
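
The table gives a single snapshot at the end of the run. To see the approach to $\bar{\pi}$ as the run gets longer, the sketch below simulates the chain once and reports the total variation distance, $\frac{1}{2}\sum_{s}|f_{s}-\bar{\pi}_{s}|$, between the running empirical frequencies $f$ and a reference distribution at a few checkpoints. It uses only the standard library; the function name `tv_distance_at_checkpoints`, the checkpoint values, and the seed are assumptions made for illustration, and the reference probabilities are the rounded values reported in Task 2.

```julia
using Random

# Sketch: simulate the chain and record the total variation distance between the
# running state frequencies and π_ref at each checkpoint (checkpoints should be ≥ 2).
function tv_distance_at_checkpoints(rng::AbstractRNG, P::Matrix{Float64},
        π_ref::Vector{Float64}, checkpoints::Vector{Int})

    N = size(P, 1)
    C = cumsum(P, dims = 2)      # row-wise cumulative transition probabilities
    counts = zeros(Int, N)       # visits to each state so far
    s = 1                        # hardcoded start state
    counts[s] += 1
    result = Dict{Int,Float64}()
    for t in 2:maximum(checkpoints)
        u = rand(rng)
        s = something(findfirst(>(u), C[s, :]), N)   # inverse-CDF draw of the next state
        counts[s] += 1
        if t in checkpoints
            result[t] = 0.5 * sum(abs.(counts ./ t .- π_ref))
        end
    end
    return result
end

# copy of the three-state transition matrix from Task 1
P_example = [
    0.05 0.95 0.0 ;
    0.6  0.2  0.2 ;
    0.0  0.3  0.7 ;
];
π_ref = [0.274809, 0.435115, 0.290076]   # rounded stationary probabilities from Task 2
checkpoints = [100, 1_000, 10_000, 100_000]

tv = tv_distance_at_checkpoints(MersenneTwister(42), P_example, π_ref, checkpoints)
foreach(t -> println("t = $(t): TV distance = $(tv[t])"), checkpoints)
```

The distance need not shrink at every checkpoint because it is a random quantity, but it should trend toward zero as the number of steps grows.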


### What happens if we sample $\bar{\pi}$ directly?
The simulation above tracks the full Markov chain dynamics by sampling the transition matrix $\mathbf{P}$ step-by-step. But what if we only care about the long-run behavior? We can skip the dynamics entirely and sample $\bar{\pi}$ directly. This is faster but loses all information about state transitions over time.

Let's create a [categorical distribution](https://en.wikipedia.org/wiki/Categorical_distribution) using the stationary probability of our Markov chain with the [Distributions.jl](https://github.com/JuliaStats/Distributions.jl) package and save it in the variable `d`:

In [36]:
d = Categorical(π̄[1,:]);

We generate samples from the [categorical distribution](https://en.wikipedia.org/wiki/Categorical_distribution) saved in the variable `d` using the [rand function](https://docs.julialang.org/en/v1/stdlib/Random/#Base.rand). This allows us to simulate the steady-state behavior of the system encoded by our three-state Markov model. 

For example:

In [38]:
rand(d,100) # generate 100 samples from the stationary distribution

100-element Vector{Int64}:
 3
 1
 1
 2
 3
 3
 2
 2
 3
 1
 3
 3
 2
 ⋮
 2
 2
 1
 3
 1
 3
 2
 3
 3
 2
 2
 2

### Check: Do we recover the stationary distribution $\bar{\pi}$?
If the distribution `d` represents the stationary distribution $\bar{\pi}$, we should be able to generate many samples and estimate the probability that we are in state `s=1`, `s=2`, or `s=3`. Let's sample the distribution `d` to recover the stationary distribution $\bar{\pi}$. 

We'll draw `number_of_samples` samples from the distribution `d` and then calculate the frequency of `s = 1`, `s = 2`, and `s = 3` values. These should converge to the stationary probabilities as `number_of_samples` becomes large:

In [40]:
let

    # initialize -
    df = DataFrame();
    samples = rand(d, number_of_samples);

    # compute the counts -
    N₁ = findall(x-> x == 1, samples) |> length;
    N₂ = findall(x-> x == 2, samples) |> length;
    N₃ = findall(x-> x == 3, samples) |> length;

    # compute the frequencies -
    f₁ = N₁/number_of_samples;
    f₂ = N₂/number_of_samples;
    f₃ = N₃/number_of_samples;

    # Let's make a table -
    # state 1
    row_df = (
        state = 1,
        count = N₁,
        total = number_of_samples,
        frequency = f₁,
        probability = π̄[1,1]
    );
    push!(df, row_df);

    # state 2:
    row_df = (
        state = 2,
        count = N₂,
        total = number_of_samples,
        frequency = f₂,
        probability = π̄[1,2]
    );
    push!(df, row_df);

    # state 3:
    row_df = (
        state = 3,
        count = N₃,
        total = number_of_samples,
        frequency = f₃,
        probability = π̄[1,3]
    );
    push!(df, row_df);


    pretty_table(
        df;
        backend = :text,
        table_format = TextTableFormat(borders = text_table_borders__compact)
    );
end

 ------- ------- ------- ----------- -------------
 [1m state [0m [1m count [0m [1m total [0m [1m frequency [0m [1m probability [0m
 [90m Int64 [0m [90m Int64 [0m [90m Int64 [0m [90m   Float64 [0m [90m     Float64 [0m
 ------- ------- ------- ----------- -------------
      1    2736   10000      0.2736      0.274809
      2    4329   10000      0.4329      0.435115
      3    2935   10000      0.2935      0.290076
 ------- ------- ------- ----------- -------------
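
How close should we expect the frequencies to be? For independent draws, the frequency of a state with probability $p$ estimated from $n$ samples has standard error $\sqrt{p(1-p)/n}$. The sketch below applies this to the numbers in the table above and expresses each deviation in units of its standard error. The function name `z_scores` is a hypothetical helper, and the independence assumption applies to direct sampling from `d`, not to the correlated states of the step-by-step simulation.

```julia
# Sketch: deviation of each observed frequency from its reference probability,
# measured in units of the binomial standard error sqrt(p(1-p)/n).
function z_scores(freq::Vector{Float64}, p::Vector{Float64}, n::Int)
    se = sqrt.(p .* (1 .- p) ./ n)   # standard error for each state
    return (freq .- p) ./ se
end

p_ref = [0.274809, 0.435115, 0.290076]   # probability column of the table above
f_obs = [0.2736, 0.4329, 0.2935]         # frequency column of the table above
z = z_scores(f_obs, p_ref, 10_000)
println(round.(z, digits = 2))
```

All three deviations are smaller than one standard error in magnitude, which is consistent with the samples having been drawn from $\bar{\pi}$.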


___

## Summary
In this example, we explored the fundamental properties of Markov models by constructing a three-state Markov chain, computing its stationary distribution through matrix iteration, and validating theoretical predictions through simulation.

> __Key Takeaways:__
> 
> * **Transition matrix structure and properties:** We constructed a 3×3 transition matrix P satisfying the row-stochastic property and verified that each row sums to one. The matrix has full rank, and every state can be reached from every other state, which is the property that a unique stationary distribution requires.
> * **Stationary distribution convergence:** We computed the stationary distribution π̄ by repeatedly applying the update π ← πP until successive estimates differed by no more than ε in the 1-norm, and cross-checked the result by raising the transition matrix to a large power. The resulting matrix power had rank one, with every row equal to the stationary distribution, confirming the theoretical prediction for non-periodic Markov chains.
> * **Simulation validates theoretical predictions:** We generated a 10,000-step state sequence by sampling categorical distributions based on transition probabilities and found that the empirical state frequencies agreed with the theoretical stationary distribution to within about 0.004. Direct sampling from the stationary distribution produced similar frequencies but eliminated temporal dynamics, illustrating the trade-off between steady-state and transient behavior analysis.

The Markov model framework provides a powerful mathematical tool for modeling stochastic systems, with applications ranging from natural language processing to financial modeling and biological sequence analysis.
___