# Foundations of Markov Models and Markov Chains
In this module, we explore Markov models, one of the most useful tools for modeling systems with randomness. These models appear everywhere—from operations and finance to biology and machine learning.

> __Learning Objectives:__
> 
> By the end of this module, you will be able to define and demonstrate mastery of the following key concepts:
>
> * __The Markov Property:__ Understand how future behavior depends only on the present state, not on the past. This memoryless property simplifies modeling by focusing on what is now, not how you arrived here.
> * __The Transition Matrix:__ Learn how to encode state transitions as a matrix where each entry represents the probability of moving from one state to another. Understand the properties that make this matrix valid for any Markov chain.
> * __Stationary Distribution:__ Discover the long-run equilibrium behavior of a Markov chain. This distribution describes the fraction of time the system spends in each state over the long run, and it remains unchanged under further transitions once reached.

Markov models capture real-world dynamics well when the present state contains all relevant information. Let's explore how these elegant tools work.

___

## Discrete Markov Model States
A discrete Markov model is a stochastic (random) process that describes the transitions between states based on defined probabilities. Formally, a Markov model is determined by the tuple $\mathcal{M} = (\mathcal{S},\mathbf{P})$, where $\mathcal{S}$ represents the set of states in the model and $\mathbf{P}$ is the transition matrix.

### The state space $\mathcal{S}$
The state space $\mathcal{S}$ is the set of all possible values a system can assume. For example, if a Markov chain can be in the states $\left\{1,2,3\right\}$ then $\mathcal{S} \equiv \left\{1,2,3\right\}$. What do these states represent? The answer depends on the problem. Here are two concrete examples:
* __Letters in words__ $\mathcal{S} \equiv \left\{a,b,c,\dotsc,z\right\}$: If the state space $\mathcal{S}$ is the alphabet, we could develop a Markov model that generates $n$ words, each $l$ characters long, for example words that all start with the letter $t$.
* __Investor moods__ $\mathcal{S} \equiv \left\{\text{bullish},\text{neutral},\text{bearish}\right\}$: In this case, the state space is the set of three possible moods of an investor. Using this $\mathcal{S}$, we could build a Markov model to simulate how an investor's mood changes as they watch the market, the news, etc.

Next, let's look at the Markov property and transition matrix $\mathbf{P}$.
___

## The Markov Property
The Markov property is the key characteristic of Markov models: a random process has this property when the probability of future states depends only on the present state, not on how it got there. This memoryless property simplifies how we model stochastic systems. From weather prediction to inventory management, the Markov property captures the behavior of many real-world systems.

> __Definition:__ A stochastic process $\{X_n\}_{n=0}^{\infty}$ satisfies the **Markov property** if, for all $n \geq 0$ and all sequences of states $s_0, s_1, \ldots, s_n, s_{n+1}$ in the state space $\mathcal{S}$:
>
> $$P(X_{n+1} = s_{n+1} \mid X_0 = s_0, X_1 = s_1, \ldots, X_n = s_n) = P(X_{n+1} = s_{n+1} \mid X_n = s_n)$$
>
> In other words, given the present state $X_n = s_n$, the future state $X_{n+1} = s_{n+1}$ is conditionally independent of all past states $X_0 = s_0, X_1 = s_1, \ldots, X_{n-1} = s_{n-1}$. This conditional independence captures the essence of memorylessness: the past is irrelevant once we know the present.

### One-Step Transition Probabilities

The one-step transition probability describes how the system moves from one state to another in a single time step. For a Markov chain with finite state space $\mathcal{S} = \{s_1, s_2, \ldots, s_N\}$, we define:

$$
\begin{equation*}
p_{ij} = P(X_{n+1} = s_j \mid X_n = s_i)
\end{equation*}
$$

Here, $p_{ij}$ is the probability of moving from state $s_i$ at time $n$ to state $s_j$ at time $n+1$. The Markov property alone does not make this probability independent of $n$; that requires the additional assumption that the process is time-homogeneous, meaning the transition probabilities stay constant over time. Under that assumption, we pack these probabilities into a single transition matrix $\mathbf{P}$.
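The one-step transition idea can be sketched as a short simulation. This is a minimal illustration, not a reference implementation: the three investor-mood states come from the example above, but the transition probabilities in `P` and the helper names `step` and `simulate` are hypothetical, chosen only for this sketch.

```python
import random

# Hypothetical three-state "investor mood" chain. The probabilities are
# made up for illustration and are not estimated from any data.
STATES = ["bullish", "neutral", "bearish"]
P = [
    [0.7, 0.2, 0.1],  # transitions out of bullish
    [0.3, 0.4, 0.3],  # transitions out of neutral
    [0.2, 0.3, 0.5],  # transitions out of bearish
]

def step(current, P, rng):
    """Sample the next state index using only row `current` of P."""
    return rng.choices(range(len(P)), weights=P[current], k=1)[0]

def simulate(start, n_steps, P, rng):
    """Return a path [X_0, X_1, ..., X_n] of state indices."""
    path = [start]
    for _ in range(n_steps):
        # Only the most recent state is consulted: the Markov property.
        path.append(step(path[-1], P, rng))
    return path

rng = random.Random(42)  # fixed seed so the run is repeatable
path = simulate(start=0, n_steps=10, P=P, rng=rng)
print([STATES[i] for i in path])
```

Note that `step` never looks at the earlier entries of `path`; the history is stored only so that we can print it.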

### Why the Markov Property Matters
The Markov property simplifies our modeling work: instead of tracking the entire history of a process, we only need to know the present state. Without this property, the number of possible historical sequences would grow exponentially, making analysis difficult. With it, the problem becomes tractable and easier to work with.
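To make the growth concrete, here is a rough count, assuming $N$ states and a history $s_0, s_1, \ldots, s_n$ (the numbers are illustrative only):

$$
\begin{equation*}
\underbrace{N^{\,n+1}}_{\text{possible histories } (s_0,\ldots,s_n)} \quad\text{versus}\quad \underbrace{N}_{\text{possible present states } s_n}
\end{equation*}
$$

For $N = 3$ and $n = 10$, a general process would need a separate conditional distribution for each of $3^{11} = 177{,}147$ histories, whereas a time-homogeneous Markov chain needs only one distribution per present state, i.e., $3$ in total.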

> __Why it works:__ The Markov property lets us build probability models of complex systems, use dynamic programming algorithms to make optimal decisions (such as value iteration for MDPs), and analyze long-run behavior using matrix operations and eigenvalue analysis. This approach has proven useful in finance, operations research, biology, physics, and machine learning.

The Markov property is an assumption about the real world, and not all stochastic processes satisfy it exactly. However, by choosing the state representation carefully to capture the relevant information, we can often make the Markov assumption work well in practice. The key is whether the essential dynamics can be captured by the current state alone. When this is true, Markov models are straightforward to analyze and use.

___

## The transition matrix
A discrete Markov chain is a sequence of random variables (states) $X_{0}, X_{1},\dotsc, X_{n}$ with the _Markov property_, i.e., the probability of moving to the next state depends only on the present state and not on past states:
$$
\begin{equation*}
P(X_{n+1} = s_{n+1} \mid X_{0}=s_{0}, \dots, X_{n}=s_{n}) = P(X_{n+1} = s_{n+1} \mid X_{n} = s_{n})
\end{equation*}
$$
For a finite state space $\mathcal{S}$ with $N$ states, the probability of moving from state $s_{i}$ to state $s_{j}$ in the next step $n\rightarrow{n+1}$ is encoded in entry $p_{ij}$ of the transition matrix $\mathbf{P}\in\mathbb{R}^{N\times{N}}$:
$$
\begin{equation*}
p_{ij} = P(X_{n+1}~=~s_{j}~\mid~X_{n}~=~s_{i})
\end{equation*}
$$

The transition matrix $\mathbf{P}$ represents the state transitions of a Markov chain:

<div>
    <center>
        <img src="figs/Fig-TransitionMatrix-Schematic.svg" width="680"/>
    </center>
</div>

The transition matrix $\mathbf{P}\in\mathbb{R}^{N\times{N}}$, where $N$ is the number of states, has the following properties:
* All the elements of the transition matrix $\mathbf{P}$ are non-negative, $p_{ij}\geq{0}$.
* The rows of $\mathbf{P}$ represent the current states, while the columns represent the future states (our convention). Thus, the element $p_{ij}$ in row $i$ and column $j$ represents the probability of transitioning from state $s_{i}$ to state $s_{j}$ in the next step.
* The rows of $\mathbf{P}$ must sum to unity, i.e., each row encodes the probability of all possible future outcomes, given where we are currently. Thus, for any row $i$, we have: $\sum_{j=1}^{N} p_{ij} = 1$. This means that the transition probabilities from state $s_{i}$ to all states (including $s_{i}$ itself) sum to 1: we have a fixed number of states, and at each step the system must land in one of them.
* If the transition matrix $\mathbf{P}$ is time-invariant (the chain is time-homogeneous), then $p_{ij}$ doesn't change as $n\rightarrow{n+1}~\forall{n}$. In other words, the probability of transitioning from state $s_{i}$ to state $s_{j}$ does not change as the system evolves. The $p_{ij}$ values are constant.
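The first three properties above can be checked mechanically. The following is a minimal sketch, assuming the matrix is stored as a list of rows; the function name `is_valid_transition_matrix`, the tolerance `tol`, and the two example matrices are hypothetical and exist only for illustration.

```python
import math

def is_valid_transition_matrix(P, tol=1e-9):
    """Check that P is square, non-negative, and has rows that sum to 1."""
    n = len(P)
    if n == 0:
        return False
    for row in P:
        if len(row) != n:               # must be square (N x N)
            return False
        if any(p < 0 for p in row):     # probabilities cannot be negative
            return False
        if not math.isclose(sum(row), 1.0, abs_tol=tol):  # row sums to unity
            return False
    return True

P_good = [[0.9, 0.1], [0.4, 0.6]]
P_bad = [[0.9, 0.2], [0.4, 0.6]]  # first row sums to 1.1, so it is invalid

print(is_valid_transition_matrix(P_good))
print(is_valid_transition_matrix(P_bad))
```

A tolerance is used for the row-sum check because probabilities stored as floating-point numbers rarely sum to exactly 1.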

___

## Algorithm: Power Iteration Method
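The stationary distribution $\bar{\pi}$ is a probability (row) vector that is unchanged by one step of the chain, i.e., $\bar{\pi} = \bar{\pi}\mathbf{P}$. Power iteration estimates $\bar{\pi}$ by repeatedly applying the transition matrix to an initial distribution $\pi_0$ until convergence: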

$$
\begin{align*}
\pi_1 &= \pi_0 \mathbf{P} \\
\pi_2 &= \pi_1 \mathbf{P} = \pi_0 \mathbf{P}^2 \\
\pi_3 &= \pi_2 \mathbf{P} = \pi_0 \mathbf{P}^3 \\
&\vdots \\
\pi_k &= \pi_{k-1} \mathbf{P} = \pi_0 \mathbf{P}^k
\end{align*}
$$
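The identity $\pi_k = \pi_0\mathbf{P}^k$ can be checked numerically. The sketch below assumes a hypothetical two-state matrix and uses two small helper functions (`vec_mat` and `mat_mat`, names chosen here for illustration) written in plain Python.

```python
def vec_mat(pi, P):
    """Row vector times matrix: (pi P)_j = sum_i pi_i * p_ij."""
    n = len(P)
    return [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]

def mat_mat(A, B):
    """Square matrix product: (A B)_ij = sum_k a_ik * b_kj."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

P = [[0.9, 0.1], [0.4, 0.6]]  # hypothetical two-state chain
pi0 = [1.0, 0.0]              # start in the first state with certainty

# Route 1: apply P three times, one step at a time.
pi = pi0
for _ in range(3):
    pi = vec_mat(pi, P)

# Route 2: form P^3 once, then multiply by the initial distribution.
P3 = mat_mat(mat_mat(P, P), P)
pi_direct = vec_mat(pi0, P3)

print(pi, pi_direct)  # both are approximately [0.825, 0.175]
```

Both routes give the same distribution up to floating-point rounding; the step-by-step route is what the algorithm uses because it avoids forming matrix powers.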

__Initialize__: Given an ergodic transition matrix $\mathbf{P}$ with $N$ states, a tolerance $\epsilon > 0$, and a maximum number of iterations $T$, set the iteration counter $t\gets 1$, the uniform initial distribution $\pi_0 \gets \left[1/N,\ldots,1/N\right]$, and $\texttt{converged}\gets\texttt{false}$.

While $\texttt{converged}$ is $\texttt{false}$ __do__:
1. Compute: $\pi_t \gets \pi_{t-1}\mathbf{P}$.
2. Check for convergence:
    - If $\lVert \pi_t - \pi_{t-1} \rVert_{1} \leq \epsilon$, then set $\texttt{converged}\gets\texttt{true}$ and $\bar{\pi}\gets\pi_t$.
3. Otherwise, check the iteration limit:
    - If $t\geq{T}$, exit the loop with $\bar{\pi}\gets\pi_t$. The tolerance was not met in this case, so $\bar{\pi}$ should be treated as an approximation.
    - If $t<T$, increment $t\gets{t+1}$.

__Output__: The stationary distribution $\bar{\pi}$.
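The steps above can be sketched in plain Python as follows. This is one possible rendering under stated assumptions, not a definitive implementation: the function name `power_iteration`, the default values of `eps` and `max_iter`, and the two-state example matrix are all hypothetical.

```python
def power_iteration(P, eps=1e-10, max_iter=10_000):
    """Estimate the stationary distribution of a row-stochastic matrix P.

    Returns (pi, iterations_used, converged).
    """
    n = len(P)
    pi = [1.0 / n] * n  # uniform initial distribution pi_0
    for t in range(1, max_iter + 1):
        # pi_t = pi_{t-1} P (row vector times matrix)
        new_pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        # L1 norm of the change between successive iterates
        diff = sum(abs(a - b) for a, b in zip(new_pi, pi))
        pi = new_pi
        if diff <= eps:
            return pi, t, True
    # Iteration limit reached without meeting the tolerance.
    return pi, max_iter, False

P = [[0.9, 0.1], [0.4, 0.6]]  # hypothetical ergodic two-state chain
pi_bar, iters, ok = power_iteration(P)
print(pi_bar, iters, ok)  # pi_bar is approximately [0.8, 0.2]
```

Returning the `converged` flag separately lets the caller distinguish a result that met the tolerance from one that simply ran out of iterations.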

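The convergence rate depends on the spectral gap $1 - \lvert\lambda_2\rvert$, the difference between the largest eigenvalue magnitude of $\mathbf{P}$ (which is always 1) and the second-largest, $\lvert\lambda_2\rvert$. The error shrinks roughly like $\lvert\lambda_2\rvert^{k}$ after $k$ iterations, so a larger gap means faster convergence.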
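The effect of the spectral gap can be seen on two-state chains, where the second eigenvalue has a closed form: for off-diagonal probabilities $a$ and $b$, the eigenvalues of $\mathbf{P}$ are $1$ and $1-a-b$. The sketch below compares iteration counts for two hypothetical chains; the helper names and parameter values are illustrative assumptions.

```python
def iterations_to_converge(P, eps=1e-10, max_iter=100_000):
    """Count power-iteration steps until the L1 change drops below eps."""
    n = len(P)
    pi = [1.0 / n] * n
    for t in range(1, max_iter + 1):
        new_pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        if sum(abs(x - y) for x, y in zip(new_pi, pi)) <= eps:
            return t
        pi = new_pi
    return max_iter

def two_state(a, b):
    """Two-state chain with eigenvalues 1 and 1 - a - b."""
    return [[1 - a, a], [b, 1 - b]]

fast = two_state(0.5, 0.3)    # second eigenvalue 0.2, spectral gap 0.8
slow = two_state(0.02, 0.01)  # second eigenvalue 0.97, spectral gap 0.03

n_fast = iterations_to_converge(fast)
n_slow = iterations_to_converge(slow)
print(n_fast, n_slow)  # the small-gap chain needs many more iterations
```

The chain with the small gap changes state rarely, so information about the starting distribution decays slowly and many more multiplications are needed.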

## Example
Let's explore a simple Markov chain, compute its stationary distribution, and vary the model parameters to see how they affect the long-term behavior.

> [▶ Let's explore properties of Markov models and stationary distributions](CHEME-5800-L11a-Example-MarkovModels-Fall-2025.ipynb). In this example, you will explore the fundamental properties of Markov models by constructing a discrete-time Markov chain, computing its stationary distribution, and validating the theoretical predictions through simulation.

___

## Lab
In lab `L11b`, we build a Markov model for word generation based on letter transition probabilities derived from a text corpus.


## Summary

In this module, we explored how Markov models capture the dynamics of systems where the future depends only on the present state:

> __Key takeaways:__
>
> * **Markov property as simplification**: The memoryless property of Markov chains allows us to model complex systems by tracking only the current state, not the entire history. This dramatically reduces computational complexity and makes analysis tractable while still capturing essential system behavior.
> * **Transition matrices encode state dynamics**: The transition matrix encodes all rules governing how a system moves between states. Each row sums to one, rows represent current states, and columns represent future states. Time-homogeneity means these rules remain constant over time, enabling long-term predictions.
> * **Stationary distributions reveal long-run behavior**: For ergodic chains, repeated application of the transition matrix converges to a unique stationary distribution. This equilibrium distribution gives the long-run fraction of time the system spends in each state and can be found through power iteration or eigenvalue methods.

These foundational concepts enable practical applications across operations, finance, and machine learning where systems evolve based on discrete states and probabilistic transitions.

___