# Hidden Markov Models

## Overview

* Definition of a Hidden Markov Model
* Python implementation
* The three problems:
    * Likelihood
        * The brute force way
        * The Forward Algorithm
    * Decoding
        * Brute force
        * Viterbi
    * Learning

### Likelihood

Given a hidden Markov Model $\lambda = (A,B)$ and an observation sequence $O$, determine the likelihood $P(O|\lambda)$ 

* We have an HMM and we know the transition probabilities and the observation likelihoods
* What is the probability of seeing a *specific* sequence of observations?

### Decoding

* We have an HMM and we know the transition probabilities and observation likelihoods
* Given an observation sequence, estimate the most likely hidden state sequence


### Learning 
* We have an HMM and we _don't_ know the transition probabilities or observation likelihoods.
* Given an observation sequence, 'learn' these parameters.

# Definitions


(What follows is an implementation of the algorithms given in Appendix A of 'Speech and Language Processing' by Jurafsky and Martin, 2025)

We have:

* A set of $N$ **states**: 

$$ 
Q = q_1 q_2 \ldots q_N
$$

* A **transition probability matrix**

$$
A = a_{11} \ldots a_{ij} \ldots a_{NN}
$$

where each $a_{ij}$ is the probability of moving from state $i$ to state $j$, such that

$$
\sum_{j=1}^{N} a_{ij} = 1 , \forall i
$$


* A sequence of **observation likelihoods** (also called emission probabilities)

$$
B = b_i(o_t)
$$

each expressing the probability of an observation $o_t$ (drawn from a vocabulary $V = v_1, v_2, \ldots, v_V$) being generated from a state $q_i$.



* An initial probability distribution over states

$$
\pi = \pi_1, \pi_2, \ldots, \pi_N
$$

such that $\sum_{i=1}^{N} \pi_i = 1$.


OK, so imagine we're studying historical weather patterns: we don't have temperature data, but we do have access to a food diary recording how much ice cream someone ate each day.


Assumptions:

* There are only two weather states: cold (`C`) and hot (`H`)
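The definitions above map naturally onto plain Python dictionaries. The following is only a minimal sketch, not the `markov` or `coefficients` modules used in this notebook: the names `HOT`, `COLD` and `is_distribution` are hypothetical, and all the numbers are illustrative assumptions that may differ from the notebook's actual coefficients.

```python
from fractions import Fraction

# Hypothetical stand-ins for the two weather states.
HOT, COLD = "hot", "cold"

# Assumed, illustrative values; the real coefficients may differ.
PI = {HOT: Fraction(4, 5), COLD: Fraction(1, 5)}  # initial distribution
A = {  # A[i][j]: probability of moving from state i to state j
    HOT: {HOT: Fraction(3, 5), COLD: Fraction(2, 5)},
    COLD: {HOT: Fraction(1, 2), COLD: Fraction(1, 2)},
}
B = {  # B[i][o]: probability of observing o ice creams in state i
    HOT: {1: Fraction(1, 5), 2: Fraction(2, 5), 3: Fraction(2, 5)},
    COLD: {1: Fraction(1, 2), 2: Fraction(2, 5), 3: Fraction(1, 10)},
}


def is_distribution(probs):
    """True if the values are non-negative and sum to exactly 1."""
    values = list(probs.values())
    return all(v >= 0 for v in values) and sum(values) == 1


# pi, every row of A and every row of B must each be a distribution.
assert is_distribution(PI)
assert all(is_distribution(A[q]) for q in A)
assert all(is_distribution(B[q]) for q in B)
```

Using `Fraction` rather than floats keeps the "sums to 1" checks exact.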

# Python implementation of a Hidden Markov Model

In [1]:
from markov import HiddenMarkov, Markov, hot, cold, Temperature
from fractions import Fraction 
from itertools import islice
from typing import Iterable 

In [2]:
from coefficients import A,B,PI

In [3]:
hmm = HiddenMarkov(Markov(a=A, pi=PI), b=B)
list(islice(hmm, 20))

[3, 3, 2, 2, 3, 1, 2, 2, 2, 1, 3, 1, 2, 2, 3, 3, 3, 3, 3, 1]

So we can't observe the weather directly; all we can observe is the number of ice creams eaten each day.
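To see how a sequence like this might come about, here is a minimal sketch of sampling from an HMM. It is not the `HiddenMarkov` class used in this notebook, whose internals aren't shown here; `draw`, `sample_observations` and all the coefficient values are hypothetical and chosen only for illustration.

```python
import random
from fractions import Fraction
from itertools import islice

HOT, COLD = "hot", "cold"

# Assumed, illustrative coefficients; the notebook's A, B, PI may differ.
PI = {HOT: Fraction(4, 5), COLD: Fraction(1, 5)}
A = {
    HOT: {HOT: Fraction(3, 5), COLD: Fraction(2, 5)},
    COLD: {HOT: Fraction(1, 2), COLD: Fraction(1, 2)},
}
B = {
    HOT: {1: Fraction(1, 5), 2: Fraction(2, 5), 3: Fraction(2, 5)},
    COLD: {1: Fraction(1, 2), 2: Fraction(2, 5), 3: Fraction(1, 10)},
}


def draw(dist, rng):
    """Draw one outcome from an {outcome: probability} mapping."""
    outcomes = list(dist)
    weights = [float(p) for p in dist.values()]
    return rng.choices(outcomes, weights=weights, k=1)[0]


def sample_observations(a, b, pi, rng):
    """Yield observations forever; the hidden states are never exposed."""
    state = draw(pi, rng)            # initial hidden state, drawn from pi
    while True:
        yield draw(b[state], rng)    # emit an observation from this state
        state = draw(a[state], rng)  # move to the next hidden state


rng = random.Random(0)  # seeded so that a run is repeatable
print(list(islice(sample_observations(A, B, PI, rng), 20)))
```

The caller only ever sees the emitted ice cream counts, which is exactly what makes the weather "hidden".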

## Problem 1: Likelihood

Given an HMM $\lambda = (A,B)$ and an observation sequence $O$, determine the likelihood $P(O|\lambda)$.




We start with a simpler situation: suppose we knew the weather and wanted to predict how much ice cream Jason will eat.

For a given hidden state sequence (e.g. *hot hot cold*), compute the output likelihood of the observation sequence *3 1 3*.


$$
P(O|Q) = \prod_{i=1}^{T} P(o_i|q_i)
$$

So in our example

$$
P(3 \text{ } 1 \text{ } 3 | \text{hot hot cold}) = P(3|\text{hot}) \times P(1|\text{hot}) \times P(3|\text{cold})
$$

Each factor can be read directly from $B$:


In [4]:
(B[hot][3]) * (B[hot][1]) * B[cold][3]

Fraction(1, 125)

Now let's compute this for every possible weather sequence:


In [13]:
weather_sequences = [
    (s1, s2, s3) for s1 in [hot, cold] for s2 in [hot, cold] for s3 in [hot, cold]
]


In [14]:
for s1, s2, s3 in weather_sequences:
    p_o = (B[s1][3]) * (B[s2][1]) * (B[s3][3])
    print(f"Probability of seeing (3,1,3) after weather ({s1}, {s2}, {s3}) is {p_o}")

Probability of seeing (3,1,3) after weather (hot, hot, hot) is 4/125
Probability of seeing (3,1,3) after weather (hot, hot, cold) is 1/125
Probability of seeing (3,1,3) after weather (hot, cold, hot) is 2/25
Probability of seeing (3,1,3) after weather (hot, cold, cold) is 1/50
Probability of seeing (3,1,3) after weather (cold, hot, hot) is 1/125
Probability of seeing (3,1,3) after weather (cold, hot, cold) is 1/500
Probability of seeing (3,1,3) after weather (cold, cold, hot) is 1/50
Probability of seeing (3,1,3) after weather (cold, cold, cold) is 1/200


In [18]:
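sigma_p_o = sum((B[s1][3]) * (B[s2][1]) * (B[s3][3]) for (s1, s2, s3) in weather_sequences)
print(f"Sum of P(O|Q) for (3,1,3) over all weather sequences is {sigma_p_o}")


Sum of P(O|Q) for (3,1,3) over all weather sequences is 7/40


Note that $\frac{7}{40}$ is the plain sum of the conditional likelihoods $P(O|Q)$, with every weather sequence counted equally. It is not yet the probability of seeing `3 1 3`: that requires weighting each term by the probability of the weather sequence itself,

$$
P(O|\lambda) = \sum_{Q} P(O|Q) P(Q)
$$

where $P(Q)$ comes from $\pi$ and $A$. Since this is a weighted average of the $P(O|Q)$ values, it can never exceed the largest of them, here $\frac{2}{25}$.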
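The sum above adds up $P(O|Q)$ terms only. To turn it into the likelihood $P(O|\lambda)$, each term has to be weighted by the probability $P(Q)$ of the weather sequence, which is computed from $\pi$ and $A$. Here is a minimal brute-force sketch of that calculation. It does not use the notebook's `markov` module; the function names and all coefficient values are hypothetical and purely illustrative.

```python
from fractions import Fraction
from itertools import product
from math import prod

HOT, COLD = "hot", "cold"

# Assumed, illustrative coefficients; the notebook's A, B, PI may differ.
PI = {HOT: Fraction(4, 5), COLD: Fraction(1, 5)}
A = {
    HOT: {HOT: Fraction(3, 5), COLD: Fraction(2, 5)},
    COLD: {HOT: Fraction(1, 2), COLD: Fraction(1, 2)},
}
B = {
    HOT: {1: Fraction(1, 5), 2: Fraction(2, 5), 3: Fraction(2, 5)},
    COLD: {1: Fraction(1, 2), 2: Fraction(2, 5), 3: Fraction(1, 10)},
}


def path_probability(states):
    """P(Q): initial probability times the chain of transition probabilities."""
    p = PI[states[0]]
    for prev, nxt in zip(states, states[1:]):
        p *= A[prev][nxt]
    return p


def emission_probability(states, observations):
    """P(O|Q): product of the per-step emission probabilities."""
    return prod(B[q][o] for q, o in zip(states, observations))


def likelihood(observations):
    """P(O|lambda): sum of P(O|Q) * P(Q) over every hidden state sequence."""
    return sum(
        path_probability(q) * emission_probability(q, observations)
        for q in product([HOT, COLD], repeat=len(observations))
    )


print(likelihood((3, 1, 3)))
```

Because the $P(Q)$ weights sum to 1, the result is a weighted average of the per-sequence values and is necessarily smaller than the unweighted sum.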


### Computational cost

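Unfortunately, brute force becomes computationally infeasible very quickly: an HMM with $N$ hidden states and an observation sequence of length $T$ has $N^T$ possible hidden state sequences, so the number of terms in the sum grows exponentially with the length of the observation sequence.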
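To make that growth concrete, here is a small sketch that counts the hidden state sequences for the two-state weather model at a few sequence lengths. The helper `count_state_sequences` is a hypothetical name introduced just for this illustration.

```python
from itertools import product


def count_state_sequences(n_states, length):
    """Number of hidden state sequences of the given length: n_states ** length."""
    return n_states ** length


# Explicit enumeration agrees with the formula for the 3-day example.
assert len(list(product(range(2), repeat=3))) == count_state_sequences(2, 3)

for length in (3, 10, 20, 50):
    print(f"T={length}: {count_state_sequences(2, length)} sequences")
```

Three days means only 8 sequences, but each extra day doubles the count, so enumerating every sequence stops being practical well before the observation sequence gets long.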