# Problem 2: Decoding

* We have an HMM and we know the transition probabilities and observation likelihoods
* Given an observation sequence, estimate the most likely hidden state sequence

For any model that contains hidden variables, the task of determining which sequence of hidden variables is the underlying source of some sequence of observations is called the **decoding** task.


So in our example:


Given a sequence of ice-cream observations *3 1 3* and an HMM, find the best hidden weather sequence (temp, temp, temp), where each temp is either hot or cold.



This is easy with brute force: just check each possible sequence.

# Brute Force

In [1]:
from algorithms import forward_algorithm
from markov import hot, cold 

In [2]:

ALL_SEQUENCES = [
    (t1,t2,t3)
    for t1 in (hot, cold)
    for t2 in (hot, cold)
    for t3 in (hot, cold)
]
ALL_SEQUENCES

[(hot, hot, hot),
 (hot, hot, cold),
 (hot, cold, hot),
 (hot, cold, cold),
 (cold, hot, hot),
 (cold, hot, cold),
 (cold, cold, hot),
 (cold, cold, cold)]

In [3]:
observations = [3,1,3]
o1,o2, o3 = observations

In [4]:
from coefficients import B

In [5]:
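# Score each candidate sequence by the product of its observation likelihoods
likelihoods = {
    (B[s1][o1] * B[s2][o2] * B[s3][o3], (s1, s2, s3))
    for (s1, s2, s3) in ALL_SEQUENCES
}

# Rank the candidates from most to least likely
sorted(likelihoods, key=lambda pair: pair[0], reverse=True)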

[(Fraction(2, 25), (hot, cold, hot)),
 (Fraction(4, 125), (hot, hot, hot)),
 (Fraction(1, 50), (hot, cold, cold)),
 (Fraction(1, 50), (cold, cold, hot)),
 (Fraction(1, 125), (cold, hot, hot)),
 (Fraction(1, 125), (hot, hot, cold)),
 (Fraction(1, 200), (cold, cold, cold)),
 (Fraction(1, 500), (cold, hot, cold))]

In [6]:
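# Compare on the likelihood only, so ties never fall back to comparing states
max(likelihoods, key=lambda pair: pair[0])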

(Fraction(2, 25), (hot, cold, hot))

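So given the observation sequence `3,1,3`, the most likely sequence of hidden states is `hot,cold,hot`. Note that this score multiplies only the observation likelihoods in `B`; it does not weight the sequences by their transition probabilities.

This is what we would expect: ice cream consumption goes up on hot days.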
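The scores above multiply only the entries of `B`. A full decoding score also includes the initial and transition probabilities of each candidate sequence. The sketch below shows one way that could look. The `PI` and `A` values are illustrative assumptions (they are not given in this section), the `B` values are chosen to be consistent with the likelihoods printed above, and plain strings stand in for the `hot` and `cold` objects from `markov`.

```python
from fractions import Fraction as F
from itertools import product

# Strings stand in for the hot/cold objects imported from `markov`.
STATES = ("hot", "cold")

# ASSUMED initial and transition probabilities, for illustration only.
PI = {"hot": F(4, 5), "cold": F(1, 5)}
A = {
    "hot": {"hot": F(3, 5), "cold": F(2, 5)},
    "cold": {"hot": F(1, 2), "cold": F(1, 2)},
}
# Observation likelihoods, consistent with the products printed above.
B = {
    "hot": {1: F(1, 5), 2: F(2, 5), 3: F(2, 5)},
    "cold": {1: F(1, 2), 2: F(2, 5), 3: F(1, 10)},
}


def joint_probability(state_seq, observations):
    """P(O, Q): initial probability, then one transition and one emission per step."""
    prob = PI[state_seq[0]] * B[state_seq[0]][observations[0]]
    for prev, cur, obs in zip(state_seq, state_seq[1:], observations[1:]):
        prob *= A[prev][cur] * B[cur][obs]
    return prob


observations = [3, 1, 3]
scored = {
    seq: joint_probability(seq, observations)
    for seq in product(STATES, repeat=len(observations))
}
best = max(scored, key=scored.get)
print(best, scored[best])
```

With these assumed parameters the best sequence is still hot, cold, hot, now with joint probability 8/625.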

#### Cost

Again, this is computationally expensive: with $N$ hidden states and $T$ observations there are $N^T$ candidate sequences to score.
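To make the cost concrete, the nested comprehension used for `ALL_SEQUENCES` can be generalised with `itertools.product`. This is a small sketch; the helper name `all_state_sequences` is hypothetical and the strings again stand in for the `markov` state objects.

```python
from itertools import product


def all_state_sequences(states, length):
    """Lazily enumerate every hidden-state sequence of the given length."""
    return product(states, repeat=length)


states = ("hot", "cold")

# The number of candidates is len(states) ** length, so it grows exponentially.
for length in (3, 10, 20):
    print(length, len(states) ** length)

n_three_day = sum(1 for _ in all_state_sequences(states, 3))
```

Three days give 8 candidates, but twenty days already give 1,048,576.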

Instead we use the Viterbi algorithm.

# Viterbi Algorithm

> The idea is to process the observation sequence left to right, filling out the trellis.

> Each cell of the trellis, $v_t(j)$, represents the probability that the HMM is in state $j$ after seeing the first $t$ observations and passing through the most probable state sequence $q_1,...,q_{t-1}$, given the automaton $\lambda$.


We compute values for each cell $v_t(j)$ by recursively taking the most probable path that could lead us to this cell.

$$
    v_t(j) = \max_{q_1, \dots, q_{t-1}} P(q_1 \dots q_{t-1}, o_1, o_2 \dots o_t, q_t = j \mid \lambda)
$$

Again, this can be done recursively


$$
    v_t(j) = \max_{i=1}^N v_{t-1}(i) a_{ij} b_j (o_t)
$$

So the Viterbi probability at time $t$ is a function of:

* $v_{t-1}(i)$ - the Viterbi path probability from the previous time step
* $a_{ij}$ - the transition probability from previous state $i$ to current state $j$
* $b_j(o_t)$ - the likelihood of observing symbol $o_t$ given the current state $j$
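
The recursion needs a starting column and a final read-out, which this section does not spell out. In the standard formulation, assuming $\pi_j$ denotes the initial probability of state $j$ and $T$ the number of observations (neither symbol is defined in this section), the first column of the trellis is

$$
    v_1(j) = \pi_j b_j(o_1)
$$

each later cell stores a backpointer to the previous state that achieved the maximum,

$$
    bt_t(j) = \arg\max_{i=1}^N v_{t-1}(i) a_{ij} b_j(o_t)
$$

and the best path ends in the state with the largest value in the last column:

$$
    q_T^* = \arg\max_{i=1}^N v_T(i)
$$

Following the backpointers from $q_T^*$ back to $t = 1$ recovers the full hidden state sequence.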

> Note that the Viterbi algorithm is identical to the forward algorithm except that it takes the `max` over the previous path probabilities rather than the `sum`.
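
The recursion can be sketched in a few lines of Python. This is a minimal illustration rather than the notebook's own implementation. The `PI` and `A` values are assumptions (they are not given in this section), `B` is chosen to be consistent with the brute-force likelihoods printed earlier, and strings stand in for the `hot` and `cold` objects from `markov`.

```python
from fractions import Fraction as F

STATES = ("hot", "cold")  # stand-ins for the markov state objects

# ASSUMED initial and transition probabilities, for illustration only.
PI = {"hot": F(4, 5), "cold": F(1, 5)}
A = {
    "hot": {"hot": F(3, 5), "cold": F(2, 5)},
    "cold": {"hot": F(1, 2), "cold": F(1, 2)},
}
# Observation likelihoods, consistent with the brute-force products.
B = {
    "hot": {1: F(1, 5), 2: F(2, 5), 3: F(2, 5)},
    "cold": {1: F(1, 2), 2: F(2, 5), 3: F(1, 10)},
}


def viterbi(observations, states, pi, a, b):
    """Return (probability, path) of the most probable hidden state sequence."""
    # First trellis column: initial probability times first emission.
    trellis = [{j: pi[j] * b[j][observations[0]] for j in states}]
    backpointers = [{}]

    for obs in observations[1:]:
        prev = trellis[-1]
        column, pointers = {}, {}
        for j in states:
            # max over previous states, where the forward algorithm would sum.
            best_i = max(states, key=lambda i: prev[i] * a[i][j])
            column[j] = prev[best_i] * a[best_i][j] * b[j][obs]
            pointers[j] = best_i
        trellis.append(column)
        backpointers.append(pointers)

    # Pick the best final state, then follow the backpointers to the start.
    last = max(states, key=lambda j: trellis[-1][j])
    path = [last]
    for pointers in reversed(backpointers[1:]):
        path.append(pointers[path[-1]])
    path.reverse()
    return trellis[-1][last], path


probability, path = viterbi([3, 1, 3], STATES, PI, A, B)
print(path, probability)
```

With these assumed parameters the sketch returns hot, cold, hot with probability 8/625, while filling only $N \times T$ trellis cells instead of scoring all $N^T$ sequences.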
