# Modern Data Science 
**(Module 03: Pattern Classification)**

---
- Materials in this module include resources collected from various open-source online repositories.
- You are free to use, change and distribute this package.

Prepared by and for 
**Student Members** |
2006-2018 [TULIP Lab](http://www.tulip.org.au), Australia

---


# Session D - Hidden Markov Model: HMM

In this session, we will study the Hidden Markov Model (HMM). The HMM builds on the Markov model and addresses one of its limitations: the states of interest may not be directly observable. We will present the formalism and work through an example of an HMM.

In [None]:
# import related package
from IPython.core.display import HTML
import requests
import numpy as np
import pandas as pd
p=print

## Is Your Girlfriend Cheating on You?

Your girlfriend Mary has been exhibiting odd sequences of behavior. You suspect cheating. You don't want to falsely accuse her but something is off. Is there a way to reason about the situation so you can decide if you should confront her using the limited information you have?

You recall Hidden Markov Models are a tool for these situations. Mary's activities represent a sequence of **`hidden states`**, while her observed behavior represents a sequence of **`emissions`**. Because you don't talk much and you believe she may lie to you, all you can do is try to guess her true state via observations taken over time. 

First you define the set of **`M`** possible states. You know Mary has a strong work ethic, both professionally and athletically. So you conclude that when not with you, she could be at __`Work, Gym,`__ or __`Cheating`__. 

These states are hidden to you and cannot be observed directly.

You do not know the initial probabilities **`pi`** of the state she starts in. You decide that Work and Gym are equiprobable, and that there is a smaller probability that she is cheating.

>**pi = [0.4, 0.4, 0.2]**

In [None]:
# define states
# work --> gym --> cheating

# initial probability of being in state k, for M states
states = ['work', 'gym', 'cheating']
pi = np.array([0.4, 0.4, 0.2])

# pi 
(pd.Series(pi, index=states, name='states'))

Next you must guess the transition probabilities between the possible states. For example, if Mary is working, what is the probability that she continues working, transitions to the gym, or transitions to cheating? 

These are difficult questions to ask, no doubt, but you push forward.

You start with the state transitions for work. You reason that the probability is high that Mary would keep working given she is already working. It is also fairly likely that she would transition from work to the gym. You assign a low probability to a transition from work to cheating.

>**work = [0.6, 0.3, 0.1]**

The gym is less certain. You reason she could transition from the gym state to any state, including staying at the gym, with equal probability (1/3 each, rounded to 0.33).

>**gym = [0.33, 0.33, 0.33]**

Finally, if she were cheating, you have no idea what state she would transition to afterwards, so you again assign equal probabilities.

>**cheating = [0.33, 0.33, 0.33]**

In [None]:
# a or alpha = transition probability matrix of changing states given a state
# matrix is size (M x M) where M is number of states

a_df = pd.DataFrame(columns=states, index=states)
a_df.loc[states[0]] = [.6, .3, .1]
a_df.loc[states[1]] = [.33, .33, .33]
a_df.loc[states[2]] = [.33, .33, .33]
p(a_df)

# cast to float: a DataFrame created empty holds object dtype
a = a_df.values.astype(float)
p('\n',a, a.shape)
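
Each row of a transition matrix is a probability distribution over the next state, so it should sum to 1. The rounded 0.33 entries above sum to 0.99 instead. The following is a minimal sketch of how one might check and rescale the rows; the names `a_raw` and `a_norm` are illustrative and not part of the notebook, and the rest of this session keeps the rounded values.

```python
import numpy as np

# Transition rows as given in the text (state order: work, gym, cheating).
a_raw = np.array([
    [0.60, 0.30, 0.10],
    [0.33, 0.33, 0.33],
    [0.33, 0.33, 0.33],
])

# Sum over the columns of each row: one total per "from" state.
row_sums = a_raw.sum(axis=1)
print("row sums before:", row_sums)

# Rescale every row so that it sums to exactly 1.
a_norm = a_raw / row_sums[:, np.newaxis]
print("row sums after: ", a_norm.sum(axis=1))
```

Rescaling is a modelling choice rather than a requirement of the algorithm used later in this session, which only compares products of these numbers.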

The final requirement is to reason about the observation, aka __`emission`__, probabilities. These are the probabilities that you would observe a particular behavior given she is in a particular state.

Again you do not know Mary's true states because you don't talk, and you believe she may lie to you. Instead you focus on observations you believe are linked to her true state. 

These observations are __`makeup, athletic dress`__, and __`locked cell phone.`__ 

Given Mary is in the work state, it is highly probable that she would wear makeup to work, very low probability that she would dress athletically, and high probability she would lock her phone.

>__work_emission = [0.4, 0.1, 0.5]__

Mary is an avid gym goer. Given the gym state, she is unlikely to wear makeup, likely to dress athletically, and is very likely to lock her phone.

>__gym_emission = [0.1, 0.3, 0.6]__

If she is cheating, you figure you clearly don't know Mary like you thought you did, and you certainly do not know the probability that she will emit any of these behaviors. You therefore set them equiprobable. 

>__cheating_emission = [0.33, 0.33, 0.33]__ 

In [None]:
# Emission probabilities
# b or beta = observation probabilities given state
# matrix is size (M x O) where M is number of states 
# and O is number of different possible observations

emit = ['makeup', 'dress', 'phone']
b_df = pd.DataFrame(columns=emit, index=states)
b_df.loc[states[0]] = [0.4, 0.1, 0.5]
b_df.loc[states[1]] = [0.1, 0.3, 0.6]
b_df.loc[states[2]] = [0.33, 0.33, 0.33]
p(b_df)

# cast to float: a DataFrame created empty holds object dtype
b = b_df.values.astype(float)
p('\n', b, b.shape)
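
With `pi`, `a` and `b` defined, the model describes a generative process: pick a start state, emit a behavior, move to the next state, and repeat. The sketch below simulates that process to make the roles of the three parameters concrete. The helper `sample_hmm` and its `seed` argument are hypothetical names introduced here, and the rows are rescaled inside the helper only because the rounded 0.33 rows do not sum to 1.

```python
import numpy as np

states = ['work', 'gym', 'cheating']
emit = ['makeup', 'dress', 'phone']
pi = np.array([0.4, 0.4, 0.2])
a = np.array([[0.6, 0.3, 0.1], [0.33, 0.33, 0.33], [0.33, 0.33, 0.33]])
b = np.array([[0.4, 0.1, 0.5], [0.1, 0.3, 0.6], [0.33, 0.33, 0.33]])


def sample_hmm(pi, a, b, T, seed=0):
    """Draw a hidden state path and its emissions (illustrative helper)."""
    rng = np.random.default_rng(seed)
    # Assumption: rescale rows so each one is a valid distribution.
    a = a / a.sum(axis=1, keepdims=True)
    b = b / b.sum(axis=1, keepdims=True)
    hidden = np.zeros(T, dtype=int)
    seen = np.zeros(T, dtype=int)
    # The first state comes from pi; its emission comes from that state's row of b.
    hidden[0] = rng.choice(len(pi), p=pi)
    seen[0] = rng.choice(b.shape[1], p=b[hidden[0]])
    for t in range(1, T):
        # The next state depends only on the current state (Markov property).
        hidden[t] = rng.choice(len(pi), p=a[hidden[t - 1]])
        seen[t] = rng.choice(b.shape[1], p=b[hidden[t]])
    return hidden, seen


hidden, seen = sample_hmm(pi, a, b, T=14)
for h, o in zip(hidden, seen):
    print(states[h], '->', emit[o])
```

In the scenario of this session only the right-hand column would be visible; the left-hand column is what we try to recover.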

Now we simply record the observation sequence.

In [None]:
# observation sequence of Mary's behaviors
# observations are encoded numerically

obs_map = {'makeup':0, 'dress':1, 'phone':2}
obs = np.array([0,2,2,1,1,0,2,1,2,1,1,2,0,2])

inv_obs_map = dict((v,k) for k, v in obs_map.items())
obs_seq = [inv_obs_map[v] for v in list(obs)]

p( pd.DataFrame(np.column_stack([obs, obs_seq]), 
                columns=['Obs_code', 'Obs_seq']) )

### The HMM can answer the question, _given this sequence of observed behaviors and our model parameters, what is the most likely sequence of hidden states?_

You can calculate this using the __`Viterbi`__ algorithm.

At a high level, the Viterbi algorithm steps through the sequence one time step at a time, finding the __`maximum`__ probability of any path that reaches state __`i`__ at time __`t`__ and that ___also___ produces the correct observations for the sequence up to time __`t`__.
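
To see what one such step looks like, the sketch below works through the first two observations (makeup, then phone) by hand, using the parameter values from this session. The names `delta0`, `delta1`, `candidates` and `best_prev` are illustrative only.

```python
import numpy as np

pi = np.array([0.4, 0.4, 0.2])
a = np.array([[0.6, 0.3, 0.1], [0.33, 0.33, 0.33], [0.33, 0.33, 0.33]])
b = np.array([[0.4, 0.1, 0.5], [0.1, 0.3, 0.6], [0.33, 0.33, 0.33]])
obs = [0, 2]  # makeup, then phone

# t = 0: probability of starting in each state and emitting 'makeup'.
delta0 = pi * b[:, obs[0]]

# candidates[i, s] = best probability of being in state i at t = 0,
# multiplied by the probability of moving from i to s.
candidates = delta0[:, np.newaxis] * a

# t = 1: keep the best predecessor for each state s, then emit 'phone'.
delta1 = candidates.max(axis=0) * b[:, obs[1]]
best_prev = candidates.argmax(axis=0)

print('delta at t=0:', delta0)
print('delta at t=1:', delta1)
print('best previous state for each state:', best_prev)
```

With these numbers, the best way to be at work or at the gym at the second time step is to have come from work, while the best way to be cheating is to have been cheating already.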

The algorithm also records, for each state at each time step, which previous state gave that highest probability. At the end of the sequence, the algorithm iterates backwards, selecting the state that "won" each time step, and thus recovers the most likely sequence of hidden states that led to the sequence of observations. 
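
The description above can be written compactly. The notation here is an assumption chosen to match the code in this session: $\pi$, $a$ and $b$ are the initial, transition and emission probabilities, $o_t$ is the observation at time $t$, and $\delta$ and $\phi$ correspond to the arrays `delta` and `phi`.

$$
\delta_0(s) = \pi_s \, b_s(o_0)
$$

$$
\delta_t(s) = \max_i \left[ \delta_{t-1}(i) \, a_{is} \right] b_s(o_t),
\qquad
\phi_t(s) = \arg\max_i \left[ \delta_{t-1}(i) \, a_{is} \right],
\qquad t = 1, \dots, T-1
$$

The backward pass then starts from the best final state and follows the recorded predecessors:

$$
q_{T-1} = \arg\max_s \, \delta_{T-1}(s),
\qquad
q_t = \phi_{t+1}(q_{t+1}),
\qquad t = T-2, \dots, 0
$$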

In [None]:
# define the Viterbi algorithm for the most likely hidden-state path


def viterbi(pi, a, b, obs):
    
    nStates = np.shape(b)[0]
    T = np.shape(obs)[0]
    
    # init blank path (integer state indices, so they can be used for indexing)
    path = np.zeros(T, dtype=int)
    # delta --> highest probability of any path that reaches state i
    delta = np.zeros((nStates, T))
    # phi --> argmax by time step for each state (best previous state)
    phi = np.zeros((nStates, T), dtype=int)
    
    # init delta and phi 
    delta[:, 0] = pi * b[:, obs[0]]
    phi[:, 0] = 0

    p('\nStart Walk Forward\n')    
    # the forward algorithm extension
    for t in range(1, T):
        for s in range(nStates):
            delta[s, t] = np.max(delta[:, t-1] * a[:, s]) * b[s, obs[t]] 
            phi[s, t] = np.argmax(delta[:, t-1] * a[:, s])
            p('s={s} and t={t}: phi[{s}, {t}] = {phi}'.format(s=s, t=t, phi=phi[s, t]))
    
    # find optimal path
    p('-'*50)
    p('Start Backtrace\n')
    path[T-1] = np.argmax(delta[:, T-1])
    for t in range(T-2, -1, -1):
        # follow the stored best predecessor of the state chosen at t+1
        path[t] = phi[path[t+1], t+1]
        p('path[{}] = {}'.format(t, path[t]))
        
    return path, delta, phi

path, delta, phi = viterbi(pi, a, b, obs)
p('\nsingle best state path: \n', path)
p('delta:\n', delta)
p('phi:\n', phi)

In [None]:
(pd.DataFrame(delta, index=states).T)
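
The values in `delta` shrink at every time step because each one is a product of more probabilities. For 14 observations this is harmless, but for long sequences the products can underflow to zero. A common remedy is to work with log-probabilities, turning products into sums. The sketch below is one possible log-space variant of `viterbi()`; the name `viterbi_log` is hypothetical and the function is not part of the original notebook.

```python
import numpy as np


def viterbi_log(pi, a, b, obs):
    """Log-space Viterbi (illustrative variant of viterbi() above)."""
    # log(0) would give -inf, which is a valid "impossible" score here.
    with np.errstate(divide='ignore'):
        log_pi, log_a, log_b = np.log(pi), np.log(a), np.log(b)
    n_states, T = b.shape[0], len(obs)
    log_delta = np.zeros((n_states, T))
    phi = np.zeros((n_states, T), dtype=int)
    log_delta[:, 0] = log_pi + log_b[:, obs[0]]
    for t in range(1, T):
        # scores[i, s] = best log-probability of ending in i at t-1, then moving to s
        scores = log_delta[:, t - 1][:, np.newaxis] + log_a
        phi[:, t] = scores.argmax(axis=0)
        log_delta[:, t] = scores.max(axis=0) + log_b[:, obs[t]]
    # Backtrace from the best final state through the stored predecessors.
    path = np.zeros(T, dtype=int)
    path[-1] = log_delta[:, -1].argmax()
    for t in range(T - 2, -1, -1):
        path[t] = phi[path[t + 1], t + 1]
    return path, log_delta


pi = np.array([0.4, 0.4, 0.2])
a = np.array([[0.6, 0.3, 0.1], [0.33, 0.33, 0.33], [0.33, 0.33, 0.33]])
b = np.array([[0.4, 0.1, 0.5], [0.1, 0.3, 0.6], [0.33, 0.33, 0.33]])
obs = np.array([0, 2, 2, 1, 1, 0, 2, 1, 2, 1, 1, 2, 0, 2])

path, log_delta = viterbi_log(pi, a, b, obs)
print('log-space path:', path)
print('final log-probabilities:', log_delta[:, -1])
```

Because the logarithm is monotonic, maximising a sum of logs selects the same predecessors as maximising the product, apart from how exact ties are broken.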

In [None]:
state_map = {0:'work', 1:'gym', 2:'cheating'}
state_path = [state_map[v] for v in path]

(pd.DataFrame()
 .assign(Observation=obs_seq)
 .assign(Best_Path=state_path))
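
One way to build confidence in a Viterbi result is to compare it against exhaustive search on a short sequence. The sketch below enumerates all 3^5 = 243 state paths for the first five observations and checks that the best joint probability matches the maximum of the final Viterbi column. The helper `joint_prob` is a hypothetical name, and the Viterbi recursion is repeated in compact form so that the block runs on its own.

```python
import itertools
import numpy as np

pi = np.array([0.4, 0.4, 0.2])
a = np.array([[0.6, 0.3, 0.1], [0.33, 0.33, 0.33], [0.33, 0.33, 0.33]])
b = np.array([[0.4, 0.1, 0.5], [0.1, 0.3, 0.6], [0.33, 0.33, 0.33]])
obs = np.array([0, 2, 2, 1, 1])  # first five observations from this session


def joint_prob(path, obs):
    """P(path, obs): initial probability times transitions times emissions."""
    prob = pi[path[0]] * b[path[0], obs[0]]
    for t in range(1, len(obs)):
        prob *= a[path[t - 1], path[t]] * b[path[t], obs[t]]
    return prob


# Exhaustive search: score every possible hidden-state path.
all_paths = itertools.product(range(len(pi)), repeat=len(obs))
best_path = max(all_paths, key=lambda path: joint_prob(path, obs))
best_prob = joint_prob(best_path, obs)

# Viterbi recursion, keeping only the current column of delta.
delta_col = pi * b[:, obs[0]]
for t in range(1, len(obs)):
    delta_col = (delta_col[:, np.newaxis] * a).max(axis=0) * b[:, obs[t]]

print('brute-force best path:', best_path)
print('brute-force best probability:', best_prob)
print('Viterbi best probability:    ', delta_col.max())
```

Exhaustive search grows as M^T and is only feasible for short sequences, which is the reason for using the Viterbi recursion in the first place.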

### Conclusion

Using the Hidden Markov Model framework and some reasonable assumptions, we were able to make an educated guess about Mary's true sequence of states without direct observation of those states. Instead, we used the directly observable emissions as a link from the observable to the hidden. 