# Markov transitioner. Attempt 1.

This notebook uses the Markov models (MMs) implemented in the markovmodels.py library to find the most likely path through a network whose transition probabilities shift gradually from one model to another.

That is, if we have, say, 10 steps and two MMs, MM1 and MM2, then we start with a MM that's 100% MM1, the next step uses one that's 90% MM1 and 10% MM2, then 80% MM1 and 20% MM2, and so on until we reach 100% MM2.

The aim is to try to find smooth transitions between sequential data sources.

In [1]:
import markovmodels as mm

Now, let's define a random MM which uses the states in {'a', 'b', 'c', 'd', 'e'}:

In [2]:
from random import randint

In [6]:
states_ls=['a', 'b', 'c', 'd', 'e']

test1_mm=mm.MarkovModel([(states_ls[randint(0, len(states_ls)-1)],
                          states_ls[randint(0, len(states_ls)-1)])
                         for i in range(20)],
                        [states_ls[randint(0, len(states_ls)-1)]
                         for i in range(10)]
                        )
test1_mm.transitionMatrix_df

          a         b         c         d         e
a  0.333333  0.166667  0.000000  0.166667  0.333333
b  0.166667  0.166667  0.333333  0.333333  0.000000
c  0.666667  0.000000  0.000000  0.333333  0.000000
d  1.000000  0.000000  0.000000  0.000000  0.000000
e  0.000000  0.000000  0.000000  0.500000  0.500000
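
The constructor's internals aren't shown in this notebook, but every row above sums to 1, which is what you'd get from counting the (from, to) pairs and normalising each row. A minimal sketch of that idea, assuming that is roughly what `MarkovModel` does; `transition_matrix` is a hypothetical helper, not part of `markovmodels`:

```python
from collections import Counter, defaultdict

def transition_matrix(pairs, states):
    """Row-normalised transition probabilities from (from_state, to_state) pairs."""
    counts = defaultdict(Counter)
    for src, dst in pairs:
        counts[src][dst] += 1
    matrix = {}
    for src in states:
        total = sum(counts[src].values())
        # A state never seen as a source gets an all-zero row.
        matrix[src] = {dst: (counts[src][dst] / total if total else 0.0)
                       for dst in states}
    return matrix

# Tiny illustrative input (not the random data used above).
pairs = [('a', 'b'), ('a', 'b'), ('a', 'c'), ('b', 'a')]
example = transition_matrix(pairs, ['a', 'b', 'c'])
```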


And another random MM which uses the states in {'c', 'd', 'e', 'f', 'g'}:

In [7]:
states_ls=['c', 'd', 'e', 'f', 'g']

test2_mm=mm.MarkovModel([(states_ls[randint(0, len(states_ls)-1)],
                          states_ls[randint(0, len(states_ls)-1)])
                         for i in range(20)],
                        [states_ls[randint(0, len(states_ls)-1)]
                         for i in range(10)]
                        )
test2_mm.transitionMatrix_df

          c         d         e         f         g
c  0.500000  0.166667  0.000000  0.166667  0.166667
d  0.000000  0.333333  0.333333  0.000000  0.333333
e  0.333333  0.333333  0.000000  0.166667  0.166667
f  0.000000  0.000000  0.500000  0.500000  0.000000
g  0.000000  0.333333  0.000000  0.666667  0.000000


So we can go from, say, state 'a' to state 'd' in MM1:

In [8]:
# Let's try it in 5 steps
test1_mm.apply(['a'], 5).most_likely_path('d')

['a', 'e', 'd', 'a', 'e', 'd']

and state 'd' to state 'g' in MM2:

In [9]:
# Let's try it in 5 steps
test2_mm.apply(['d'], 5).most_likely_path('g')

['d', 'g', 'f', 'e', 'd', 'g']

but 'a' is not in MM2, and 'g' is not in MM1, so to get from 'a' to 'g' we need to combine them. We can do that gradually over a number of steps. Let's assume we're using 10.
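
The `mm.merge` call used below takes the two models and a weight for the first one. Its internals aren't shown here, but the outputs that follow are consistent with a plain linear blend over the union of both state sets, with missing entries treated as zero. A sketch under that assumption (the `blend` helper and the dict-of-dicts layout are illustrative, not the library's API):

```python
def blend(matrix_a, matrix_b, weight_a):
    """Linear blend of two transition tables over the union of their states."""
    states = set()
    for matrix in (matrix_a, matrix_b):
        for src, row in matrix.items():
            states.add(src)
            states.update(row)
    merged = {}
    for src in sorted(states):
        row_a = matrix_a.get(src, {})
        row_b = matrix_b.get(src, {})
        merged[src] = {dst: weight_a * row_a.get(dst, 0.0)
                            + (1 - weight_a) * row_b.get(dst, 0.0)
                       for dst in sorted(states)}
    return merged

# Non-zero entries of the rows for state 'e' in the two matrices printed above.
mm1_rows = {'e': {'d': 0.5, 'e': 0.5}}
mm2_rows = {'e': {'c': 0.333333, 'd': 0.333333, 'f': 0.166667, 'g': 0.166667}}
merged = blend(mm1_rows, mm2_rows, 0.9)
```

Under this assumption, a row for a state that exists in only one model sums to that model's weight rather than to 1.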

Start with 100% MM1:

In [10]:
merge_mm=mm.merge(test1_mm, test2_mm, 1)
s1=merge_mm.apply(['a'])
s1.get_current_state_distribution()

a    0.333333
b    0.166667
c    0.000000
d    0.166667
e    0.333333
f    0.000000
g    0.000000
dtype: float64

Next do 90% of MM1:

In [11]:
merge_mm=mm.merge(test1_mm, test2_mm, 0.9)
s2=merge_mm.apply(s1)
s2.get_current_state_distribution()

a    0.150000
b    0.050000
c    0.050000
d    0.161111
e    0.150000
f    0.005556
g    0.005556
dtype: float64
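
Note that the values above don't sum to 1. They match a max-product (Viterbi-style) update rather than an ordinary matrix-vector product: each state keeps only the score of its single best predecessor. The sketch below reproduces the numbers from the two printed matrices, assuming a 0.9/0.1 linear blend; `prob` and `viterbi_step` are hypothetical helpers, and rounded values such as 0.333333 are written as exact fractions:

```python
STATES = ['a', 'b', 'c', 'd', 'e', 'f', 'g']

# Non-zero entries of the two transition matrices printed earlier.
MM1 = {
    'a': {'a': 1/3, 'b': 1/6, 'd': 1/6, 'e': 1/3},
    'b': {'a': 1/6, 'b': 1/6, 'c': 1/3, 'd': 1/3},
    'c': {'a': 2/3, 'd': 1/3},
    'd': {'a': 1.0},
    'e': {'d': 0.5, 'e': 0.5},
}
MM2 = {
    'c': {'c': 0.5, 'd': 1/6, 'f': 1/6, 'g': 1/6},
    'd': {'d': 1/3, 'e': 1/3, 'g': 1/3},
    'e': {'c': 1/3, 'd': 1/3, 'f': 1/6, 'g': 1/6},
    'f': {'e': 0.5, 'f': 0.5},
    'g': {'d': 1/3, 'f': 2/3},
}

def prob(weight, src, dst):
    """Blended probability of src -> dst, with `weight` on MM1."""
    return (weight * MM1.get(src, {}).get(dst, 0.0)
            + (1 - weight) * MM2.get(src, {}).get(dst, 0.0))

def viterbi_step(scores, weight):
    """For each destination, keep the score of its best predecessor."""
    return {dst: max(scores[src] * prob(weight, src, dst) for src in STATES)
            for dst in STATES}

start = {s: (1.0 if s == 'a' else 0.0) for s in STATES}
s1 = viterbi_step(start, 1.0)   # 100% MM1
s2 = viterbi_step(s1, 0.9)      # 90% MM1, 10% MM2
```

For example, `s2['d']` comes from the predecessor 'e': 1/3 × (0.9 × 0.5 + 0.1 × 1/3) ≈ 0.161111.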

Next do 80% of MM1:

In [12]:
merge_mm=mm.merge(test1_mm, test2_mm, 0.8)
s3=merge_mm.apply(s2)
s3.get_current_state_distribution()

a    0.128889
b    0.020000
c    0.013333
d    0.070000
e    0.060000
f    0.005000
g    0.010741
dtype: float64

Next do 70% of MM1:

In [13]:
merge_mm=mm.merge(test1_mm, test2_mm, 0.7)
s4=merge_mm.apply(s3)
s4.get_current_state_distribution()

a    0.049000
b    0.015037
c    0.006000
d    0.027000
e    0.030074
f    0.003000
g    0.007000
dtype: float64

and so on...

In [14]:
merge_mm=mm.merge(test1_mm, test2_mm, 0.6)
s5=merge_mm.apply(s4)
s5.get_current_state_distribution()

a    0.016200
b    0.004900
c    0.004010
d    0.013032
e    0.009800
f    0.002005
g    0.003600
dtype: float64

In [15]:
merge_mm=mm.merge(test1_mm, test2_mm, 0.5)
s6=merge_mm.apply(s5)
s6.get_current_state_distribution()

a    0.006516
b    0.001350
c    0.001633
d    0.004083
e    0.002700
f    0.001200
g    0.002172
dtype: float64

In [16]:
merge_mm=mm.merge(test1_mm, test2_mm, 0.4)
s7=merge_mm.apply(s6)
s7.get_current_state_distribution()

a    0.001633
b    0.000434
c    0.000540
d    0.001080
e    0.000869
f    0.000869
g    0.000817
dtype: float64

In [17]:
merge_mm=mm.merge(test1_mm, test2_mm, 0.3)
s8=merge_mm.apply(s7)
s8.get_current_state_distribution()

a    0.000324
b    0.000082
c    0.000203
d    0.000333
e    0.000304
f    0.000381
g    0.000252
dtype: float64

In [18]:
merge_mm=mm.merge(test1_mm, test2_mm, 0.2)
s9=merge_mm.apply(s8)
s9.get_current_state_distribution()

a    0.000067
b    0.000011
c    0.000081
d    0.000111
e    0.000152
f    0.000152
g    0.000089
dtype: float64

In [19]:
merge_mm=mm.merge(test1_mm, test2_mm, 0.1)
s10=merge_mm.apply(s9)
s10.get_current_state_distribution()

a    0.000011
b    0.000001
c    0.000046
d    0.000053
e    0.000069
f    0.000069
g    0.000033
dtype: float64

In [20]:
merge_mm=mm.merge(test1_mm, test2_mm, 0)
s11=merge_mm.apply(s10)
s11.get_current_state_distribution()

a    0.000000
b    0.000000
c    0.000023
d    0.000023
e    0.000034
f    0.000034
g    0.000018
dtype: float64

In [21]:
s11.get_current_state_distribution()

a    0.000000
b    0.000000
c    0.000023
d    0.000023
e    0.000034
f    0.000034
g    0.000018
dtype: float64

Now, if we get the path to 'g', then we'd hope to see more of the MM1 states in the first half, and more of the MM2 states in the second half:

In [22]:
s11.most_likely_path('g')

['a', 'e', 'e', 'd', 'a', 'e', 'd', 'g', 'f', 'e', 'd', 'g']

Wow... that might have actually worked.
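
One way to recover a path like this is a Viterbi pass with back-pointers, using a different transition table at each step. This is a generic sketch of that technique, not the library's code; the function here is a standalone stand-in and the dict-of-dicts layout is assumed for illustration:

```python
def most_likely_path(step_matrices, start, end):
    """Best path from start to end, applying one transition table per step."""
    states = sorted({s for m in step_matrices
                     for src, row in m.items() for s in (src, *row)})
    scores = {s: (1.0 if s == start else 0.0) for s in states}
    backpointers = []
    for matrix in step_matrices:
        new_scores, pointers = {}, {}
        for dst in states:
            # Pick the predecessor that gives dst its highest score.
            best_src = max(
                states,
                key=lambda src: scores[src] * matrix.get(src, {}).get(dst, 0.0))
            new_scores[dst] = scores[best_src] * matrix.get(best_src, {}).get(dst, 0.0)
            pointers[dst] = best_src
        scores = new_scores
        backpointers.append(pointers)
    if scores.get(end, 0.0) == 0.0:
        return None  # end state is unreachable in this many steps
    # Walk the back-pointers from the end state to the start.
    path = [end]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    return path[::-1]

# Toy example: two steps with one table, then one step with another.
first = {'a': {'a': 0.6, 'b': 0.4}, 'b': {'a': 1.0}}
second = {'b': {'c': 1.0}, 'c': {'c': 1.0}}
path = most_likely_path([first, first, second], 'a', 'c')
```

The returned path has one more entry than the number of steps, which is why the 11 merged models above give a 12-state path.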

So... say we want to make the transition in 10 steps. Actually, that means 11 weights, one for each of the cases [1.0, 0.9, 0.8, ..., 0.1, 0].

So for *n* steps, we want *n*+1 weights:

In [27]:
numSteps_i=10

transitionWeights_ls=[1-(x/numSteps_i) for x in range(numSteps_i+1)]

transitionWeights_ls

[1.0,
 0.9,
 0.8,
 0.7,
 0.6,
 0.5,
 0.4,
 0.30000000000000004,
 0.19999999999999996,
 0.09999999999999998,
 0.0]
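
The stray digits in values like 0.30000000000000004 come from doing the subtraction in floating point. They're harmless here, but if cleaner values matter (for display, or for comparing against 0.3), one option is to subtract on integers first and divide once. The variable names below are illustrative:

```python
num_steps = 10

# Integer subtraction first, then a single division per weight.
weights = [(num_steps - x) / num_steps for x in range(num_steps + 1)]
```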

And to calculate the path, we want:

In [28]:
# Starting with a MM in test1_mm
# A second MM in test2_mm
# A number of steps in numSteps_i

numSteps_i=10

transitionWeights_ls=[1-(x/numSteps_i) for x in range(numSteps_i+1)]

# Start with an initial state. Here, use 'a':

merged_mm=mm.merge(test1_mm, test2_mm, 1)
state_ms=merged_mm.create_markov_state(['a'])

# Now apply the merged model for every weight, from 1.0 down to 0.0:

for weight_f in transitionWeights_ls:
    merged_mm=mm.merge(test1_mm, test2_mm, weight_f)
    state_ms=merged_mm.apply(state_ms)

# And find most likely path to 'g':
state_ms.most_likely_path('g')

['a', 'e', 'e', 'd', 'a', 'e', 'd', 'g', 'f', 'e', 'd', 'g']
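
The cell above could be wrapped up as a function for reuse. This sketch only calls what the notebook already uses (`merge`, `create_markov_state`, `apply`, `most_likely_path`); the function name, its parameters, and passing the module in as `mm_module` are just for illustration:

```python
def transition_path(mm_module, model_from, model_to,
                    start_state, end_state, num_steps=10):
    """Most likely path while shifting from model_from to model_to."""
    # num_steps + 1 weights for model_from, from 1.0 down to 0.0.
    weights = [(num_steps - x) / num_steps for x in range(num_steps + 1)]

    # Initial state, created from a merged model as in the cell above.
    state = mm_module.merge(model_from, model_to, 1).create_markov_state([start_state])

    # Apply one merged model per weight.
    for weight in weights:
        state = mm_module.merge(model_from, model_to, weight).apply(state)

    return state.most_likely_path(end_state)
```

With the objects defined in this notebook, the call would look like `transition_path(mm, test1_mm, test2_mm, 'a', 'g')`.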

In [None]:
test1_mm.transitionMatrix_ar

OK, that seems to be working...

Now to try it on some real data.