# Moving on to temporal modelling

**The plan is to build two models: a first-order Markov model for the temporal relationships, and then the DBN that will model the co-occurrences.**

To build the Markov model, we need the following steps:
1. Define the variables
2. Discretize the variables (i.e. turn them into 0/1 variables)
3. Estimate the transition probabilities
4. Construct a transition matrix for each variable
5. Combine the transition matrices
6. Use the model for prediction

We already have steps 1 and 2, so we can move on to step 3.

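How to estimate **transition probabilities**: 
For each pair of states (i, j), count the number of transitions from i to j in the dataset and divide by the number of times the sequence was in state i, so that the result is an estimate of P(next state = j | current state = i).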
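
As a minimal sketch of this counting estimate on a toy binary sequence (the function name `estimate_transition_probs` and the sequence are illustrative and not part of the notebook), each transition count is divided by the number of transitions that leave the source state:

```python
from collections import Counter

def estimate_transition_probs(seq):
    """Return {(prev, curr): P(curr | prev)} estimated from one state sequence."""
    pair_counts = Counter(zip(seq[:-1], seq[1:]))   # transitions prev -> curr
    source_counts = Counter(seq[:-1])               # transitions leaving each state
    return {
        (prev, curr): n / source_counts[prev]
        for (prev, curr), n in pair_counts.items()
    }

toy = [0, 0, 1, 1, 0, 0, 0, 1]   # made-up sequence
probs = estimate_transition_probs(toy)
for pair, p in sorted(probs.items()):
    print(pair, p)
```

Because the denominator only counts samples that have a successor, the probabilities leaving each state sum to 1.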



In [59]:
import os
import pandas as pd
import pprint 
import pickle
import numpy as np
pp = pprint.PrettyPrinter(indent=4)
from sklearn.model_selection import train_test_split
import math

In [60]:
# load data

with open("facetouch_dataframes.pickle", 'rb') as f:
    dataframes = pickle.load(f)

In [61]:
print(dataframes['/home/roni/coding/mastersProject/src/csvOut/p_100/recording_3'].columns)

Index(['Unnamed: 0', 'participant', 'frame', 'pose', 'hand_left', 'hand_right',
       'leftHandTouching', 'rightHandTouching', ' face_id', ' timestamp',
       ' confidence', ' success', ' AU01_r', ' AU02_r', ' AU04_r', ' AU05_r',
       ' AU06_r', ' AU07_r', ' AU09_r', ' AU10_r', ' AU12_r', ' AU14_r',
       ' AU15_r', ' AU17_r', ' AU20_r', ' AU23_r', ' AU25_r', ' AU26_r',
       ' AU45_r', ' AU01_c', ' AU02_c', ' AU04_c', ' AU05_c', ' AU06_c',
       ' AU07_c', ' AU09_c', ' AU10_c', ' AU12_c', ' AU14_c', ' AU15_c',
       ' AU17_c', ' AU20_c', ' AU23_c', ' AU25_c', ' AU26_c', ' AU28_c',
       ' AU45_c', 'GAD Score', 'PHQ Score', 'combinedRightHand',
       'combinedLeftHand'],
      dtype='object')


In [62]:
print(len(dataframes))
relevant_nodes = [' AU17_c', ' AU07_c',' AU14_c' , ' AU12_c',' AU20_c' , 'leftHandTouching', 'rightHandTouching' ]

50


In [63]:
# interlude: resample the dataframes to a 1-second time step instead of frame-wise

ft_dataframes = {}

for d in dataframes:
    df = dataframes[d]
    # convert 'timestamp' column to datetime type
    df[' timestamp'] = pd.to_datetime(df[' timestamp'], unit='s')

    # set 'timestamp' as the DataFrame index
    df = df.set_index(' timestamp')

    # group the DataFrame by 1-second time windows and calculate the mode for each relevant variable
    ft_dataframes[d] = df[relevant_nodes].groupby(pd.Grouper(freq='1S')).agg(lambda x: x.mode()[0])
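
To illustrate what the per-second mode does, here is a small sketch on made-up frame-level data (the column name `AU`, the values and the helper `window_mode` are assumptions for the example). It uses the lowercase `'1s'` alias, which recent pandas versions prefer, and guards against empty windows, for which `mode()` returns an empty Series:

```python
import pandas as pd

# toy frame-level data at 4 frames per second (illustrative values only)
toy = pd.DataFrame({
    'timestamp': [0.0, 0.25, 0.5, 0.75, 1.0, 1.25, 1.5, 1.75],
    'AU': [0, 1, 1, 1, 0, 0, 1, 0],
})
toy['timestamp'] = pd.to_datetime(toy['timestamp'], unit='s')
toy = toy.set_index('timestamp')

def window_mode(x):
    # most frequent value in the window; NaN if the window is empty
    m = x.mode()
    return m.iloc[0] if len(m) else float('nan')

# one value per 1-second window: the majority value of that window
per_second = toy['AU'].groupby(pd.Grouper(freq='1s')).agg(window_mode)
print(per_second)
```

The first window contains three 1s out of four frames and the second contains three 0s, so the per-second series takes the majority value of each window.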

In [64]:
ft_dataframes['/home/roni/coding/mastersProject/src/csvOut/p_100/recording_3'].info()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 491 entries, 1970-01-01 00:00:00 to 1970-01-01 00:08:10
Freq: S
Data columns (total 7 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0    AU17_c            491 non-null    float64
 1    AU07_c            491 non-null    float64
 2    AU14_c            491 non-null    float64
 3    AU12_c            491 non-null    float64
 4    AU20_c            491 non-null    float64
 5   leftHandTouching   491 non-null    bool   
 6   rightHandTouching  491 non-null    bool   
dtypes: bool(2), float64(5)
memory usage: 24.0 KB


In [65]:
ft_dataframes['/home/roni/coding/mastersProject/src/csvOut/p_100/recording_3']['rightHandTouching'].sum()

0

In [66]:
dataframes['/home/roni/coding/mastersProject/src/csvOut/p_100/recording_3']['leftHandTouching'].sum()

3

In [67]:
# keep only the videos that contain at least one face touch

nonzeroes = {}

for df in ft_dataframes:
    ft_dataframes[df]['facetouch'] = (ft_dataframes[df]['leftHandTouching'] | ft_dataframes[df]['rightHandTouching']).astype(int)
    
    if ( ft_dataframes[df]['facetouch'].sum() > 0  ):
        nonzeroes[df] = ft_dataframes[df][[' AU17_c', ' AU07_c',' AU14_c' , ' AU12_c',' AU20_c','facetouch' ]]
print(len(ft_dataframes),len(nonzeroes))

50 13


In [68]:
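# add the +- tau

def addTau(df):
    """Pad each face-touch episode by inserting a 1 at every 0/1 boundary (in place)."""
    values = df['facetouch'].values

    # identify the indices where the values change from 0 to 1 or vice versa
    indices = np.where(np.diff(values) != 0)[0] + 1
    if len(indices) == 0:
        # no boundaries: nothing to pad (and [:-0] would return an empty array)
        return

    # insert a 1 at every boundary; this lengthens the sequence by len(indices)
    new_col = np.insert(values, indices, 1)

    # trim the end so the column keeps its original length
    df['facetouch'] = new_col[:-len(indices)]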
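
Note that inserting values shifts every later sample by one position per boundary. If the intent is instead to widen each touch by ±tau time steps while keeping everything else in place, one possible alternative (not the method used in this notebook; `pad_touches` and `tau` are hypothetical names) is a centred rolling maximum:

```python
import pandas as pd

def pad_touches(series, tau=1):
    """Mark a sample as 1 if any sample within +-tau steps is 1."""
    window = 2 * tau + 1
    padded = series.rolling(window=window, center=True, min_periods=1).max()
    return padded.astype(int)

touch = pd.Series([0, 0, 0, 1, 1, 0, 0, 0, 0])   # made-up touch sequence
padded = pad_touches(touch, tau=1)
print(padded.tolist())
```

With `tau=1` the two-sample touch grows by one sample on each side and the rest of the sequence is unchanged.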


In [69]:
for df in nonzeroes:
    addTau(nonzeroes[df])

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df['facetouch'] = new_col[:-len(indices)]
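
The warning above appears because the entries of `nonzeroes` were created by selecting columns from `ft_dataframes`, so pandas cannot tell whether the assignment should affect the original frame. A common way to avoid it, sketched here on a toy frame with illustrative names, is to take an explicit `.copy()` when selecting the columns:

```python
import pandas as pd

full = pd.DataFrame({'AU': [0, 1, 1], 'facetouch': [0, 0, 1], 'other': [5, 6, 7]})

# an explicit copy makes the selection an independent DataFrame,
# so assigning to its columns is unambiguous
subset = full[['AU', 'facetouch']].copy()
subset['facetouch'] = [1, 1, 1]

# the original frame is left untouched
print(full['facetouch'].tolist(), subset['facetouch'].tolist())
```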


In [70]:
for df in nonzeroes: #'GAD Score', 'PHQ Score'
    nonzeroes[df]['PHQ Score'] = dataframes[df]['PHQ Score'][0]
    nonzeroes[df]['GAD Score'] = dataframes[df]['GAD Score'][0]
    

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  nonzeroes[df]['PHQ Score'] = dataframes[df]['PHQ Score'][0]

In [71]:
nonzeroes['/home/roni/coding/mastersProject/src/csvOut/p_109'].head()

                     AU17_c  AU07_c  AU14_c  AU12_c  AU20_c  facetouch  PHQ Score  GAD Score
timestamp
1970-01-01 00:00:00     0.0     0.0     0.0     0.0     0.0          0       17.0       15.0
1970-01-01 00:00:01     0.0     0.0     0.0     0.0     0.0          0       17.0       15.0
1970-01-01 00:00:02     0.0     0.0     0.0     0.0     0.0          0       17.0       15.0
1970-01-01 00:00:03     0.0     0.0     0.0     0.0     0.0          0       17.0       15.0
1970-01-01 00:00:04     0.0     0.0     0.0     0.0     0.0          0       17.0       15.0


In [72]:
#save the updated dataframes
with open("onesec_dataframes_unsorted.pickle", 'wb') as f:
    pickle.dump(ft_dataframes, f)
    
with open("onesec_dataframes_sorted.pickle", 'wb') as f:
    pickle.dump(nonzeroes, f)    

Based on the correlation results, we want to model **AU17, AU7 and AU14 for the left hand** and **AU14, AU12 and AU20 for the right hand**, as those have the largest correlations.

Out of interest, these correspond to the chin raiser, lid tightener and dimpler for the left hand, and the dimpler, lip corner puller and lip stretcher for the right hand.

We need to compute transition probabilities such as P(X_t = 0 | X_{t-1} = 1) for each of these variables. We also need to split the videos into train/test/val sets (the cell below only creates the training and test sets).

In [73]:
video_keys = list(nonzeroes.keys())

# Split the keys into training and test sets (no separate validation set is created here)
train_keys, test_keys = train_test_split(video_keys, test_size=0.1, random_state=12)

# Create the training and test sets as dictionaries
train_data = {key: nonzeroes[key] for key in train_keys}
test_data = {key: nonzeroes[key] for key in test_keys}

print(len(train_data),len(test_data))

11 2
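
The cell above only produces a train/test split. If a validation set is also wanted, one possible sketch (pure Python, with placeholder key names and assumed split fractions) is to shuffle the video keys once and slice them, so that all samples of a video stay in the same set:

```python
import random

def split_keys(keys, val_frac=0.1, test_frac=0.1, seed=12):
    """Shuffle video keys and split them into train/val/test lists."""
    keys = sorted(keys)                  # fixed order before shuffling, for reproducibility
    random.Random(seed).shuffle(keys)
    n_test = max(1, round(len(keys) * test_frac))
    n_val = max(1, round(len(keys) * val_frac))
    test = keys[:n_test]
    val = keys[n_test:n_test + n_val]
    train = keys[n_test + n_val:]
    return train, val, test

toy_keys = [f"video_{i}" for i in range(13)]   # placeholder names, not the real paths
toy_train, toy_val, toy_test = split_keys(toy_keys)
print(len(toy_train), len(toy_val), len(toy_test))
```

With only 13 videos, each held-out set contains a single video, so any validation score would be very noisy.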


In [74]:
print(test_data)

# test files (stale note from an earlier run; see the printed keys below for the current split):
# '/home/roni/coding/mastersProject/src/csvOut/p_70/recording_1'
# '/home/roni/coding/mastersProject/src/csvOut/p_73/recording_4'

{'/home/roni/coding/mastersProject/src/csvOut/p_70/recording_0':                       AU17_c   AU07_c   AU14_c   AU12_c   AU20_c  facetouch  \
 timestamp                                                                    
1970-01-01 00:00:00      1.0      0.0      0.0      1.0      0.0          0   
1970-01-01 00:00:01      1.0      0.0      1.0      1.0      0.0          0   
1970-01-01 00:00:02      1.0      0.0      1.0      1.0      0.0          0   
1970-01-01 00:00:03      1.0      0.0      1.0      1.0      0.0          0   
1970-01-01 00:00:04      1.0      0.0      0.0      0.0      1.0          0   
...                      ...      ...      ...      ...      ...        ...   
1970-01-01 00:05:58      0.0      0.0      0.0      0.0      1.0          0   
1970-01-01 00:05:59      0.0      0.0      0.0      0.0      1.0          0   
1970-01-01 00:06:00      0.0      0.0      0.0      0.0      1.0          0   
1970-01-01 00:06:01      0.0      0.0      0.0      0.0      1.0  

In [75]:
# at this point I combine all the training videos into one dataframe to feed into the model

big_df = pd.concat(train_data.values())

big_df.info()

with open("train_data_combined.pickle", 'wb') as f:
    pickle.dump(big_df, f)


<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 5207 entries, 1970-01-01 00:00:00 to 1970-01-01 00:07:46
Data columns (total 8 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0    AU17_c    5207 non-null   float64
 1    AU07_c    5207 non-null   float64
 2    AU14_c    5207 non-null   float64
 3    AU12_c    5207 non-null   float64
 4    AU20_c    5207 non-null   float64
 5   facetouch  5207 non-null   int64  
 6   PHQ Score  5207 non-null   float64
 7   GAD Score  5207 non-null   float64
dtypes: float64(7), int64(1)
memory usage: 366.1 KB


In [76]:
transition_cols = [' AU17_c', ' AU07_c',' AU14_c' , ' AU12_c',' AU20_c' , 'facetouch' ]
# per video and per variable, the counts are stored as
# [ 1 to 0, total 1, 0 to 1, total 0, 0 to 0, 1 to 1 ]
transition_probs = { df:{ key:[0,0,0,0,0,0] for key in transition_cols } for df in train_data}

In [77]:
for df in train_data:
    for i in transition_cols:
        col = train_data[df][i].astype(int)
        # diff() is current minus previous: +1 marks a 0 to 1 step, -1 marks a 1 to 0 step
        diffs = col.diff()
        positive_transitions = (diffs == 1).sum() # this is 0 to 1
        negative_transitions = (diffs == -1).sum() # this is 1 to 0
        zero_to_zero = ((col == 0) & (diffs == 0)).sum()
        one_to_one = ((col == 1) & (diffs == 0)).sum()
        # note: these totals count every row in each state, including the last row,
        # which has no outgoing transition
        zeroes = (col == 0).sum()
        ones = (col == 1).sum()

        transition_probs[df][i][0] += negative_transitions
        transition_probs[df][i][1] += ones
        transition_probs[df][i][2] += positive_transitions
        transition_probs[df][i][3] += zeroes
        transition_probs[df][i][4] += zero_to_zero
        transition_probs[df][i][5] += one_to_one

In [78]:
print(transition_probs)

{'/home/roni/coding/mastersProject/src/csvOut/p_73/recording_4': {' AU17_c': [2, 2, 2, 59, 56, 0], ' AU07_c': [0, 0, 0, 61, 60, 0], ' AU14_c': [0, 0, 0, 61, 60, 0], ' AU12_c': [0, 0, 0, 61, 60, 0], ' AU20_c': [2, 2, 2, 59, 56, 0], 'facetouch': [1, 4, 1, 57, 55, 3]}, '/home/roni/coding/mastersProject/src/csvOut/p_105/recording_2': {' AU17_c': [11, 73, 11, 69, 57, 62], ' AU07_c': [9, 67, 9, 75, 65, 58], ' AU14_c': [9, 78, 10, 64, 54, 68], ' AU12_c': [14, 48, 14, 94, 79, 34], ' AU20_c': [14, 34, 14, 108, 93, 20], 'facetouch': [3, 12, 3, 130, 126, 9]}, '/home/roni/coding/mastersProject/src/csvOut/p_112': {' AU17_c': [139, 304, 140, 525, 385, 164], ' AU07_c': [37, 59, 37, 770, 732, 22], ' AU14_c': [119, 514, 120, 315, 195, 394], ' AU12_c': [109, 266, 109, 563, 453, 157], ' AU20_c': [40, 56, 40, 773, 732, 16], 'facetouch': [2, 8, 2, 821, 818, 6]}, '/home/roni/coding/mastersProject/src/csvOut/p_52/recording_0': {' AU17_c': [32, 324, 33, 503, 470, 291], ' AU07_c': [84, 184, 84, 643, 558, 100],
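
To make the count layout concrete, the first `facetouch` entry printed above, `[1, 4, 1, 57, 55, 3]`, can be turned into probabilities by hand. This is only a worked illustration of the arithmetic used in the next cell:

```python
# counts in the order [1->0, total 1s, 0->1, total 0s, 0->0, 1->1]
counts = [1, 4, 1, 57, 55, 3]
one_to_zero, ones, zero_to_one, zeroes, zero_to_zero, one_to_one = counts

p_10 = one_to_zero / ones       # 1/4
p_11 = one_to_one / ones        # 3/4
p_01 = zero_to_one / zeroes     # 1/57
p_00 = zero_to_zero / zeroes    # 55/57

print(p_10, p_11, round(p_01, 8), round(p_00, 8))
```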

In [79]:
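probabilities =  { df:{ key:[] for key in transition_cols } for df in train_data}
for df in transition_probs:
    for i in transition_probs[df]:
        # t = [ 1 to 0, total 1, 0 to 1, total 0, 0 to 0, 1 to 1 ]
        t = transition_probs[df][i]
        if t[1] == 0:
            # the variable is never 1 in this video
            a = 0
            d = 0
        else:
            a = t[0]/t[1]  # P(1 to 0)
            d = t[5]/t[1]  # P(1 to 1)

        if t[3] == 0:
            # the variable is never 0 in this video
            b = 0
            c = 0
        else:
            b = t[2]/t[3]  # P(0 to 1)
            c = t[4]/t[3]  # P(0 to 0)

        # stored as [ P(1 to 0), P(0 to 1), P(0 to 0), P(1 to 1) ]
        probabilities[df][i] = [ a,  b, c, d ]

pp.pprint(probabilities)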

{   '/home/roni/coding/mastersProject/src/csvOut/p_105/recording_0': {   ' AU07_c': [   0.11858974358974358,
                                                                                        0.2753623188405797,
                                                                                        0.7246376811594203,
                                                                                        0.8782051282051282],
                                                                         ' AU12_c': [   0.3881578947368421,
                                                                                        0.19463087248322147,
                                                                                        0.802013422818792,
                                                                                        0.6118421052631579],
                                                                         ' AU14_c': [   0.17557251908396945,
                        

In [80]:
#for df in probabilities:
#    for i in probabilities[df]:
#        for j in probabilities[df][i]:
#            if( math.isnan(j)==True ):
#                j=0

In [81]:
# now we make the transition matrices

# Define the states
states = [0, 1]
matrices = {df:{ key: np.zeros((len(states), len(states))) for key in transition_cols } for df in train_data}

for df in matrices:
    for m in matrices[df]:
        transition_matrix = matrices[df][m]
        # Set the transition probabilities (row = previous state, column = current state)
        transition_matrix[0, 0] = probabilities[df][m][2] # 0 to 0
        transition_matrix[0, 1] = probabilities[df][m][1] # 0 to 1
        transition_matrix[1, 0] = probabilities[df][m][0] # 1 to 0
        transition_matrix[1, 1] = probabilities[df][m][3] # 1 to 1

        # Print the transition matrix
        print(transition_matrix)


[[0.94915254 0.03389831]
 [1.         0.        ]]
[[0.98360656 0.        ]
 [0.         0.        ]]
[[0.98360656 0.        ]
 [0.         0.        ]]
[[0.98360656 0.        ]
 [0.         0.        ]]
[[0.94915254 0.03389831]
 [1.         0.        ]]
[[0.96491228 0.01754386]
 [0.25       0.75      ]]
[[0.82608696 0.15942029]
 [0.15068493 0.84931507]]
[[0.86666667 0.12      ]
 [0.13432836 0.86567164]]
[[0.84375    0.15625   ]
 [0.11538462 0.87179487]]
[[0.84042553 0.14893617]
 [0.29166667 0.70833333]]
[[0.86111111 0.12962963]
 [0.41176471 0.58823529]]
[[0.96923077 0.02307692]
 [0.25       0.75      ]]
[[0.73333333 0.26666667]
 [0.45723684 0.53947368]]
[[0.95064935 0.04805195]
 [0.62711864 0.37288136]]
[[0.61904762 0.38095238]
 [0.23151751 0.76653696]]
[[0.80461812 0.19360568]
 [0.40977444 0.59022556]]
[[0.9469599  0.05174644]
 [0.71428571 0.28571429]]
[[0.99634592 0.00243605]
 [0.25       0.75      ]]
[[0.93439364 0.06560636]
 [0.09876543 0.89814815]]
[[0.86780715 0.13063764]
 [0.45

In [82]:
pp.pprint(matrices)

{   '/home/roni/coding/mastersProject/src/csvOut/p_105/recording_0': {   ' AU07_c': array([[0.72463768, 0.27536232],
       [0.11858974, 0.87820513]]),
                                                                         ' AU12_c': array([[0.80201342, 0.19463087],
       [0.38815789, 0.61184211]]),
                                                                         ' AU14_c': array([[0.75531915, 0.24468085],
       [0.17557252, 0.82061069]]),
                                                                         ' AU17_c': array([[0.75115207, 0.24884793],
       [0.23175966, 0.7639485 ]]),
                                                                         ' AU20_c': array([[0.89825581, 0.10174419],
       [0.32075472, 0.66981132]]),
                                                                         'facetouch': array([[0.99092971, 0.00680272],
       [0.33333333, 0.66666667]])},
    '/home/roni/coding/mastersProject/src/csvOut/p_105/recording_2': {   ' AU07_c': a

In [83]:
for df in matrices:
    for m in matrices[df]:
        for i in matrices[df][m]:
            if (m=='facetouch'):
                print(m,i, i[0]+i[1],'\n')

facetouch [0.96491228 0.01754386] 0.9824561403508771 

facetouch [0.25 0.75] 1.0 

facetouch [0.96923077 0.02307692] 0.9923076923076923 

facetouch [0.25 0.75] 1.0 

facetouch [0.99634592 0.00243605] 0.9987819732034104 

facetouch [0.25 0.75] 1.0 

facetouch [0.99634592 0.00243605] 0.9987819732034104 

facetouch [0.33333333 0.66666667] 1.0 

facetouch [0.99092971 0.00680272] 0.9977324263038548 

facetouch [0.33333333 0.66666667] 1.0 

facetouch [0.99842271 0.00157729] 1.0 

facetouch [0.         0.66666667] 0.6666666666666666 

facetouch [0.9982238 0.0017762] 1.0 

facetouch [0.         0.66666667] 0.6666666666666666 

facetouch [0.99242424 0.00505051] 0.9974747474747475 

facetouch [0.33333333 0.66666667] 1.0 

facetouch [0.98734177 0.01265823] 1.0 

facetouch [0. 0.] 0.0 

facetouch [0.99033149 0.00828729] 0.9986187845303868 

facetouch [0.27272727 0.72727273] 1.0 

facetouch [0.9841629  0.01357466] 0.9977375565610859 

facetouch [0.24 0.76] 1.0 
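
Several rows above do not sum to 1. This is consistent with the counting cell dividing by the total number of samples in a state, which includes the final sample of each video even though it has no outgoing transition. A sketch of one way to avoid this (the helper name `transition_matrix_from_series` is hypothetical) is to count (previous, current) pairs and normalise each row by its own total. The toy sequence below is constructed to reproduce the counts `[1, 4, 1, 57, 55, 3]` seen earlier:

```python
import numpy as np

def transition_matrix_from_series(values):
    """Row-normalised 2x2 transition matrix for a binary sequence."""
    values = np.asarray(values, dtype=int)
    prev, curr = values[:-1], values[1:]
    counts = np.zeros((2, 2))
    # np.add.at accumulates correctly even when index pairs repeat
    np.add.at(counts, (prev, curr), 1)
    totals = counts.sum(axis=1, keepdims=True)
    # a state that is never left keeps an all-zero row instead of dividing by zero
    return np.divide(counts, totals, out=np.zeros_like(counts), where=totals > 0)

# 56 zeros, then 4 ones, then a final zero: 57 zeros and 4 ones in total
seq = [0] * 56 + [1] * 4 + [0]
P = transition_matrix_from_series(seq)
print(P)
print(P.sum(axis=1))
```

Here the 0-row is divided by 56 (the number of transitions that leave state 0) rather than 57, so both rows sum to 1.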



In [84]:
# save the per-video Markov transition matrices
with open("markov_probs.pickle", 'wb') as f:
    pickle.dump(matrices, f)

In [85]:
# also keep the test set as a pickle
with open("test_dfs.pickle", 'wb') as f:
    pickle.dump(test_data, f)
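
Step 6 of the plan, using the model for prediction, is not shown in this section. As a minimal sketch of what it could look like for one binary variable (the matrix values and the helper `predict_distribution` are hypothetical, not taken from the saved results): the distribution over states n steps ahead is the start distribution multiplied by the n-th power of the transition matrix.

```python
import numpy as np

# hypothetical transition matrix: rows = current state, columns = next state
P = np.array([[0.9, 0.1],
              [0.25, 0.75]])

def predict_distribution(P, start_state, n_steps=1):
    """Distribution over states n_steps after being in start_state."""
    p0 = np.zeros(P.shape[0])
    p0[start_state] = 1.0
    return p0 @ np.linalg.matrix_power(P, n_steps)

one_step = predict_distribution(P, start_state=1)
two_step = predict_distribution(P, start_state=1, n_steps=2)
print(one_step, two_step)
```

This assumes each row of the matrix sums to 1. Otherwise the predicted distribution does not sum to 1 either.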

# Conditional probability calculations

To calculate the conditional probabilities, we assume that each AU and the face touch (FT) are statistically independent, so that P(AU and FT) = P(AU) * P(FT).
We do this for every video and for every combination of AU and FT (5 combinations here, since the left and right hands were merged into a single `facetouch` variable).

In [86]:
cps = [ 'FT AU17_c', 'FT AU07_c','FT AU14_c' , 'FT AU12_c','FT AU20_c' ]

cps_matrices = {df:{ key: np.zeros((len(states), len(states))) for key in cps } for df in train_data}

for df in matrices:
    ft = matrices[df]['facetouch']

    for m in matrices[df]:
        t = matrices[df][m]

        # AU column names start with a space, so 'FT' + m gives e.g. 'FT AU17_c';
        # 'facetouch' itself gives 'FTfacetouch', which is not in cps and is skipped
        if 'FT' + m in cps_matrices[df]:
            # element-wise product of the AU and face-touch transition probabilities
            cps_matrices[df]['FT' + m] = np.array([[ t[0,0]*ft[0,0], t[0,1]*ft[0,1] ],[ t[1,0]*ft[1,0], t[1,1]*ft[1,1] ]])
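
The cell above multiplies the two 2x2 transition matrices element by element. Under the independence assumption, each entry is the probability that the AU and the face touch make the same transition (for example, both going from 0 to 0). If the full joint transition over the four combined (AU, FT) states is wanted, one possible sketch uses the Kronecker product. The matrices below are made-up values, not results from the data:

```python
import numpy as np

au = np.array([[0.8, 0.2],
               [0.3, 0.7]])      # hypothetical AU transition matrix
ft = np.array([[0.99, 0.01],
               [0.25, 0.75]])    # hypothetical face-touch transition matrix

# joint states are ordered (AU, FT) = (0,0), (0,1), (1,0), (1,1);
# joint[2*i + k, 2*j + l] = au[i, j] * ft[k, l]
joint = np.kron(au, ft)

# element-wise product as in the cell above: only the "same transition" entries
same_transition = au * ft

print(joint)
print(joint.sum(axis=1))
```

The four entries of the element-wise product appear in the corners of the 4x4 joint matrix, and each row of the joint matrix sums to 1 when both input matrices are row-normalised.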


In [87]:
# save the per-video AU/FT probability matrices
with open("conditional_probs.pickle", 'wb') as f:
    pickle.dump(cps_matrices, f)