# Data Processing

_This notebook takes a sample dataset, *sample.csv*, compiled from the database. It contains both biological sensor data and mobile app data._

## Store variables into a list

_In this section, we read each column of the sample dataset. Each column represents one variable._

In [74]:
import pandas as pd

# read the CSV file into a DataFrame
data = pd.read_csv("sample.csv")
  
# converting column data to list
HR = data['Heartrate_variability'].tolist()
RP = data['Respiration_rate'].tolist()
SC = data['Skin_conductance'].tolist()
AG = data['Actigraphy'].tolist()
VB = data['Verbal'].tolist()
SO = data['Social'].tolist()


overall = [HR,RP,SC,AG,VB,SO]
print(overall)

[[0.9, 0.55, 0.28, 0.8, 0.92, 0.1], [34, 24, 44, 33, 36, 35], [1.4, 2.2, 2.0, 3.2, 1.1, 2.1], [80, 90, 20, 20, 33, 25], [4, 4, 4, 3, 2, 1], [4, 3, 3, 3, 2, 3]]
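The column-by-column extraction above can also be written as a loop over column names. The sketch below is only an illustration: it uses the standard library's `csv` module rather than pandas so that it runs without the real file, and the inline `SAMPLE_CSV` text is a hypothetical stand-in rebuilt from the values printed above (the actual column order and any extra columns in *sample.csv* are unknown). The names `read_columns`, `COLUMNS`, and `columns_as_lists` are assumptions, not part of the notebook.

```python
import csv
import io

# Hypothetical stand-in for sample.csv, rebuilt from the printed values.
SAMPLE_CSV = """Heartrate_variability,Respiration_rate,Skin_conductance,Actigraphy,Verbal,Social
0.9,34,1.4,80,4,4
0.55,24,2.2,90,4,3
0.28,44,2.0,20,4,3
0.8,33,3.2,20,3,3
0.92,36,1.1,33,2,2
0.1,35,2.1,25,1,3
"""

COLUMNS = [
    "Heartrate_variability", "Respiration_rate", "Skin_conductance",
    "Actigraphy", "Verbal", "Social",
]

def read_columns(text, columns):
    """Return {column name: list of floats} for the requested columns."""
    rows = list(csv.DictReader(io.StringIO(text)))
    return {name: [float(row[name]) for row in rows] for name in columns}

variables = read_columns(SAMPLE_CSV, COLUMNS)
# Same layout as `overall`: one list per variable, in COLUMNS order.
columns_as_lists = [variables[name] for name in COLUMNS]
print(columns_as_lists[1])
```

Keeping the variables in a dictionary keyed by column name avoids having to remember which list index belongs to which signal. Note that this sketch converts every value to `float`, whereas pandas keeps integer columns as integers.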


## Binarize the data
_Binarize the data using the standard deviation. Values that lie outside the mean plus or minus one standard deviation are treated as abnormal (1); all other values are treated as normal (0)._

In [68]:
def std_list(test_list):
    mean = sum(test_list) / len(test_list)
    variance = sum([((x - mean) ** 2) for x in test_list]) / len(test_list)
    res = variance ** 0.5
    return [mean,res]

mean_std_overall = []
for values in overall:  # avoid shadowing the built-in name `list`
    mean_std_overall.append(std_list(values))
    
print(mean_std_overall)

[[0.5916666666666667, 0.3127521205186128], [34.333333333333336, 5.849976258261415], [2.0, 0.6658328118479394], [44.666666666666664, 28.992336152086047], [3.0, 1.1547005383792515], [3.0, 0.5773502691896257]]
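As a cross-check, `std_list` divides by `len(test_list)`, so it computes the population standard deviation. The sketch below compares it with Python's `statistics` module for the respiration-rate values; it is a sanity check under that reading, not part of the original pipeline.

```python
import statistics

# Respiration_rate values copied from the output above.
respiration = [34, 24, 44, 33, 36, 35]

mean = statistics.fmean(respiration)
population_std = statistics.pstdev(respiration)  # divides by n, like std_list
sample_std = statistics.stdev(respiration)       # divides by n - 1

print(mean, population_std, sample_std)
```

The population value agrees with the `5.849976258261415` printed above, while the sample standard deviation is slightly larger. This matters when switching libraries: pandas' `Series.std()` defaults to the sample version (`ddof=1`), whereas NumPy's `np.std` defaults to the population version (`ddof=0`), so the abnormal/normal thresholds can shift depending on which one is used.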


In [70]:
def binarize(test_list,stats_list):
    bin_list = []
    for i in range(len(test_list)):
        if test_list[i] > stats_list[0]+stats_list[1] or test_list[i] < stats_list[0]-stats_list[1]:
            bin_list.append(1) # abnormal: outside mean +/- one standard deviation
        else:
            bin_list.append(0) # normal: within mean +/- one standard deviation
    return bin_list

print(overall[1])
print(mean_std_overall[1])
print(binarize(overall[1],mean_std_overall[1]))

overall_bin = []
for k in range(len(overall)):
    overall_bin.append(binarize(overall[k],mean_std_overall[k]))

[34, 24, 44, 33, 36, 35]
[34.333333333333336, 5.849976258261415]
[0, 1, 1, 0, 0, 0]
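To make the thresholds explicit, the sketch below returns the lower and upper bounds along with the flags. For respiration rate the bounds are roughly 28.48 and 40.18, which is why only 24 and 44 are flagged. The function name `binarize_by_std` is hypothetical, and it assumes the same population standard deviation as `std_list`.

```python
import statistics

def binarize_by_std(values):
    """Flag values outside mean +/- one population standard deviation as 1."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    lower, upper = mean - std, mean + std
    flags = [1 if (x < lower or x > upper) else 0 for x in values]
    return lower, upper, flags

# Respiration_rate values copied from the output above.
lower, upper, flags = binarize_by_std([34, 24, 44, 33, 36, 35])
print(round(lower, 3), round(upper, 3), flags)
```

Returning the bounds alongside the flags makes it easier to check borderline values by eye, such as the heart rate variability reading of 0.28, which sits just inside its lower bound.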


In [72]:
print(overall_bin)

[[0, 0, 0, 0, 1, 1], [0, 1, 1, 0, 0, 0], [0, 0, 0, 1, 1, 0], [1, 1, 0, 0, 0, 0], [0, 0, 0, 0, 0, 1], [1, 0, 0, 0, 1, 0]]


## Conditional Probability 
_Calculate conditional probabilities by counting frequencies. As a first example, take "Verbal activity" conditioned on "heart rate variability"._

_Then try "Social behaviors" conditioned on "skin conductance" and "actigraphy"._

In [98]:
# Indices into overall_bin: 0 = heart rate variability, 2 = skin conductance,
# 3 = actigraphy, 4 = verbal, 5 = social

# P(verbal abnormal | HR abnormal): among the samples with abnormal HR,
# count the fraction that also have abnormal verbal activity
hr_abnormal = [i for i in range(len(overall_bin[0])) if overall_bin[0][i] == 1]
Verbal_HR = sum(overall_bin[4][i] for i in hr_abnormal) / len(hr_abnormal)
print("Probability of abnormal verbal activity given abnormal heart rate variability is", Verbal_HR)

# P(social abnormal | actigraphy normal and skin conductance abnormal)
ag_normal_sc_abnormal = [i for i in range(len(overall_bin[5])) if overall_bin[3][i] == 0 and overall_bin[2][i] == 1]
Social_SC_AG = sum(overall_bin[5][i] for i in ag_normal_sc_abnormal) / len(ag_normal_sc_abnormal)
print("Probability of abnormal social activity given normal actigraphy and abnormal skin conductance is", Social_SC_AG)

Probability of abnormal verbal activity given abnormal heart rate variability is 0.5
Probability of abnormal social activity given normal actigraphy and abnormal skin conductance is 0.5
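Counting-based conditional probability can be wrapped in a small helper: keep only the samples where every condition holds, then take the fraction of those in which the target is abnormal. The sketch below is one possible way to do this. The function name `conditional_probability` and its `(flags, required_value)` argument format are assumptions for illustration, and the input lists are copied from the `overall_bin` output above.

```python
def conditional_probability(target, conditions):
    """Estimate P(target == 1 | all conditions hold) by counting.

    target: list of 0/1 flags.
    conditions: list of (flags, required_value) pairs.
    Returns None when no sample satisfies the conditions.
    """
    matching = [
        i for i in range(len(target))
        if all(flags[i] == required for flags, required in conditions)
    ]
    if not matching:
        return None  # the conditional probability is undefined here
    return sum(target[i] for i in matching) / len(matching)

# Binarized values copied from the overall_bin output above.
hr = [0, 0, 0, 0, 1, 1]
sc = [0, 0, 0, 1, 1, 0]
ag = [1, 1, 0, 0, 0, 0]
verbal = [0, 0, 0, 0, 0, 1]
social = [1, 0, 0, 0, 1, 0]

p_verbal_given_hr = conditional_probability(verbal, [(hr, 1)])
p_social_given_ag_sc = conditional_probability(social, [(ag, 0), (sc, 1)])
print(p_verbal_given_hr, p_social_given_ag_sc)
```

Multiplying the marginal frequencies of the variables would instead estimate their joint probability under an independence assumption, which is a different quantity. With only six samples, each condition here matches just two rows, so these estimates are very coarse.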
