Collecting data for my bayesian network project has been a very challenging task. I first attempted to obtain medical records by calling and visiting several medical institutions in the metro-Atlanta area, but I soon found out the the release of (anonymous) medical records to the general public is illegal, and telling people that you're a student at the Georiga Institute of Technology will only get you so much. 

So, because I refuse to put this project to bed because of insufficient resources, I have chosen to generate my own data in order to represent patients with psychiatric disorders. The data that I am generating will be random and in no way will it be an accurate representation of the real-world population, however the purpose of this project is to build a BBN (Beautiful Bayesian Network) that is capable of handling real-world data when it is blessed enough to recieve some. 

I have 4 main psychiatric diseases that I am focusing on -Bipolar Disorder (Manic Depressive Psychosis), Depressive Disorder, Mixed Dementia, Schizophrenia. Through my research I have identified a number of causes and effects that have to do with these diseases. 

The data that I will be generating will be simple - true or false (1 or 0) values that will represent gender and if the patient is experiencing the listed cause/effect, and numerical values that will represent how many of something a patient has i.e. number of parents with the disease. 

After generating this data, go through it and calculate all of the probabilities for the various diseases, causes, effects, and use these statistics in order to train my bayesian network model. 

In [39]:
import pandas as pd
import random
from numpy.random import randint
import numpy as np

In [34]:
#Create a table that will hold all of the data

data = pd.DataFrame(columns = {"Chronicle Depression, Intercurrenced", "Elevated Stress Level", "Recent Birth",
                              "Unwanted Incident", "Genetic Influence", "Abusive use of HBP's, Sedatives, Contraception Pills",
                              "Toxins in Working Environment", "Scarcity of Phosphate and B12", "Taedium Vitae",
                              "Inquietude or Anxiety", "Social Recession/ Impulse Reduction", "Wariness/Memory Reduction",
                              "Disorientation", "Behavior Disorders", "Mania/Hallucinosis", "Personality/Emotional Life Deterioration",
                              "Social Life Detorioration", "Grimaces, Mannerisms, Puerility"})

In [35]:
data 

Unnamed: 0,Behavior Disorders,Recent Birth,Wariness/Memory Reduction,Elevated Stress Level,Scarcity of Phosphate and B12,Inquietude or Anxiety,Toxins in Working Environment,Personality/Emotional Life Deterioration,Mania/Hallucinosis,Taedium Vitae,Disorientation,Social Recession/ Impulse Reduction,Social Life Detorioration,"Chronicle Depression, Intercurrenced",Genetic Influence,"Abusive use of HBP's, Sedatives, Contraception Pills",Unwanted Incident,"Grimaces, Mannerisms, Puerility"


In [5]:
print(len(data.columns))

18


In [36]:
#Create a loop that will generate the data for the table and append the data to the table
#I need to create a nested for loop. The outside loop will run 1,000 times, being that I want to generate 1,000 lines of data
#   The inside loop will run 18 times, being that I have 18 categories in a single row that need to have values.

#Create a single empty list that the values will be stored in. This list will be reset to empty every time the inside loop
#   is done iterating, so as to reset itself for the next row

for i in range(1000):
    data.loc[i] = list(randint(2, size=18))
 


In [37]:
#admire the data
data

Unnamed: 0,Behavior Disorders,Recent Birth,Wariness/Memory Reduction,Elevated Stress Level,Scarcity of Phosphate and B12,Inquietude or Anxiety,Toxins in Working Environment,Personality/Emotional Life Deterioration,Mania/Hallucinosis,Taedium Vitae,Disorientation,Social Recession/ Impulse Reduction,Social Life Detorioration,"Chronicle Depression, Intercurrenced",Genetic Influence,"Abusive use of HBP's, Sedatives, Contraception Pills",Unwanted Incident,"Grimaces, Mannerisms, Puerility"
0,0,0,0,1,1,0,1,1,0,0,0,1,0,0,0,1,1,1
1,1,1,1,1,0,0,1,1,0,1,0,1,1,0,1,0,0,1
2,0,1,1,1,0,0,0,0,1,0,0,1,1,0,1,1,0,0
3,1,0,1,0,1,1,1,0,0,0,0,1,1,1,0,1,1,1
4,1,0,0,0,1,0,0,1,1,0,1,1,0,0,0,1,0,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
995,0,0,1,1,1,1,0,1,1,1,1,0,0,1,1,0,0,1
996,1,1,0,0,1,1,0,1,0,0,1,0,1,1,1,1,1,1
997,1,1,1,0,1,0,0,1,1,0,0,1,1,0,0,1,0,1
998,0,1,1,1,0,0,0,1,1,1,0,0,0,1,0,0,0,1


In [50]:
# Now I need to generate the diseases that the patients have
# simple array 
names = np.array(['Manic Depressive Psychosis', 'Depressive Disorder', 'Mixed Dementia', 'Schizophrenia']) 

random.choice(names)

'Manic Depressive Psychosis'

In [51]:
disorder_list = []
for i in range(1000):
    disorder_list.append(random.choice(names))

disorder_list

['Schizophrenia',
 'Depressive Disorder',
 'Depressive Disorder',
 'Manic Depressive Psychosis',
 'Depressive Disorder',
 'Mixed Dementia',
 'Schizophrenia',
 'Depressive Disorder',
 'Schizophrenia',
 'Depressive Disorder',
 'Depressive Disorder',
 'Mixed Dementia',
 'Depressive Disorder',
 'Schizophrenia',
 'Schizophrenia',
 'Mixed Dementia',
 'Manic Depressive Psychosis',
 'Manic Depressive Psychosis',
 'Mixed Dementia',
 'Depressive Disorder',
 'Depressive Disorder',
 'Depressive Disorder',
 'Mixed Dementia',
 'Schizophrenia',
 'Manic Depressive Psychosis',
 'Manic Depressive Psychosis',
 'Mixed Dementia',
 'Mixed Dementia',
 'Mixed Dementia',
 'Depressive Disorder',
 'Schizophrenia',
 'Depressive Disorder',
 'Depressive Disorder',
 'Mixed Dementia',
 'Schizophrenia',
 'Manic Depressive Psychosis',
 'Depressive Disorder',
 'Mixed Dementia',
 'Manic Depressive Psychosis',
 'Manic Depressive Psychosis',
 'Manic Depressive Psychosis',
 'Manic Depressive Psychosis',
 'Depressive Disorde

In [54]:
#Add the disorder column to the dataframe
data.insert(loc=0, column='Patient Diagnosis', value=disorder_list)

ValueError: cannot insert Patient Diagnosis, already exists

In [55]:
data

Unnamed: 0,Patient Diagnosis,Behavior Disorders,Recent Birth,Wariness/Memory Reduction,Elevated Stress Level,Scarcity of Phosphate and B12,Inquietude or Anxiety,Toxins in Working Environment,Personality/Emotional Life Deterioration,Mania/Hallucinosis,Taedium Vitae,Disorientation,Social Recession/ Impulse Reduction,Social Life Detorioration,"Chronicle Depression, Intercurrenced",Genetic Influence,"Abusive use of HBP's, Sedatives, Contraception Pills",Unwanted Incident,"Grimaces, Mannerisms, Puerility"
0,Schizophrenia,0,0,0,1,1,0,1,1,0,0,0,1,0,0,0,1,1,1
1,Depressive Disorder,1,1,1,1,0,0,1,1,0,1,0,1,1,0,1,0,0,1
2,Depressive Disorder,0,1,1,1,0,0,0,0,1,0,0,1,1,0,1,1,0,0
3,Manic Depressive Psychosis,1,0,1,0,1,1,1,0,0,0,0,1,1,1,0,1,1,1
4,Depressive Disorder,1,0,0,0,1,0,0,1,1,0,1,1,0,0,0,1,0,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
995,Depressive Disorder,0,0,1,1,1,1,0,1,1,1,1,0,0,1,1,0,0,1
996,Mixed Dementia,1,1,0,0,1,1,0,1,0,0,1,0,1,1,1,1,1,1
997,Mixed Dementia,1,1,1,0,1,0,0,1,1,0,0,1,1,0,0,1,0,1
998,Schizophrenia,0,1,1,1,0,0,0,1,1,1,0,0,0,1,0,0,0,1


In [56]:
#Generate male/female column and add it to dataframe

gender = np.array(['Male', 'Female']) 

gender_list = []
for i in range(1000):
    gender_list.append(random.choice(gender))

gender_list

['Male',
 'Male',
 'Female',
 'Female',
 'Male',
 'Female',
 'Female',
 'Male',
 'Male',
 'Female',
 'Male',
 'Female',
 'Female',
 'Female',
 'Male',
 'Female',
 'Male',
 'Male',
 'Female',
 'Male',
 'Female',
 'Female',
 'Male',
 'Male',
 'Female',
 'Female',
 'Female',
 'Male',
 'Female',
 'Female',
 'Male',
 'Female',
 'Female',
 'Female',
 'Male',
 'Male',
 'Female',
 'Male',
 'Female',
 'Male',
 'Male',
 'Male',
 'Female',
 'Male',
 'Male',
 'Male',
 'Female',
 'Female',
 'Female',
 'Female',
 'Female',
 'Female',
 'Male',
 'Male',
 'Male',
 'Male',
 'Male',
 'Female',
 'Female',
 'Male',
 'Female',
 'Male',
 'Male',
 'Male',
 'Female',
 'Female',
 'Male',
 'Male',
 'Female',
 'Female',
 'Male',
 'Male',
 'Female',
 'Male',
 'Female',
 'Male',
 'Female',
 'Female',
 'Male',
 'Female',
 'Female',
 'Female',
 'Male',
 'Male',
 'Male',
 'Male',
 'Female',
 'Female',
 'Female',
 'Female',
 'Female',
 'Male',
 'Female',
 'Male',
 'Female',
 'Male',
 'Female',
 'Male',
 'Male',
 'Femal

In [59]:
data.insert(loc=1, column='Gender', value=gender_list)
data

Unnamed: 0,Patient Diagnosis,Gender,Behavior Disorders,Recent Birth,Wariness/Memory Reduction,Elevated Stress Level,Scarcity of Phosphate and B12,Inquietude or Anxiety,Toxins in Working Environment,Personality/Emotional Life Deterioration,Mania/Hallucinosis,Taedium Vitae,Disorientation,Social Recession/ Impulse Reduction,Social Life Detorioration,"Chronicle Depression, Intercurrenced",Genetic Influence,"Abusive use of HBP's, Sedatives, Contraception Pills",Unwanted Incident,"Grimaces, Mannerisms, Puerility"
0,Schizophrenia,Male,0,0,0,1,1,0,1,1,0,0,0,1,0,0,0,1,1,1
1,Depressive Disorder,Male,1,1,1,1,0,0,1,1,0,1,0,1,1,0,1,0,0,1
2,Depressive Disorder,Female,0,1,1,1,0,0,0,0,1,0,0,1,1,0,1,1,0,0
3,Manic Depressive Psychosis,Female,1,0,1,0,1,1,1,0,0,0,0,1,1,1,0,1,1,1
4,Depressive Disorder,Male,1,0,0,0,1,0,0,1,1,0,1,1,0,0,0,1,0,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
995,Depressive Disorder,Female,0,0,1,1,1,1,0,1,1,1,1,0,0,1,1,0,0,1
996,Mixed Dementia,Male,1,1,0,0,1,1,0,1,0,0,1,0,1,1,1,1,1,1
997,Mixed Dementia,Male,1,1,1,0,1,0,0,1,1,0,0,1,1,0,0,1,0,1
998,Schizophrenia,Male,0,1,1,1,0,0,0,1,1,1,0,0,0,1,0,0,0,1
