## Mortality Outcomes for Females Suffering Myocardial Infarction

![title](mifem.png)

__Myocardial Infarction__: This is the medical name for a heart attack. A heart attack occurs when the flow of oxygen-rich blood to part of the heart muscle is blocked.

Most heart attacks result from atherosclerosis. The risk factors for heart attack and atherosclerosis are basically the same:

1. An abnormally high level of blood cholesterol (hypercholesterolemia)
2. An abnormally low level of HDL (high-density lipoprotein), commonly called "good cholesterol"
3. High blood pressure (hypertension)
4. Diabetes
5. Family history of coronary artery disease at an early age
6. Cigarette smoking
7. Obesity
8. Physical inactivity (too little regular exercise)

In early middle age, men have a greater risk of heart attack than women. However, a woman's risk increases once she begins menopause. This could be the result of a menopause-related decrease in levels of estrogen, a female sex hormone that may offer some protection against atherosclerosis.

__Description__
> The data frame has 1295 rows and 10 columns. This is the female subset of the 'monica' data frame 

Here we analyse mortality outcomes for females who suffered a __Myocardial Infarction__.

Our dataset has the following columns:

1. outcome (mortality outcome, a factor with levels live, dead)
2. age (age at onset)
3. yronset (year of onset)
4. premi (previous myocardial infarction event, a factor with levels y, n, nk = not known)
5. smstat (smoking status, a factor with levels c = current, x = ex-smoker, n = non-smoker, nk = not known)
6. diabetes (a factor with levels y, n, nk = not known)
7. highbp (high blood pressure, a factor with levels y, n, nk = not known)
8. hichol (high cholesterol, a factor with levels y, n, nk = not known)
9. angina (a type of chest pain caused by reduced blood flow to the heart; a factor with levels y, n, nk = not known)
10. stroke (a factor with levels y, n, nk = not known)
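
The level codes in the column list can be made explicit in pandas. The following is a minimal sketch, assuming a small hand-made frame (`toy`, which is hypothetical and not the real data) that reuses three of the column names; it declares the documented levels as categories and counts how often `nk` (not known) occurs.

```python
import pandas as pd

# Hypothetical toy rows using the documented level codes (not real data)
toy = pd.DataFrame({
    'outcome':  ['live', 'dead', 'live', 'live'],
    'smstat':   ['c', 'x', 'n', 'nk'],
    'diabetes': ['y', 'n', 'nk', 'n'],
})

# Levels as documented in the column list
levels = {
    'outcome':  ['live', 'dead'],
    'smstat':   ['c', 'x', 'n', 'nk'],
    'diabetes': ['y', 'n', 'nk'],
}

# Store each column as a categorical with its documented levels
for col, cats in levels.items():
    toy[col] = pd.Categorical(toy[col], categories=cats)

# Count the 'nk' (not known) entries per risk-factor column
nk_counts = {col: int((toy[col] == 'nk').sum()) for col in ['smstat', 'diabetes']}
print(nk_counts)
```

Declaring the levels up front means a misspelled code would show up as a missing value instead of silently becoming a new level.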

__Source__
> Newcastle (Australia) centre of the Monica project; see the web site http://www.ktl.fi/monica 


In [1]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline



In [65]:
data = pd.read_csv('DataSets/mifem.csv')

In [66]:
data = data.drop(labels='Unnamed: 0',axis=1)

In [67]:
# Split the data by mortality outcome
data_live = data[data['outcome']=='live']
data_dead = data[data['outcome']=='dead']
print('Number of patients who survived: {}'.format(len(data_live)))
print('Number of patients who died: {}'.format(len(data_dead)))

Number of patients who survived: 974
Number of patients who died: 321
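
The two counts printed above can also be read as shares of the whole sample. This is a minimal sketch, assuming only the printed counts (974 live, 321 dead); the `outcome` series is rebuilt from those numbers so the example runs without the CSV file. In the notebook itself, `data['outcome']` would take its place.

```python
import pandas as pd

# Rebuild only the outcome column from the counts printed above
# (974 live, 321 dead); this stands in for data['outcome']
outcome = pd.Series(['live'] * 974 + ['dead'] * 321, name='outcome')

# Absolute counts and relative shares per outcome level
counts = outcome.value_counts()
shares = outcome.value_counts(normalize=True)

print(counts)
print(shares.round(3))
```

Roughly one in four patients in the sample died, so the two outcome classes are unbalanced, which is worth keeping in mind for any later modelling.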


In [68]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1295 entries, 0 to 1294
Data columns (total 10 columns):
outcome     1295 non-null object
age         1295 non-null int64
yronset     1295 non-null int64
premi       1295 non-null object
smstat      1295 non-null object
diabetes    1295 non-null object
highbp      1295 non-null object
hichol      1295 non-null object
angina      1295 non-null object
stroke      1295 non-null object
dtypes: int64(2), object(8)
memory usage: 101.2+ KB


In [69]:
print(data.describe())
print(data_live.describe())
print(data_dead.describe())

               age      yronset
count  1295.000000  1295.000000
mean     60.922008    88.785328
std       7.042327     2.553647
min      35.000000    85.000000
25%      57.000000    87.000000
50%      63.000000    89.000000
75%      66.000000    91.000000
max      69.000000    93.000000
              age     yronset
count  974.000000  974.000000
mean    60.413758   88.912731
std      7.266492    2.552950
min     35.000000   85.000000
25%     56.250000   87.000000
50%     62.000000   89.000000
75%     66.000000   91.000000
max     69.000000   93.000000
              age     yronset
count  321.000000  321.000000
mean    62.464174   88.398754
std      6.068319    2.520516
min     38.000000   85.000000
25%     60.000000   86.000000
50%     64.000000   88.000000
75%     67.000000   90.000000
max     69.000000   93.000000
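
The per-outcome summaries above can also be produced in one grouped call, without splitting the frame first. The sketch below assumes a hypothetical `toy` frame with made-up values and the same column names as the dataset.

```python
import pandas as pd

# Hypothetical toy rows (made-up values, not taken from mifem.csv)
toy = pd.DataFrame({
    'outcome': ['live', 'live', 'live', 'dead', 'dead'],
    'age':     [55, 63, 68, 64, 67],
    'yronset': [85, 86, 87, 85, 88],
})

# One row of summary statistics per outcome level, for the age column
age_by_outcome = toy.groupby('outcome')['age'].describe()
print(age_by_outcome)
```

Placing the groups in one table makes it easier to compare the statistics row by row.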


In [70]:
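# numeric_only=True restricts the mean to numeric columns;
# recent pandas versions raise an error on the text columns otherwise
data_live.groupby('yronset').mean(numeric_only=True)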

               age
yronset
85       60.919192
86        59.93578
87       60.574468
88       60.831683
89       59.420168
90       59.716981
91       60.785714
92       60.835294
93       60.905172
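
The same yearly mean can be computed for both outcomes at once by grouping on two keys. This is a sketch on a hypothetical `toy` frame with made-up values; `unstack` turns the outcome level into columns so the two groups sit side by side.

```python
import pandas as pd

# Hypothetical toy rows (made-up values, not taken from mifem.csv)
toy = pd.DataFrame({
    'outcome': ['live', 'live', 'dead', 'live', 'dead', 'dead'],
    'yronset': [85, 85, 85, 86, 86, 86],
    'age':     [60, 62, 66, 58, 64, 68],
})

# Mean age per (year, outcome), with outcomes spread into columns
mean_age = (
    toy.groupby(['yronset', 'outcome'])['age']
       .mean()
       .unstack('outcome')
)
print(mean_age)
```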


In [71]:
cat_vars = ['outcome','premi','smstat','diabetes','highbp','hichol','angina','stroke']
for var in cat_vars:
    # One-hot encode the column; dtype=int keeps 0/1 indicators
    dummies = pd.get_dummies(data[var], prefix=var, dtype=int)
    # Append the indicator columns to the frame
    data = data.join(dummies)

In [79]:
cols = data.columns.values.tolist()

In [84]:
to_keep = [i for i in cols if i not in cat_vars]

In [81]:
data_final = data[to_keep]

In [85]:
data_final

   age  yronset  outcome_dead  outcome_live  premi_n  premi_nk  premi_y  smstat_c  smstat_n  smstat_nk  ...  highbp_y  hichol_n  hichol_nk  hichol_y  angina_n  angina_nk  angina_y  stroke_n  stroke_nk  stroke_y
0   63       85             0             1        1         0        0         0         0          0  ...         1         0          0         1         1          0         0         1          0         0
1   55       85             0             1        1         0        0         1         0          0  ...         1         0          0         1         1          0         0         1          0         0
2   68       85             0             1        0         0        1         0         0          1  ...         1         0          1         0         0          0         1         1          0         0
3   64       85             0             1        1         0        0         0         0          0  ...         1         1          0         0         0          0         1         1          0         0
4   67       85             1             0        1         0        0         0         0          1  ...         0         0          1         0         0          1         0         0          1         0
5   66       85             0             1        1         0        0         0         0          0  ...         0         0          1         0         0          1         0         0          1         0
6   63       85             0             1        1         0        0         0         1          0  ...         1         1          0         0         1          0         0         1          0         0
7   68       85             1             0        0         0        1         0         1          0  ...         1         0          0         1         0          0         1         0          0         1
8   46       85             1             0        1         0        0         1         0          0  ...         1         0          1         0         0          1         0         1          0         0
9   66       85             1             0        0         0        1         1         0          0  ...         1         1          0         0         1          0         0         0          0         1
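
The encoding loop and the column filter above can likely be collapsed into a single call, because `pd.get_dummies` accepts a `columns=` argument and drops the encoded originals itself. The sketch below assumes a hypothetical `toy` frame with made-up values; `cat_cols` plays the role of `cat_vars`.

```python
import pandas as pd

# Hypothetical toy rows (made-up values, not taken from mifem.csv)
toy = pd.DataFrame({
    'outcome': ['live', 'dead', 'live'],
    'age':     [63, 67, 55],
    'premi':   ['n', 'y', 'nk'],
})

cat_cols = ['outcome', 'premi']

# get_dummies with `columns=` encodes the listed columns and drops the
# originals in the same step; dtype=int gives 0/1 instead of booleans
encoded = pd.get_dummies(toy, columns=cat_cols, dtype=int)
print(encoded.columns.tolist())
```

The untouched numeric columns come first, followed by one indicator column per level, so the layout matches the table produced by the loop.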
