# Heart Attack Prediction Model


## Dataset statistics and summary

### Dataset information

-  age : age of the patient
-  sex : sex of the patient
-  exng: exercise induced angina (1 = yes; 0 = no)
-  caa: number of major vessels (0-3)
-  cp : chest pain type
    -  Value 1: typical angina
    -  Value 2: atypical angina
    -  Value 3: non-anginal pain
    -  Value 0: asymptomatic
-  trtbps : resting blood pressure (in mm Hg)
-  chol : cholesterol in mg/dl fetched via BMI sensor
-  fbs : (fasting blood sugar > 120 mg/dl) (1 = true; 0 = false)
-  restecg : resting electrocardiographic results
    -  Value 0: normal
    -  Value 1: having ST-T wave abnormality (T wave inversions and/or ST elevation or depression of > 0.05 mV)
    -  Value 2: showing probable or definite left ventricular hypertrophy by Estes' criteria
-  thalachh : maximum heart rate achieved
-  oldpeak : ST depression induced by exercise relative to rest
-  slp  : the slope of the peak exercise ST segment 
    -  Value 0: upsloping
    -  Value 1: flat
    -  Value 2: downsloping
-  thall : thallium stress test result
    -  Value 1: fixed defect 
    -  Value 2: normal
    -  Value 3: reversible defect
-  output : target variable (0 = less chance of heart attack; 1 = more chance of heart attack)
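
The integer codes above are easier to read in tables and plots once they are mapped back to their labels. The following is a minimal sketch of that mapping for `cp` and `restecg`, using the codes exactly as listed above. The helper name `add_labels`, the `*_label` column names and the toy frame are assumptions made for illustration; they are not part of the original notebook.

```python
import pandas as pd

# Code-to-label mappings copied from the dataset description above.
CP_LABELS = {
    0: "asymptomatic",
    1: "typical angina",
    2: "atypical angina",
    3: "non-anginal pain",
}
RESTECG_LABELS = {
    0: "normal",
    1: "ST-T wave abnormality",
    2: "left ventricular hypertrophy",
}


def add_labels(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with readable label columns next to cp and restecg."""
    out = df.copy()
    out["cp_label"] = out["cp"].map(CP_LABELS)
    out["restecg_label"] = out["restecg"].map(RESTECG_LABELS)
    return out


# Tiny made-up frame, for illustration only (these are not rows from heart.csv).
toy = pd.DataFrame({"cp": [0, 2, 3], "restecg": [1, 0, 2]})
labelled = add_labels(toy)
print(labelled)
```

In the notebook the same helper could be applied as `add_labels(heart_df)` once the data has been loaded. Any code that is missing from a mapping becomes `NaN`, which makes unexpected values easy to spot.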

### Dataset exploration

In [1]:
import pandas as pd 
import os

heart_df = pd.read_csv(os.path.join(os.getcwd(), "data", "heart.csv"))
heart_df.head()

This dataset has {{heart_df.shape[0]}} observations and {{heart_df.shape[1]}} variables.

In [8]:
heart_df.describe()

| | age | sex | cp | trtbps | chol | fbs | restecg | thalachh | exng | oldpeak | slp | caa | thall | output |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 303.0 | 303.0 | 303.0 | 303.0 | 303.0 | 303.0 | 303.0 | 303.0 | 303.0 | 303.0 | 303.0 | 303.0 | 303.0 | 303.0 |
| mean | 54.366337 | 0.683168 | 0.966997 | 131.623762 | 246.264026 | 0.148515 | 0.528053 | 149.646865 | 0.326733 | 1.039604 | 1.39934 | 0.729373 | 2.313531 | 0.544554 |
| std | 9.082101 | 0.466011 | 1.032052 | 17.538143 | 51.830751 | 0.356198 | 0.52586 | 22.905161 | 0.469794 | 1.161075 | 0.616226 | 1.022606 | 0.612277 | 0.498835 |
| min | 29.0 | 0.0 | 0.0 | 94.0 | 126.0 | 0.0 | 0.0 | 71.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 25% | 47.5 | 0.0 | 0.0 | 120.0 | 211.0 | 0.0 | 0.0 | 133.5 | 0.0 | 0.0 | 1.0 | 0.0 | 2.0 | 0.0 |
| 50% | 55.0 | 1.0 | 1.0 | 130.0 | 240.0 | 0.0 | 1.0 | 153.0 | 0.0 | 0.8 | 1.0 | 0.0 | 2.0 | 1.0 |
| 75% | 61.0 | 1.0 | 2.0 | 140.0 | 274.5 | 0.0 | 1.0 | 166.0 | 1.0 | 1.6 | 2.0 | 1.0 | 3.0 | 1.0 |
| max | 77.0 | 1.0 | 3.0 | 200.0 | 564.0 | 1.0 | 2.0 | 202.0 | 1.0 | 6.2 | 2.0 | 4.0 | 3.0 | 1.0 |


Looking at the data at a very high level, we can draw some simple conclusions.
It appears from this dataset that the average male aged around 55 has a higher chance of suffering a heart attack. This is a very simplistic view of the dataset, so I will explore it further by using classifier models to predict whether a patient has a higher chance of having a heart attack.
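
Before fitting any classifier, the high-level impression above can be checked with a simple group-by. The sketch below computes the share of rows with `output == 1` for each combination of `sex` and age band. The function name `risk_rate_by_group`, the age band edges and the toy frame are assumptions chosen for illustration, and the dataset description above does not state which `sex` code corresponds to male.

```python
import pandas as pd


def risk_rate_by_group(df: pd.DataFrame) -> pd.DataFrame:
    """Share of rows with output == 1, and row count, per sex and age band."""
    # Age band edges are an arbitrary choice for this sketch; intervals are [left, right).
    bands = pd.cut(
        df["age"],
        bins=[0, 45, 55, 65, 120],
        right=False,
        labels=["<45", "45-54", "55-64", "65+"],
    )
    return (
        df.assign(age_band=bands)
        .groupby(["sex", "age_band"], observed=True)["output"]
        .agg(rate="mean", n="size")
        .reset_index()
    )


# Tiny made-up frame, for illustration only (these are not rows from heart.csv).
toy = pd.DataFrame(
    {
        "age": [40, 42, 58, 60, 61, 50],
        "sex": [0, 0, 1, 1, 1, 1],
        "output": [1, 1, 0, 1, 0, 1],
    }
)
rates = risk_rate_by_group(toy)
print(rates)
```

In the notebook this would be called as `risk_rate_by_group(heart_df)`. Reporting the group size `n` next to each rate helps avoid reading too much into bands that contain only a handful of patients.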

The data contains a wide variety of feature variables, which will be used to predict the outcome (the target variable).
While a number of major factors contribute to a heart attack, my initial assumption is that the prediction will rely most heavily on the following feature variables (in no particular order of importance):  
-  chol
-  trtbps
-  oldpeak 
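
One quick way to sanity-check this assumption is to look at how strongly each of the three candidate features correlates with `output`. The sketch below uses the Pearson correlation from pandas and sorts the features by absolute value. This is only a rough linear indicator and not a measure of feature importance in a fitted classifier; the helper name `correlation_with_output` and the toy frame are assumptions made for illustration.

```python
import pandas as pd

# Candidate features named in the list above.
CANDIDATES = ["chol", "trtbps", "oldpeak"]


def correlation_with_output(df: pd.DataFrame, features=None) -> pd.Series:
    """Pearson correlation of each feature with output, strongest (by |r|) first."""
    cols = list(features) if features is not None else CANDIDATES
    corr = df[cols].corrwith(df["output"])
    order = corr.abs().sort_values(ascending=False).index
    return corr.reindex(order)


# Tiny made-up frame, for illustration only (these are not rows from heart.csv).
toy = pd.DataFrame(
    {
        "chol": [200, 220, 240, 260],
        "trtbps": [120, 140, 130, 150],
        "oldpeak": [2.0, 1.5, 0.5, 0.0],
        "output": [0, 0, 1, 1],
    }
)
ranked = correlation_with_output(toy)
print(ranked)
```

In the notebook the call would be `correlation_with_output(heart_df)`. The sign matters as well as the size: a negative value means that higher feature values go with `output = 0` in the sample.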

chol represents the patient's cholesterol level.

> When you have too much “bad” cholesterol in your body, it can start to pose a problem. It contributes to fatty deposits in the arteries called plaque, which can cause heart disease.
> When that plaque builds up, it makes it harder for blood to flow, and these deposits can eventually break and form a clot that leads to a heart attack.

Source: [link to article](https://www.everydayhealth.com/heart-health/cholesterol-and-heart-attack-risk/)


However, see this [Healthline article](https://www.healthline.com/health-news/cholesterol-and-heart-attacks) for a contradictory view.