# Heart Disease Prediction

Predicting whether a patient has heart disease based on age, sex, chest pain type, resting blood pressure, maximum heart rate achieved, and exercise-induced angina.

## Paths to the training and test data

In [1]:
train_data_file_path = './archive/train.csv'
test_data_file_path = './archive/test.csv'

## Reading both files

In [4]:
import pandas as pd

df_train = pd.read_csv(train_data_file_path)
df_test = pd.read_csv(test_data_file_path)
print(df_train.head())
print(df_test.head())


   age  sex  cp  trestbps  chol  fbs  restecg  thalach  exang  oldpeak  slope  \
0   52    1   0       125   212    0        1      168      0      1.0      2   
1   53    1   0       140   203    1        0      155      1      3.1      0   
2   70    1   0       145   174    0        1      125      1      2.6      0   
3   61    1   0       148   203    0        1      161      0      0.0      2   
4   62    0   0       138   294    1        1      106      0      1.9      1   

   ca  thal  target  
0   2     3       0  
1   0     3       0  
2   0     3       0  
3   1     3       0  
4   3     2       0  
   age  sex  cp  trestbps  chol  fbs  restecg  thalach  exang  oldpeak  slope  \
0   63    1   3       145   233    1        0      150      0      2.3      0   
1   37    1   2       130   250    0        1      187      0      3.5      0   
2   41    0   1       130   204    0        0      172      0      1.4      2   
3   56    1   1       120   236    0        1      178   
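Before selecting features, it can help to check the size of each file and whether it contains missing or repeated rows. The sketch below is only an illustration: `summarize` is a hypothetical helper, and `sample` is a small made-up frame standing in for `df_train`, not the real data.

```python
import pandas as pd


def summarize(df: pd.DataFrame) -> dict:
    """Return a few basic sanity-check facts about a DataFrame."""
    return {
        "rows": df.shape[0],
        "columns": df.shape[1],
        "missing_values": int(df.isna().sum().sum()),
        "duplicate_rows": int(df.duplicated().sum()),
    }


# Hypothetical stand-in for df_train, using only three of the real column names.
sample = pd.DataFrame({
    "age": [52, 53, 70, 52],
    "sex": [1, 1, 1, 1],
    "target": [0, 0, 0, 0],
})

summary = summarize(sample)
print(summary)
```

In the notebook, the same helper could be called as `summarize(df_train)` and `summarize(df_test)`.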

## Organizing features and labels

In [9]:
features = ['age', 'sex', 'cp', 'trestbps', 'thalach', 'exang']
labels = ['target']

train_x = df_train[features]
val_x = df_test[features]

train_y = df_train[labels]
val_y = df_test[labels]

print(train_x.head())
print(train_y.head())

   age  sex  cp  trestbps  thalach  exang
0   52    1   0       125      168      0
1   53    1   0       140      155      1
2   70    1   0       145      125      1
3   61    1   0       148      161      0
4   62    0   0       138      106      0
   target
0       0
1       0
2       0
3       0
4       0
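Indexing with a list of column names, as done for `labels` above, returns a two-dimensional DataFrame even when the list holds a single name. That is why the label frame is flattened before fitting later on. A minimal sketch with made-up values (the frame `df` here is an assumption, not the notebook's data):

```python
import pandas as pd

# Small illustrative frame; the values are invented for this example.
df = pd.DataFrame({
    "age": [52, 53, 70],
    "thalach": [168, 155, 125],
    "target": [0, 0, 1],
})

features = ["age", "thalach"]

x = df[features]          # list of names -> 2-D DataFrame, shape (3, 2)
y_frame = df[["target"]]  # list with one name -> 2-D DataFrame, shape (3, 1)
y_series = df["target"]   # single name -> 1-D Series, shape (3,)

print(x.shape, y_frame.shape, y_series.shape)

# Flattening the one-column frame yields the same 1-D values as the Series.
flat = y_frame.values.ravel()
print(flat)
```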


## Creating the RandomForestRegressor model

In [19]:
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error

# creates the model
hd_model = RandomForestRegressor(random_state=1)

# fits the model
hd_model.fit(train_x, train_y.values.ravel())

# makes predictions with it
hd_predict = hd_model.predict(val_x)

# prints the mean_absolute_error to see how good the model is
print(mean_absolute_error(val_y, hd_predict))

0.012013201320132018
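The mean absolute error is the average of the absolute differences between each actual label and its prediction. The sketch below computes it by hand with NumPy on a few made-up values (the arrays are assumptions for illustration, not the notebook's results):

```python
import numpy as np

# Invented labels and predictions, only to illustrate the formula.
actual = np.array([1, 1, 0, 0])
predicted = np.array([0.98, 1.00, 0.10, 0.00])

# MAE = mean(|actual - predicted|)
absolute_errors = np.abs(actual - predicted)
manual_mae = float(np.mean(absolute_errors))

print(manual_mae)
```

Because the labels are 0 or 1 and the regressor's outputs fall between 0 and 1, an MAE close to 0 means the predictions sit very near the true labels on average.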


## Comparing results

In [33]:
val_y_list = val_y.values.tolist()

for i in range(len(val_y_list)):
    for j in range(len(val_y_list[i])):
        print('Actual value: %d \t\t Prediction: %f ' %(val_y_list[i][j], hd_predict[i]))

Actual value: 1 		 Prediction: 0.980000 
Actual value: 1 		 Prediction: 1.000000 
Actual value: 1 		 Prediction: 1.000000 
Actual value: 1 		 Prediction: 1.000000 
Actual value: 1 		 Prediction: 0.950000 
Actual value: 1 		 Prediction: 0.990000 
Actual value: 1 		 Prediction: 0.990000 
Actual value: 1 		 Prediction: 1.000000 
Actual value: 1 		 Prediction: 0.970000 
Actual value: 1 		 Prediction: 1.000000 
Actual value: 1 		 Prediction: 0.970000 
Actual value: 1 		 Prediction: 1.000000 
Actual value: 1 		 Prediction: 1.000000 
Actual value: 1 		 Prediction: 1.000000 
Actual value: 1 		 Prediction: 0.970000 
Actual value: 1 		 Prediction: 1.000000 
Actual value: 1 		 Prediction: 0.960000 
Actual value: 1 		 Prediction: 1.000000 
Actual value: 1 		 Prediction: 0.940000 
Actual value: 1 		 Prediction: 1.000000 
Actual value: 1 		 Prediction: 0.970000 
Actual value: 1 		 Prediction: 1.000000 
Actual value: 1 		 Prediction: 0.960000 
Actual value: 1 		 Prediction: 0.980000 
Actual value: 1 
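A long printed list is hard to scan, so one alternative is to put the actual values and predictions side by side in a DataFrame and sort by error. This is a sketch with made-up numbers; the column names `actual`, `prediction`, and `abs_error` are assumptions chosen for the example.

```python
import pandas as pd

# Invented values standing in for val_y and hd_predict.
actual = [1, 1, 0, 0]
predicted = [0.98, 0.40, 0.10, 0.70]

comparison = pd.DataFrame({"actual": actual, "prediction": predicted})
comparison["abs_error"] = (comparison["actual"] - comparison["prediction"]).abs()

# Largest errors first, so the worst predictions are easy to spot.
worst = comparison.sort_values("abs_error", ascending=False)
print(worst.head())
```

In the notebook, `actual` could be replaced by `val_y.values.ravel()` and `predicted` by `hd_predict`.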

## Conclusion

The Mean Absolute Error (MAE) on the test data is about 0.012, and the predicted values listed above are all close to the actual labels. Together, these results suggest that the model can estimate fairly accurately whether a patient has heart disease.
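Because the regressor returns a score between 0 and 1 rather than a class, turning a prediction into a yes/no answer requires a cut-off. The sketch below assumes a threshold of 0.5, which the notebook does not specify, and uses made-up scores to show how an accuracy figure could be derived:

```python
import numpy as np

# Invented labels and regressor scores, for illustration only.
actual = np.array([1, 1, 0, 0, 1])
scores = np.array([0.98, 0.40, 0.10, 0.70, 0.95])

threshold = 0.5  # assumed cut-off, not taken from the notebook

# Scores at or above the threshold count as "has heart disease" (1).
predicted_labels = (scores >= threshold).astype(int)

# Fraction of patients whose thresholded prediction matches the label.
accuracy = float(np.mean(predicted_labels == actual))

print(predicted_labels, accuracy)
```

scikit-learn also provides `RandomForestClassifier`, which returns class labels directly and may be a more natural fit for a yes/no target.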