
Welcome to the Hypertension Analysis Repository! 🩸

SC1015 mini project

IMPORTANT!!!

Paste the GitHub notebook link into https://nbviewer.org/ to see the full notebook. Some charts only work there and are not shown on GitHub, since GitHub only renders static images. Our notebook includes HTML/JavaScript embeddings, and GitHub cannot display such cells properly! Alternatively, just click THIS LINK HERE!

About

This is a Mini-Project for SC1015 (Introduction to Data Science and Artificial Intelligence). We use the Stroke Dataset from Kaggle.

The order of our notebook is:

  1. Introduction
  2. Exploratory Data Analysis
  3. Data Balancing
  4. Modeling
  5. Final Thoughts

(Links to jump around the notebook will be provided in the notebook itself to minimise scrolling. However, these links only work on https://nbviewer.org/ !)

Contributors

  • @tengyaolong2000 Teng Yao Long
  • @jewel-chin Jewel Chin
  • @yuminp Park Yumin

Problem Definition

  • What are the main predictors of Hypertension?
  • Which model would be the best to predict Hypertension?

Models Used

  1. Logistic Regression
  2. Decision Tree
  3. Random Forest
  4. Support Vector Machine
  5. Artificial Neural Network
  6. eXtreme Gradient Boosting Classifier
  7. K Nearest Neighbours
  8. Naive Bayes Classifier
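As a hedged sketch of how several of the models above can be trained and compared side by side, the snippet below fits a few of them with sklearn on a synthetic imbalanced dataset (a stand-in for the Kaggle Stroke Dataset, which is not bundled here; the actual notebook code may differ):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import recall_score

# Synthetic imbalanced data: ~90% negative, ~10% positive
X, y = make_classification(n_samples=1000, n_features=10,
                           weights=[0.9, 0.1], random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=42)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(random_state=42),
    "Naive Bayes": GaussianNB(),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    print(f"{name}: recall = {recall_score(y_te, model.predict(X_te)):.2f}")
```

The same loop extends naturally to the SVM, KNN, XGBoost, and neural-network models, which have different constructors but the same fit/predict interface.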

Conclusion

  • Age and BMI are unanimously the biggest predictors of hypertension.
  • Other predictors include average glucose level and heart disease (It's important to exercise!!! 🏃‍♂️🏃‍♀️).
  • Tree models are good at predicting hypertension if we focus only on recall. However, the scores on other metrics are sacrificed too much.
  • Logistic Regression and Naive Bayes models have decent recall without sacrificing other scores too much. These models are good if we have limited resources (GPU/memory).
  • If we have sufficient resources, the Neural Network has the potential to be the best model after more hyperparameter tuning or an increase in model complexity. However, we would also need to deal with overfitting.
  • If we were to use a Deep Learning approach, we could also utilise transfer learning or ensemble modeling.
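The recall-versus-other-metrics trade-off mentioned above can be illustrated with a deliberately degenerate classifier (this toy example is ours, not from the notebook): flagging everyone as hypertensive achieves perfect recall while precision and F1 collapse.

```python
import numpy as np
from sklearn.metrics import recall_score, precision_score, f1_score

# Imbalanced ground truth: 90 negatives, 10 positives
y_true = np.array([0] * 90 + [1] * 10)

# A degenerate "model" that flags every patient as hypertensive
y_all_pos = np.ones_like(y_true)

print("recall   :", recall_score(y_true, y_all_pos))     # 1.0 -- looks perfect
print("precision:", precision_score(y_true, y_all_pos))  # 0.1 -- mostly false alarms
print("F1 score :", f1_score(y_true, y_all_pos))         # ~0.18
```

This is why we report F1 and other metrics alongside recall when comparing models.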

What did we learn from this project?

  1. Handling imbalanced datasets using resampling methods and imblearn package (SMOTE)
  2. Feature selection/ feature importance techniques (RFE, SHAP, Permutation importance)
  3. Logistic Regression with sklearn
  4. Random Forest with sklearn
  5. Support Vector Machines with sklearn
  6. Artificial Neural Networks with TensorFlow Keras
  7. XGBoost with xgboost
  8. K Nearest Neighbours with sklearn
  9. Naive Bayes Classifier with sklearn
  10. Collaborating using GitHub
  11. Data visualisation with plotly
  12. Grid Search to determine best hyperparameters
  13. Concepts on different metrics such as Recall, F1 score
