Paste the GitHub notebook link into https://nbviewer.org/ to see the full notebook. Some charts only work there and are not shown on GitHub, since GitHub only renders static images. Our notebook includes HTML/JavaScript embeddings, which GitHub cannot display properly! Alternatively, just click THIS LINK HERE!
This is a Mini-Project for SC1015 (Introduction to Data Science and Artificial Intelligence). We use the Stroke Dataset from Kaggle.
The order of our notebook is:
- Introduction
- Exploratory Data Analysis
- Data Balancing
- Modeling
- Final Thoughts
(Links to jump around the notebook are provided in the notebook itself to minimise scrolling. However, they only work on https://nbviewer.org/ !)
- @tengyaolong2000 Teng Yao Long
- @jewel-chin Jewel Chin
- @yuminp Park Yumin
- What are the main predictors of Hypertension?
- Which model would be the best to predict Hypertension?
- Logistic Regression
- Decision Tree
- Random Forest
- Support Vector Machine
- Artificial Neural Network
- eXtreme Gradient Boosting Classifier
- K Nearest Neighbours
- Naive Bayes Classifier
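As a minimal sketch of how we compared these models, the snippet below trains a few of them with sklearn and reports recall and F1. It uses a synthetic imbalanced dataset from `make_classification` as a stand-in for the actual stroke/hypertension data, so the exact scores are illustrative only.

```python
# Hedged sketch: comparing a few of the listed classifiers on a synthetic,
# imbalanced binary dataset (a stand-in for the hypertension labels).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import recall_score, f1_score

# ~10% positive class, mimicking the class imbalance in the real data
X, y = make_classification(n_samples=1000, n_features=8,
                           weights=[0.9, 0.1], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=42)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(random_state=42),
    "Naive Bayes": GaussianNB(),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    print(f"{name}: recall={recall_score(y_test, pred):.2f} "
          f"f1={f1_score(y_test, pred):.2f}")
```

In the notebook itself, each model is additionally tuned and evaluated on the resampled stroke data rather than on synthetic features.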
- Age and BMI unanimously are the biggest predictors of hypertension.
- Other predictors include average glucose level and heart disease (It's important to exercise!!! 🏃♂️🏃♀️).
- Tree models are good at predicting hypertension if we focus only on recall. However, the scores of other metrics are sacrificed too much.
- Logistic Regression and Naive Bayes models have decent recall without sacrificing other scores too much. These models are good if we have limited resources (GPU/memory).
- If we have sufficient resources, the Neural Network has the potential to be the best after more hyperparameter tuning or an increase in model complexity.
- However, we would also need to deal with overfitting.
- If we were to use a Deep Learning approach, we could also utilise transfer learning or ensemble modeling.
- Handling imbalanced datasets using resampling methods and imblearn package (SMOTE)
- Feature selection/ feature importance techniques (RFE, SHAP, Permutation importance)
- Logistic Regression with sklearn
- Random Forest with sklearn
- Support Vector Machines with sklearn
- Artificial Neural Networks with TensorFlow Keras
- XGBoost with xgboost
- K Nearest Neighbours with sklearn
- Naive Bayes Classifier with sklearn
- Collaborating using GitHub
- Data visualisation with plotly
- Grid Search to determine best hyperparameters
- Concepts on different metrics such as Recall, F1 score
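The resampling idea above can be sketched without the imblearn package: SMOTE generates synthetic minority-class samples by interpolating between a minority point and one of its nearest minority neighbours. The numpy sketch below illustrates that idea on a tiny toy array; the function name `smote_sketch` is ours, not imblearn's API, and the notebook itself uses `imblearn.over_sampling.SMOTE`.

```python
# Minimal numpy sketch of the SMOTE idea (synthetic minority oversampling).
# Each new sample lies on the segment between a minority point and one of
# its k nearest minority neighbours.
import numpy as np

def smote_sketch(X_min, n_new, k=5, rng=None):
    """Generate n_new synthetic samples from minority-class rows X_min."""
    rng = np.random.default_rng(rng)
    out = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        # distances from the chosen point to every other minority point
        d = np.linalg.norm(X_min - X_min[i], axis=1)
        neighbours = np.argsort(d)[1:k + 1]   # skip the point itself
        j = rng.choice(neighbours)
        gap = rng.random()                    # interpolation factor in [0, 1)
        out.append(X_min[i] + gap * (X_min[j] - X_min[i]))
    return np.array(out)

# toy minority-class points (e.g. the hypertensive patients)
X_min = np.array([[1.0, 1.0], [1.2, 0.9], [0.8, 1.1],
                  [1.1, 1.2], [0.9, 0.8], [1.0, 1.3]])
synthetic = smote_sketch(X_min, n_new=4, rng=0)
print(synthetic.shape)  # (4, 2)
```

Because every synthetic point is an interpolation between two real minority points, the new samples stay inside the minority class's region of feature space instead of being exact duplicates, which is what distinguishes SMOTE from plain random oversampling.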
- https://en.wikipedia.org/wiki/Artificial_neural_network
- https://en.wikipedia.org/wiki/Naive_Bayes_classifier
- https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html
- https://keras.io/
- https://plotly.com/python/
- https://en.wikipedia.org/wiki/Logistic_regression
- https://en.wikipedia.org/wiki/Random_forest
- https://scikit-learn.org/stable/modules/svm.html
- https://en.wikipedia.org/wiki/Support-vector_machine
- https://towardsdatascience.com/machine-learning-basics-with-the-k-nearest-neighbors-algorithm-6a6e71d01761
- https://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm
- https://scikit-learn.org/stable/modules/naive_bayes.html
- https://en.wikipedia.org/wiki/XGBoost
- https://machinelearningmastery.com/gentle-introduction-xgboost-applied-machine-learning/
- https://machinelearningmastery.com/introduction-to-regularization-to-reduce-overfitting-and-improve-generalization-error/
- https://machinelearningmastery.com/rfe-feature-selection-in-python/
- https://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.RFE.html
- https://shap.readthedocs.io/en/latest/index.html
- https://machinelearningmastery.com/smote-oversampling-for-imbalanced-classification/
- https://scikit-learn.org/stable/modules/permutation_importance.html