
Welcome to the Hypertension Analysis Repository! 🩸

SC1015 mini project

IMPORTANT!!!

Paste the GitHub notebook link into https://nbviewer.org/ to see the full notebook. Some charts only work there and are not shown on GitHub, since GitHub only renders static images. Our notebook includes HTML/JavaScript embeddings, and GitHub cannot display such cells properly! Alternatively, just click THIS LINK HERE!

About

This is a Mini-Project for SC1015 (Introduction to Data Science and Artificial Intelligence). We use the Stroke Dataset from Kaggle.

The order of our notebook is:

  1. Introduction
  2. Exploratory Data Analysis
  3. Data Balancing
  4. Modeling
  5. Final Thoughts

(Links to jump around the notebook will be provided in the notebook itself to minimise scrolling. However, these links only work on https://nbviewer.org/ !)

Contributors

  • @tengyaolong2000 Teng Yao Long
  • @jewel-chin Jewel Chin
  • @yuminp Park Yumin

Problem Definition

  • What are the main predictors of Hypertension?
  • Which model would be the best to predict Hypertension?

Models Used

  1. Logistic Regression
  2. Decision Tree
  3. Random Forest
  4. Support Vector Machine
  5. Artificial Neural Network
  6. eXtreme Gradient Boosting Classifier
  7. K Nearest Neighbours
  8. Naive Bayes Classifier
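As a hedged sketch of how several of the models above can be trained and compared side by side, the snippet below fits a few of them with sklearn on a synthetic imbalanced dataset (a stand-in for the Kaggle Stroke Dataset, which is not bundled here; the actual notebook code may differ):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import recall_score

# Synthetic imbalanced data: ~90% negative, ~10% positive
X, y = make_classification(n_samples=1000, n_features=10,
                           weights=[0.9, 0.1], random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=42)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(random_state=42),
    "Naive Bayes": GaussianNB(),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    print(f"{name}: recall = {recall_score(y_te, model.predict(X_te)):.2f}")
```

The same loop extends naturally to the SVM, KNN, XGBoost, and neural-network models, which have different constructors but the same fit/predict interface.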

Conclusion

  • Age and BMI are unanimously the biggest predictors of hypertension.
  • Other predictors include average glucose level and heart disease (It's important to exercise!!! 🏃‍♂️🏃‍♀️).
  • Tree models are good at predicting hypertension if we focus only on recall. However, the scores on other metrics are sacrificed too much.
  • Logistic Regression and Naive Bayes models have decent recall without sacrificing other scores too much. These models are good if we have limited resources (GPU/memory).
  • If we have sufficient resources, the Neural Network has the potential to be the best model after more hyperparameter tuning or an increase in model complexity. However, we would also need to deal with overfitting.
  • If we were to use a Deep Learning approach, we could also utilise transfer learning or ensemble modeling.
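The recall-versus-other-metrics trade-off mentioned above can be illustrated with a deliberately degenerate classifier (this toy example is ours, not from the notebook): flagging everyone as hypertensive achieves perfect recall while precision and F1 collapse.

```python
import numpy as np
from sklearn.metrics import recall_score, precision_score, f1_score

# Imbalanced ground truth: 90 negatives, 10 positives
y_true = np.array([0] * 90 + [1] * 10)

# A degenerate "model" that flags every patient as hypertensive
y_all_pos = np.ones_like(y_true)

print("recall   :", recall_score(y_true, y_all_pos))     # 1.0 -- looks perfect
print("precision:", precision_score(y_true, y_all_pos))  # 0.1 -- mostly false alarms
print("F1 score :", f1_score(y_true, y_all_pos))         # ~0.18
```

This is why we report F1 and other metrics alongside recall when comparing models.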

What did we learn from this project?

  1. Handling imbalanced datasets using resampling methods and imblearn package (SMOTE)
  2. Feature selection/ feature importance techniques (RFE, SHAP, Permutation importance)
  3. Logistic Regression with sklearn
  4. Random Forest with sklearn
  5. Support Vector Machines with sklearn
  6. Artificial Neural Networks with TensorFlow Keras
  7. XGBoost with xgboost
  8. K Nearest Neighbours with sklearn
  9. Naive Bayes Classifier with sklearn
  10. Collaborating using GitHub
  11. Data visualisation with plotly
  12. Grid Search to determine best hyperparameters
  13. Concepts on different metrics such as Recall, F1 score
