Click on the links to find out more!
-
- Analysed a DVD rental store database by writing PostgreSQL queries in a Jupyter environment and visualising the query outputs with Matplotlib and Seaborn.
- Identified opportunities and provided actionable insights to improve inventory and pricing strategies.
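A minimal sketch of the kind of query behind this analysis. It assumes a simplified, hypothetical two-table schema (`film`, `rental`) with made-up rows, and uses Python's built-in `sqlite3` as a stand-in for PostgreSQL so that it runs without a database server. The project's actual queries and schema are not shown here.

```python
import sqlite3

# Hypothetical, simplified schema and sample rows (not the real database).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE film (film_id INTEGER PRIMARY KEY, title TEXT, rental_rate REAL);
    CREATE TABLE rental (rental_id INTEGER PRIMARY KEY, film_id INTEGER);
    INSERT INTO film VALUES (1, 'Film A', 0.99), (2, 'Film B', 2.99), (3, 'Film C', 4.99);
    INSERT INTO rental (film_id) VALUES (1), (1), (1), (2), (3), (3);
""")

def rentals_per_film(connection):
    """Return (title, rental_count, revenue) rows, most rented film first."""
    query = """
        SELECT f.title,
               COUNT(r.rental_id) AS rental_count,
               ROUND(COUNT(r.rental_id) * f.rental_rate, 2) AS revenue
        FROM film AS f
        LEFT JOIN rental AS r ON r.film_id = f.film_id
        GROUP BY f.film_id, f.title, f.rental_rate
        ORDER BY rental_count DESC, f.title
    """
    return connection.execute(query).fetchall()

rows = rentals_per_film(conn)
for title, rental_count, revenue in rows:
    print(title, rental_count, revenue)
```

Comparing rental counts against revenue per title is one way to surface films that are rented often but priced low, which is the sort of signal that feeds inventory and pricing decisions.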
-
HDB Resale Price Predictor Application
- Built a predictive model and deployed it with Streamlit so that relevant stakeholders can access it easily.
- Achieved a test RMSE of approximately $39,600 using XGBoost, which outperformed LinearRegression, RandomForestRegressor, and ensemble modelling methods.
- Ran permutation importance analysis on the final model to identify the top 5 features that most strongly influence the model’s performance and its HDB resale price predictions.
- Proposed future work: automating the process with an ML pipeline that extracts data from the HDB resale price API on data.gov.sg and updates the model accordingly.
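The RMSE metric and the permutation importance idea can be sketched in plain Python. This is an illustration only: `toy_predict` is a hypothetical stand-in for the trained XGBoost model, and the three features (floor area, remaining lease, a noise column) and their coefficients are assumptions made up for the example.

```python
import random

def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    return (sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)) ** 0.5

def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean increase in RMSE when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = rmse(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the target
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(rmse(y, [predict(r) for r in X_perm]) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances

def toy_predict(row):
    # Hypothetical model: uses floor area (index 0) and remaining lease
    # (index 1), and ignores the noise feature (index 2).
    return 4000 * row[0] + 1000 * row[1]

data_rng = random.Random(42)
X = [[data_rng.uniform(60, 120), data_rng.uniform(40, 99), data_rng.uniform(0, 1)]
     for _ in range(200)]
y = [toy_predict(row) for row in X]

scores = permutation_importance(toy_predict, X, y)
ranking = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
print(ranking)
```

A feature the model never uses gets an importance of zero, while features the predictions depend on heavily show a large RMSE increase when shuffled. With a real fitted estimator, scikit-learn's `sklearn.inspection.permutation_importance` provides the same idea out of the box.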
-
Death Row Last Statement Topic Modeling
- Used BeautifulSoup to scrape the last statements of 475 Death Row inmates.
- Preprocessed the statements with natural language processing libraries such as NLTK before conducting topic modelling.
- Explored various clustering techniques with BERTopic to identify distinct topics within the last statements and successfully discovered 5 distinct topics.
-
Capstone Project: Human Emotion Recognition
- The project aims to minimise revenue loss from poor customer experiences by building a facial emotion recognition model that detects when a customer is angry, allowing for prompt service recovery and preventing negative customer reviews.
- Collected and curated a dataset of approximately 700 images of myself showing 4 distinct facial expressions, used to train a multi-class classifier with high recall on my own angry facial expressions.
- Applied Haar cascade face detection to locate faces in the images, and used OpenCV to crop them so that background noise is reduced.
- Applied image augmentation to increase the size of the training dataset and improve the model's ability to generalize.
- Constructed a 2-layer CNN model using TensorFlow, incorporating regularization techniques, KerasTuner, and three callbacks to enhance model performance and generalizability.
- Achieved a recall score of 100% for angry faces on the final model iteration.
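For reference, per-class recall can be computed as sketched below. The label names other than "angry" and the sample predictions are hypothetical, chosen only to show that 100% recall on one class can coexist with false positives for that class.

```python
def recall_for_class(y_true, y_pred, label):
    """Fraction of true `label` samples that were predicted as `label`."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
    if tp + fn == 0:
        raise ValueError(f"no true samples with label {label!r}")
    return tp / (tp + fn)

def precision_for_class(y_true, y_pred, label):
    """Fraction of samples predicted as `label` that truly are `label`."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
    if tp + fp == 0:
        raise ValueError(f"no samples predicted as {label!r}")
    return tp / (tp + fp)

# Hypothetical labels and predictions, for illustration only.
y_true = ["angry", "angry", "happy", "neutral", "sad", "angry"]
y_pred = ["angry", "angry", "angry", "neutral", "sad", "angry"]

angry_recall = recall_for_class(y_true, y_pred, "angry")
angry_precision = precision_for_class(y_true, y_pred, "angry")
print(angry_recall, angry_precision)
```

Prioritising recall fits the stated goal: missing an angry customer is costlier than an occasional false alarm, so precision is worth reporting alongside recall to show how many false alarms that choice produces.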
-
- Employed SMOTE to balance the class distribution, then fine-tuned the models to balance the precision-recall trade-off.
- Constructed an ensemble model by combining KNN and RandomForestClassifier via soft voting, and adjusted the weights to improve performance. This approach outperformed using a single model alone.
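Weighted soft voting can be sketched in a few lines. The probabilities and weights below are hypothetical values picked for illustration, not outputs of the project's KNN or random forest models.

```python
def soft_vote(probas_per_model, weights):
    """Weighted average of per-class probabilities; returns (class_index, averaged)."""
    total = sum(weights)
    n_classes = len(probas_per_model[0])
    averaged = [
        sum(w * proba[c] for w, proba in zip(weights, probas_per_model)) / total
        for c in range(n_classes)
    ]
    predicted = max(range(n_classes), key=averaged.__getitem__)
    return predicted, averaged

# Hypothetical class probabilities for a single sample (class 0, class 1).
knn_proba = [0.60, 0.40]
rf_proba = [0.30, 0.70]

equal = soft_vote([knn_proba, rf_proba], weights=[1, 1])
knn_heavy = soft_vote([knn_proba, rf_proba], weights=[3, 1])
print(equal[0], knn_heavy[0])
```

With equal weights the random forest's more confident vote wins, while weighting KNN three times as heavily flips the prediction, which is why tuning the weights can change ensemble performance. In scikit-learn the equivalent is `VotingClassifier(voting="soft", weights=...)`.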
-
- Designed and developed a Tableau dashboard to monitor book sales, showcasing data visualization and dashboard creation skills as well as the ability to deliver actionable insights to stakeholders.
- Proposed 3 recommendations to increase revenue based on the exploratory data analysis.
-
- Project 1: Data Analysis of SAT and ACT
- Project 2: Predicting Housing Prices in Ames, Iowa
- Project 3: Web APIs & Natural Language Processing
- Project 4: Predicting West Nile Virus