I am a ...
- MSc. student in Data Science at the University of Gothenburg with a strong passion for leveraging data and analytics for problem-solving and decision-making 🔥
- Financial Analyst Intern at Apple Japan 🍎
- Ex-Data Science Intern at Spotify 🎶 💚
- Ex-Data Science Intern at Johnson & Johnson 🏥
- Ex-Data Analyst at Nagase Brothers Inc. 📚
Technical Skills:
- Programming: Python (4.5+ years), SQL (1.5+ years)
- Machine Learning
- Statistical Analysis & Modeling
- Hypothesis Testing, incl. A/B testing
- Data Visualization
- Data Wrangling
Tools I've Worked With:
- Libraries: pandas, NumPy, Matplotlib, Seaborn, scikit-learn, SciPy, PyTorch, LangChain
- BI tools: Tableau, Looker Studio, Dataiku
- Databases: BigQuery, Dremio
- Others: Git
Description:
In this project, I run hypothesis tests to evaluate the effectiveness of new product features, and power analyses to estimate the sample sizes required for experiments. The notebook contains the analysis along with reusable functions for running the hypothesis tests.
Tests used:
two-sample t-test, paired t-test, power analysis
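The core workflow can be sketched as follows. This is a minimal illustration using SciPy and statsmodels, not the notebook's actual functions; the effect size, metric values, and group sizes are made up for the example.

```python
# Sketch: power analysis + two-sample t-test for an A/B experiment.
# (Illustrative only; numbers below are hypothetical.)
import numpy as np
from scipy import stats
from statsmodels.stats.power import TTestIndPower

# Power analysis: sample size per group needed to detect a small effect
# (Cohen's d = 0.2) at alpha = 0.05 with 80% power.
n_per_group = TTestIndPower().solve_power(effect_size=0.2, alpha=0.05, power=0.8)

# Two-sample t-test on simulated control/treatment metrics.
rng = np.random.default_rng(42)
control = rng.normal(loc=10.0, scale=2.0, size=500)
treatment = rng.normal(loc=10.4, scale=2.0, size=500)  # hypothetical lift
t_stat, p_value = stats.ttest_ind(control, treatment)

print(f"required n per group: {n_per_group:.0f}")
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```

The paired t-test variant (`scipy.stats.ttest_rel`) follows the same pattern when the two measurements come from the same units.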
Description:
I have implemented the k-means algorithm from scratch in Python and tested it on image compression. The implementation uses k-means++ for optimized centroid initialization and leverages vectorized NumPy computations for efficiency.
Keywords:
kmeans++, vectorized computations, clustering, image compression
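The k-means++ seeding step can be sketched like this. This is an illustrative re-derivation with vectorized NumPy, not the repository's code; the function name and data are made up for the example.

```python
# Sketch of k-means++ initialization: the first centroid is uniform,
# each later centroid is sampled with probability proportional to the
# squared distance to its nearest already-chosen centroid.
import numpy as np

def kmeans_pp_init(X, k, rng):
    n = X.shape[0]
    centroids = [X[rng.integers(n)]]
    for _ in range(k - 1):
        # Vectorized squared distances of every point to every centroid,
        # then the minimum over centroids -> shape (n,).
        diffs = X[:, None, :] - np.array(centroids)[None, :, :]
        d2 = np.min((diffs ** 2).sum(axis=-1), axis=1)
        probs = d2 / d2.sum()
        centroids.append(X[rng.choice(n, p=probs)])
    return np.array(centroids)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
C = kmeans_pp_init(X, k=3, rng=rng)
print(C.shape)  # (3, 2)
```

The `(n, k, d)` broadcast is what makes the distance computation a single NumPy expression instead of a Python loop over points.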
Description:
I implemented and compared the performance of various neural network models, including a convolutional neural network, for digit classification. I also implemented an auto-encoder for denoising digit images and experimented with using its decoder to generate synthetic "handwritten" digits.
Keywords:
neural network, convolutional network, auto-encoder, image classification, generating synthetic images
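A denoising auto-encoder of this kind can be sketched in a few lines of PyTorch. This is a minimal stand-in (fully connected layers, random tensors in place of MNIST, hypothetical latent size), not the project's architecture.

```python
# Sketch: denoising auto-encoder whose decoder doubles as a generator.
# Shapes follow 28x28 grayscale digits; everything else is illustrative.
import torch
import torch.nn as nn

class DenoisingAE(nn.Module):
    def __init__(self, latent_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Flatten(), nn.Linear(28 * 28, 128), nn.ReLU(),
            nn.Linear(128, latent_dim))
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 128), nn.ReLU(),
            nn.Linear(128, 28 * 28), nn.Sigmoid())

    def forward(self, x):
        return self.decoder(self.encoder(x)).view(-1, 1, 28, 28)

model = DenoisingAE()
clean = torch.rand(8, 1, 28, 28)                      # stand-in for digits
noisy = (clean + 0.3 * torch.randn_like(clean)).clamp(0, 1)
recon = model(noisy)  # training would minimize MSE(recon, clean)

# The decoder alone maps sampled latent vectors to synthetic digit images.
fake = model.decoder(torch.randn(4, 32)).view(-1, 1, 28, 28)
print(recon.shape, fake.shape)
```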
Description:
I built a logistic regression classifier that predicts whether a patient has cancer based on an image of a fine-needle aspirate of a breast mass. The notebook also covers feature preprocessing and feature selection before model training.
Keywords:
logistic regression, classification, feature preprocessing, feature selection, evaluation metric selection, confusion matrix
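A condensed version of this pipeline can be sketched with scikit-learn's built-in Wisconsin breast-cancer dataset (which is derived from fine-needle-aspirate images). The preprocessing and selection choices below (standardization, top-10 ANOVA features) are illustrative assumptions, not necessarily the notebook's.

```python
# Sketch: preprocess -> select features -> logistic regression -> evaluate.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y)

clf = make_pipeline(
    StandardScaler(),                 # feature preprocessing
    SelectKBest(f_classif, k=10),     # feature selection (k is illustrative)
    LogisticRegression(max_iter=1000))
clf.fit(X_tr, y_tr)

acc = clf.score(X_te, y_te)
print(confusion_matrix(y_te, clf.predict(X_te)))
print(f"accuracy: {acc:.3f}")
```

Keeping every step in one `Pipeline` ensures the scaler and selector are fit only on training data, avoiding leakage into the test split.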
Description:
I compare k-means clustering and DBSCAN (density-based spatial clustering of applications with noise) through a protein-conformation cluster analysis. I also showcase an example of the data adjustment required for a more reasonable clustering.
Keywords:
Kmeans, DBSCAN, clustering, data adjustment
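The contrast between the two algorithms can be sketched on synthetic data (the project itself uses protein conformations). Here the "data adjustment" is standardization, and the non-convex two-moons shape is a stand-in chosen to show where density-based clustering wins.

```python
# Sketch: k-means vs. DBSCAN on non-convex clusters after standardization.
import numpy as np
from sklearn.datasets import make_moons
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans, DBSCAN

X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)
X = StandardScaler().fit_transform(X)  # adjustment: put features on one scale

km_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
db_labels = DBSCAN(eps=0.3, min_samples=5).fit_predict(X)

# DBSCAN follows the two curved "moons"; k-means cuts them with a line.
print("k-means clusters:", np.unique(km_labels))
print("DBSCAN clusters (-1 = noise):", np.unique(db_labels))
```

Note that DBSCAN needs no cluster count up front, but its `eps`/`min_samples` are scale-sensitive, which is exactly why the standardization step matters.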
Description:
This project shows how to compare the performance of different models by running a paired t-test to determine which model performs statistically significantly better. I demonstrate this by comparing a logistic regression classifier and a Gaussian Naive Bayes classifier on an example dataset.
Keywords:
paired t-test, model comparison, Gaussian Naive Bayes classifier, logistic regression
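One common way to set this up, sketched below, is to score both models on the same cross-validation folds and pair the fold scores; the dataset and fold count here are illustrative, not necessarily the project's.

```python
# Sketch: paired t-test over matched CV folds for two classifiers.
from scipy.stats import ttest_rel
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score, KFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB

X, y = load_breast_cancer(return_X_y=True)
cv = KFold(n_splits=10, shuffle=True, random_state=0)  # same folds for both

lr_scores = cross_val_score(
    make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    X, y, cv=cv)
nb_scores = cross_val_score(GaussianNB(), X, y, cv=cv)

# Paired test: each fold yields one matched pair of accuracies.
t_stat, p_value = ttest_rel(lr_scores, nb_scores)
print(f"LR mean={lr_scores.mean():.3f}, NB mean={nb_scores.mean():.3f}, "
      f"p={p_value:.3f}")
```

The pairing is what justifies `ttest_rel` over an independent-samples test: both models see identical train/test splits, so fold-level difficulty cancels out.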
Description:
This project
- compares decision tree and random forest classifiers in terms of overfitting and underfitting,
- examines how the results change as the ensemble size of the random forest grows, and
- evaluates feature importance in decision tree and random forest classifiers.
Keywords:
decision tree, random forest, ensemble model, feature importance
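The overfitting contrast and the feature-importance comparison can be sketched as follows. The dataset and ensemble size are illustrative stand-ins, not the project's setup.

```python
# Sketch: single unpruned tree vs. random forest, plus impurity-based
# feature importances (which both estimators expose).
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

tree = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)

# An unpruned tree memorizes the training set (train accuracy 1.0);
# the gap to its test accuracy is the overfitting signal.
print("tree   train/test:", tree.score(X_tr, y_tr), tree.score(X_te, y_te))
print("forest train/test:", forest.score(X_tr, y_tr), forest.score(X_te, y_te))

# Impurity-based importances sum to 1; averaging over the ensemble
# typically smooths the single tree's all-or-nothing attributions.
print("forest importances sum:", forest.feature_importances_.sum())
```

Sweeping `n_estimators` and re-scoring is then the natural way to study how the ensemble size changes the results.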