Welcome to My Portfolio!


I am a ...

  • MSc student in Data Science at the University of Gothenburg with a strong passion for leveraging data and analytics for problem-solving and decision-making 🔥
  • Financial Analyst Intern at Apple Japan 🍎
  • Ex. Data Science Intern at Spotify 🎶 💚
  • Ex. Data Science Intern at Johnson & Johnson 🏥
  • Ex. Data Analyst at Nagase Brothers Inc. 📚

Technical Skills:

  • Programming: Python (4.5+ years), SQL (1.5+ years)
  • Machine Learning
  • Statistical Analysis & Modeling
  • Hypothesis Testing incl. A/B testing
  • Data Visualization
  • Data Wrangling

Experienced Tools:

  • Libraries: Pandas, NumPy, Matplotlib, Seaborn, Scikit-learn, SciPy, PyTorch, LangChain
  • BI tools: Tableau, Looker Studio, Dataiku
  • Database: BigQuery, Dremio
  • Others: Git

Projects

Description:
In this project, I run hypothesis tests to evaluate the effectiveness of new product features, together with a power analysis to estimate the sample size required for running the experiments. The notebook contains the analysis along with the functions for running the hypothesis tests.

Tests used:
two-sample t-test, paired t-test, power analysis
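
As a rough sketch of what such helpers can look like, the snippet below runs a Welch two-sample t-test with SciPy and a power analysis with statsmodels; the data, effect size, and thresholds are illustrative assumptions, not taken from the notebook.

```python
# Minimal sketch of a two-sample t-test and a power analysis.
# The data and parameters are illustrative, not the notebook's.
import numpy as np
from scipy import stats
from statsmodels.stats.power import TTestIndPower

rng = np.random.default_rng(42)
control = rng.normal(loc=10.0, scale=2.0, size=500)     # baseline metric
treatment = rng.normal(loc=10.3, scale=2.0, size=500)   # metric with the new feature

# Two-sample (Welch) t-test: does the new feature change the mean metric?
t_stat, p_value = stats.ttest_ind(treatment, control, equal_var=False)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")

# Power analysis: sample size per group needed to detect an effect size of
# 0.15 (Cohen's d) with 80% power at alpha = 0.05.
analysis = TTestIndPower()
n_required = analysis.solve_power(effect_size=0.15, power=0.8, alpha=0.05)
print(f"required sample size per group: {np.ceil(n_required):.0f}")
```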

Description:
I implemented the K-means algorithm from scratch in Python and tested it on image compression. The implementation follows the k-means++ scheme for initializing the centroids and leverages vectorized computations on NumPy matrices for more efficient calculations.

Keywords:
kmeans++, vectorized computations, clustering, image compression
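
For illustration, here is a compact re-implementation of the two ideas mentioned above, k-means++ seeding and fully vectorized distance computation with NumPy; it is a sketch under simplified assumptions, not the repository's code.

```python
# Illustrative k-means with k-means++ initialization; not the repository's implementation.
import numpy as np

def kmeans_pp_init(X, k, rng):
    """k-means++: each new centroid is sampled with probability proportional to
    its squared distance from the nearest centroid chosen so far."""
    centroids = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        d2 = np.min(((X[:, None, :] - np.array(centroids)[None, :, :]) ** 2).sum(-1), axis=1)
        centroids.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    return np.array(centroids)

def kmeans(X, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    centroids = kmeans_pp_init(X, k, rng)
    for _ in range(n_iter):
        # Vectorized assignment: squared distances of every point to every centroid.
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        labels = d2.argmin(axis=1)
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, labels

# Image compression: cluster the pixel colors and replace each pixel with its centroid.
# Assumes "image" is an (H, W, 3) float array in [0, 1]:
#   pixels = image.reshape(-1, 3)
#   centroids, labels = kmeans(pixels, k=16)
#   compressed = centroids[labels].reshape(image.shape)
```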

Description:
I implemented and compared the performance of several neural network models, including a convolutional neural network, for digit classification. I also implemented an auto-encoder for denoising images of digits and experimented with using the decoder part of the auto-encoder to generate synthetic "handwritten" digits.

Keywords:
neural network, convolutional network, auto-encoder, image classification, generating synthetic images.
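
Below is a minimal denoising auto-encoder sketch in PyTorch; the architecture, the MNIST-shaped dummy batch, and the hyperparameters are assumptions for illustration rather than the models used in the notebook.

```python
# Minimal denoising auto-encoder sketch in PyTorch (illustrative architecture,
# not the one used in the project's notebook).
import torch
import torch.nn as nn

class DenoisingAutoEncoder(nn.Module):
    def __init__(self, latent_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Flatten(),
            nn.Linear(28 * 28, 128), nn.ReLU(),
            nn.Linear(128, latent_dim), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 128), nn.ReLU(),
            nn.Linear(128, 28 * 28), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x)).view(-1, 1, 28, 28)

model = DenoisingAutoEncoder()
loss_fn = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# One illustrative training step on a random batch standing in for MNIST digits.
clean = torch.rand(64, 1, 28, 28)
noisy = (clean + 0.3 * torch.randn_like(clean)).clamp(0, 1)

optimizer.zero_grad()
loss = loss_fn(model(noisy), clean)   # reconstruct the clean digit from the noisy one
loss.backward()
optimizer.step()

# Generating synthetic "handwritten" digits: feed random latent codes to the decoder.
with torch.no_grad():
    samples = model.decoder(torch.randn(16, 32)).view(-1, 1, 28, 28)
```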

Description:
I built a logistic regression classifier that predicts whether a patient has cancer based on an image of a fine needle aspirate of a breast mass. The notebook also covers feature pre-processing and feature selection prior to model training.

Keywords:
logistic regression, classification, feature preprocessing, feature selection, evaluation metric selection, confusion matrix
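
A comparable pipeline can be sketched with scikit-learn's built-in breast cancer data set; the scaler, the SelectKBest step, and the hyperparameters below are illustrative choices, not necessarily the notebook's.

```python
# Illustrative pipeline (scaling, feature selection, logistic regression) on the
# scikit-learn breast cancer data set; the notebook's exact steps may differ.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, classification_report

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

clf = Pipeline([
    ("scale", StandardScaler()),                  # feature pre-processing
    ("select", SelectKBest(f_classif, k=10)),     # keep the 10 most informative features
    ("model", LogisticRegression(max_iter=1000)),
])
clf.fit(X_train, y_train)

y_pred = clf.predict(X_test)
print(confusion_matrix(y_test, y_pred))           # evaluation via confusion matrix
print(classification_report(y_test, y_pred))
```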

Description:
I compare K-means clustering and DBSCAN (density-based spatial clustering of applications with noise) through a cluster analysis of protein conformations. I also show an example of the data adjustment needed to obtain more reasonable clusters.

Keywords:
Kmeans, DBSCAN, clustering, data adjustment
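
The contrast between the two algorithms can be illustrated on a toy non-convex data set such as two moons, with feature scaling standing in for the data adjustment step; the protein conformation data itself is not reproduced here.

```python
# Toy comparison of KMeans and DBSCAN on a non-convex data set (two moons),
# standing in for the protein conformation data; scaling is the "data adjustment".
from sklearn.datasets import make_moons
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans, DBSCAN
from sklearn.metrics import adjusted_rand_score

X, y_true = make_moons(n_samples=500, noise=0.05, random_state=0)
X = StandardScaler().fit_transform(X)   # put features on a comparable scale

kmeans_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
dbscan_labels = DBSCAN(eps=0.3, min_samples=5).fit_predict(X)

# KMeans tends to split the moons with a straight boundary, while DBSCAN
# typically recovers each moon as a dense connected region.
print("KMeans ARI:", adjusted_rand_score(y_true, kmeans_labels))
print("DBSCAN ARI:", adjusted_rand_score(y_true, dbscan_labels))
```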

Description:
This project shows how to compare the performance of different models with a paired t-test to determine whether one performs statistically significantly better than the other. I illustrate this by comparing a logistic regression classifier and a Gaussian Naive Bayes classifier on an example data set.

Keywords:
paired t-test, model comparison, Gaussian Naive Bayes classifier, logistic regression
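
A sketch of the procedure is shown below, using matched cross-validation folds so that the per-fold accuracies form proper pairs; the data set and fold setup are illustrative assumptions, not the project's exact setup.

```python
# Illustrative paired t-test on per-fold accuracies of two models evaluated
# on the same cross-validation splits (example data, not the project's).
from scipy import stats
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import KFold, cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
cv = KFold(n_splits=10, shuffle=True, random_state=0)  # identical folds for both models

logreg = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
gnb = GaussianNB()

scores_lr = cross_val_score(logreg, X, y, cv=cv)
scores_nb = cross_val_score(gnb, X, y, cv=cv)

# Paired t-test on the fold-by-fold accuracy differences.
t_stat, p_value = stats.ttest_rel(scores_lr, scores_nb)
print(f"mean accuracy: LR={scores_lr.mean():.3f}, NB={scores_nb.mean():.3f}")
print(f"paired t-test: t = {t_stat:.3f}, p = {p_value:.4f}")
```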

Description:
This project

  • compares decision tree classification and random forest classification in terms of overfitting and underfitting,
  • examines how the results change as the ensemble size of the random forest classifier grows, and
  • evaluates feature importance in decision tree and random forest classifiers.

Keywords:
decision tree, random forest, ensemble model, feature importance
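
These comparisons can be sketched as follows; the data set, ensemble sizes, and hyperparameters are example choices, not the project's exact setup.

```python
# Illustrative comparison of a single decision tree and random forests of growing
# ensemble size, plus feature importances; data set and parameters are examples only.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
print("tree train/test accuracy:",
      tree.score(X_train, y_train), tree.score(X_test, y_test))  # large gap = overfitting

# Effect of the ensemble size on the train/test gap.
for n in (1, 10, 100, 500):
    forest = RandomForestClassifier(n_estimators=n, random_state=0).fit(X_train, y_train)
    print(f"forest(n={n}) train/test accuracy:",
          forest.score(X_train, y_train), forest.score(X_test, y_test))

# Feature importance (mean decrease in impurity) for the tree vs. the last forest.
print("tree importances:  ", tree.feature_importances_[:5])
print("forest importances:", forest.feature_importances_[:5])
```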
