tosmartak/Data-Science-and-Machine-Learning-Projects-with-Python

My Portfolio of Data Science and Machine Learning Projects Completed Using Python

Below is a list and summary of all the data science projects I have completed using the Python programming language. Each project is linked to its own repository for further details and a deeper dive into the content and code.

  • Coffee Disease Image Classification: Collaborated with a team of data scientists and AI engineers on the Omdena AI platform (Ethiopian Chapter) to classify coffee diseases across 5 image classes. Leveraged DenseNet transfer learning to develop a robust coffee disease image classification system, with the aim of improving the precision and efficiency of disease detection for coffee farmers.

  • Predicting Autism in Toddlers: Collaborated with a team of data scientists and AI engineers on the Omdena AI platform (Sri Lankan Chapter) to predict autism in toddlers using a chat (natural language) dataset and a separate tabular dataset. Extracted data from the chats using the ‘pylangacq’ library, used different methods to generate synthetic data for imbalanced classification, conducted EDA on each dataset to compare their distributions, and modeled each dataset with a Random Forest classifier to evaluate performance.

  • Nigeria Interstate Relocation Guide - Price Prediction and Recommendation Engine: Scraped data from a Nigerian house listing website, then cleaned and explored the data to generate insights on apartment rents across Nigerian states. Built a regression model that predicts house prices with an R^2 score of 0.75 on test data, and a custom search function that recommends the best areas in a state to relocate to based on one’s selected criteria.

  • Dog Vision: Built an end-to-end multi-class image classifier using TensorFlow 2.8.2 and TensorFlow Hub with the aim of identifying the breed of a dog from its image. Used data from the Kaggle dog breed identification competition, which consists of 10,000+ labelled images of 120 different dog breeds. Evaluated using multi-class log loss between the predicted probabilities and the observed targets, reaching 98% accuracy.

  • Heart Disease Prediction: Built an end-to-end classification machine learning model capable of predicting whether or not someone has heart disease based on their medical attributes. Experimentation is still in progress, but the model currently has a baseline accuracy of 88.5% and a 5-fold cross-validation accuracy of 84% on the test data. The aim is to reach at least 95% accuracy after further experimentation with different models and hyperparameter tuning.

  • Bulldozer Sale Price Prediction: Built an end-to-end regression machine learning model with the aim of predicting the future sale price of a bulldozer, given its characteristics and previous examples of how much similar bulldozers have been sold for. Evaluated with RMSLE (root mean squared log error) of 0.24 and R^2 score of 0.88 on the validation data.
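
As a rough illustration of two evaluation metrics named above (multi-class log loss for Dog Vision and RMSLE for the bulldozer model), here is a minimal sketch in plain Python. The function names `rmsle` and `multiclass_log_loss` are hypothetical and are not taken from the linked repositories, which may compute these metrics differently (for example with a library implementation).

```python
import math


def rmsle(y_true, y_pred):
    """Root mean squared log error for non-negative targets and predictions.

    Hypothetical helper for illustration; not the code used in the projects.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # log1p(x) = log(1 + x), so zero values are handled safely.
    squared_errors = [
        (math.log1p(pred) - math.log1p(true)) ** 2
        for true, pred in zip(y_true, y_pred)
    ]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def multiclass_log_loss(true_labels, predicted_probs, eps=1e-15):
    """Mean negative log of the probability assigned to the true class.

    true_labels: integer class indices, one per sample.
    predicted_probs: one list of class probabilities per sample.
    """
    if len(true_labels) != len(predicted_probs) or len(true_labels) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for label, probs in zip(true_labels, predicted_probs):
        # Clip to avoid log(0) when a class is given zero probability.
        p = min(max(probs[label], eps), 1 - eps)
        total -= math.log(p)
    return total / len(true_labels)
```

Because RMSLE compares logarithms, it penalizes relative rather than absolute errors, which suits prices that span a wide range. Lower values are better for both metrics.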
