tboonhau/Clustering-Recipes

Project: Clustering Recipes

Motivation

This is my first Data Science project. After several months of self-taught lessons and tutorials from various platforms, I am building a portfolio of projects to gain entry into the world of Data Science professionals. There will be a lot of mistakes, but being able to start has definitely helped to build some confidence that it is possible to achieve what one wants to.

Description

A dataset of 12,190 German-language recipes, scraped from the web, is grouped according to the recipes' ingredients. Since no labels are provided, unsupervised learning methods are used. The data is first cleaned and preprocessed, then vectorized with TF-IDF, a Natural Language Processing (NLP) technique, to weight the importance of each ingredient across all observations in the dataset. The resulting high-dimensional vectors are then reduced to a lower dimension, both for K-Means clustering and for a t-SNE plot that visualizes the clusters. Plotly is used to make the plots interactive.
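
To make the TF-IDF step concrete, here is a minimal sketch using only the Python standard library. It is not the project's actual code: the three toy recipes and the `tfidf` helper are hypothetical, and the weighting shown (smoothed IDF followed by L2 normalization) is one common convention, assumed here for illustration.

```python
import math
from collections import Counter

# Hypothetical toy recipes (ingredient lists); the real dataset has 12,190 recipes.
recipes = [
    ["mehl", "zucker", "butter", "ei"],
    ["mehl", "wasser", "hefe", "salz"],
    ["tomate", "zwiebel", "knoblauch", "salz"],
]

def tfidf(docs):
    """Return one {ingredient: weight} dict per recipe, L2-normalized."""
    n_docs = len(docs)
    # Document frequency: number of recipes that contain each ingredient.
    df = Counter(term for doc in docs for term in set(doc))
    # Smoothed inverse document frequency (assumed convention).
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1 for t, c in df.items()}
    vectors = []
    for doc in docs:
        tf = Counter(doc)  # raw term counts within this recipe
        raw = {t: count * idf[t] for t, count in tf.items()}
        # Guard against an empty recipe, whose norm would be zero.
        norm = math.sqrt(sum(w * w for w in raw.values())) or 1.0
        vectors.append({t: w / norm for t, w in raw.items()})
    return vectors

vectors = tfidf(recipes)
```

In this toy data, "mehl" appears in two of the three recipes, so it receives a lower weight in the first recipe than "zucker", which appears only there. Vectors like these are what the dimensionality reduction and K-Means steps would then operate on.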

Note that I have very little knowledge of the language or of the recipes. However, the outcome may be interesting.
