Project: Clustering Recipes

Motivation

This is my first Data Science project. After several months of teaching myself through lessons and tutorials on various platforms, I have started building a portfolio of projects to help me enter the world of professional Data Science. There will be plenty of mistakes, but getting started has definitely helped build my confidence that it is possible to achieve what one sets out to do.

Description

A dataset of 12,190 German-language recipes scraped from the web will be used to determine which group each recipe belongs to, based on its ingredients. Since no labels are provided, unsupervised learning methods will be used. First, the data will be cleaned and preprocessed. Then a Natural Language Processing (NLP) technique, TF-IDF vectorization, will be applied to measure the importance of each ingredient across all observations in the dataset. The high-dimensional vectors produced by vectorization will then be reduced to a lower dimension, both for K-Means clustering and for a t-SNE plot that visualizes the clusters. Plotly is used for interactive plotting to make the clusters easier to explore.
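The TF-IDF and K-Means steps described above can be sketched in plain Python as follows. This is a minimal, hypothetical illustration, not the project's code: the toy ingredient lists in `RECIPES` and the helper names `tfidf_vectors`, `farthest_first_init` and `kmeans` are invented for the example, and only the standard library is used so the sketch runs anywhere. The IDF term follows the smoothed formula that scikit-learn's `TfidfVectorizer` uses by default, and clustering is Lloyd's K-Means with a deterministic farthest-first start, a simplification of the k-means++ seeding that scikit-learn's `KMeans` uses by default. The dimension-reduction, t-SNE and Plotly steps are omitted because the README does not say which reduction method was used.

```python
import math
from collections import Counter

# Hypothetical toy data standing in for the scraped German recipes:
# each recipe is reduced to a list of (already cleaned) ingredient names.
RECIPES = [
    ["mehl", "zucker", "eier", "butter"],
    ["mehl", "zucker", "butter", "milch"],
    ["zucker", "eier", "milch", "vanille"],
    ["kartoffeln", "zwiebeln", "speck", "salz"],
    ["kartoffeln", "salz", "pfeffer", "zwiebeln"],
    ["speck", "zwiebeln", "salz", "pfeffer"],
]


def tfidf_vectors(docs):
    """Return (vocabulary, list of L2-normalised TF-IDF vectors)."""
    vocab = sorted({term for doc in docs for term in doc})
    n_docs = len(docs)
    # Document frequency: in how many recipes each ingredient appears.
    df = Counter(term for doc in docs for term in set(doc))
    # Smoothed IDF: ln((1 + n) / (1 + df)) + 1, so rare ingredients weigh more.
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1 for t in vocab}
    vectors = []
    for doc in docs:
        tf = Counter(doc)  # raw term counts
        vec = [tf[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        vectors.append([v / norm for v in vec])
    return vocab, vectors


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first_init(points, k):
    """Deterministic seeding: start at the first point, then repeatedly add
    the point whose nearest existing centroid is farthest away."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(squared_distance(p, c) for c in centroids))
        centroids.append(list(nxt))
    return centroids


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm; returns one cluster label per point."""
    centroids = farthest_first_init(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


if __name__ == "__main__":
    vocab, X = tfidf_vectors(RECIPES)
    labels = kmeans(X, k=2)
    print(f"{len(vocab)} distinct ingredients")
    for recipe, label in zip(RECIPES, labels):
        print(label, recipe)
```

On the toy data the baking recipes and the potato dishes share no ingredients, so they end up in separate clusters. On the real dataset one would more likely pass the ingredient strings to scikit-learn's `TfidfVectorizer` and `KMeans`; the hand-written version only makes the two steps explicit.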

Note that I have very little knowledge of the language or the recipes. Even so, the outcome may be interesting.
