-
Notifications
You must be signed in to change notification settings - Fork 0
yxing999/Yelp-Data-Mining
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
Yelp Data mining Data source: Yelp review API: Yelp Fusion API, Google Cloud Natural Language API Goals and Outline: The goal of our project is to apply clustering methods to answer the following questions that we think worthwhile to explore: 1. What neighborhoods in a city have the best cuisine selection according to the ratings in Yelp dataset? 2. Does some neighborhoods in a city provide more expansive cuisines than the others? 2. In the review of a restaurant, which kind of keywords are more likely to be mentioned? If a restaurant wants to improve its reviews, what should they focus on? This project report takes the example of reviews about "Pizza" in Boston. Three main components were done to solve the above questions. 1.Drawing of locations of all restaurants serves pizza in Boston shown in Google Map with Google API. 2.Geographical based clustering. Use K-means and GMM clustering algorithm to draw the clustering figure of restaurants. This shows the ratings and price distributions of all the restaurants. Silhouette Score of each clustering is calculated. 3.Classify the reviews via sentiment analysis. Make use of Google Cloud Natural Language API to divide review into positive and negative part. 4.Find and plot the relationship between topics and keywords. Use LDA(Latent Dirichlet Allocation) to process the corpus, visualize the result by PyLDAVis and Wordclouds. Environment requirements: - Download and install Anaconda, install Jupyter via Anaconda. - install the following packages: lda gensim PyLDAVis wordcloud bokeh sklearn TextBlob google.cloud Running instruction: To run the whole application, you should: 1. run main.ipynb in the root folder this will show a) the plotting of all the restaurants on Google Map b) the K-Means and GMM clustering result for rating and price, 4 plotting in total c) the reviews crwaled and saved to LDA/reviews.csv d) key words via wordclouds. 3. run sentiment_analysis.ipynb, this will show a) the result of sentiment Analysis of positive and negative revirew and the result of the review process. 4. go into the directory of this project and then cd LDA in terminal( cd [path]/LDA) and then use command: python LDA.py this will show a) the topics extracted from the corpus, with top frequent keywords on its right when click on topics, the right panel will show how much the keywords count
About
No description, website, or topics provided.
Resources
Stars
Watchers
Forks
Releases
No releases published
Packages 0
No packages published