
Description

The second project of Stat 333 (Spring 2017) was a Kaggle competition in which we were asked to predict Yelp ratings from the text of reviews for businesses in the Madison, WI area. Our group ranked first on both the public and the private leaderboard 🎉.

Models

| Model | Directory Name | Description |
|---|---|---|
| Deep Learning | ./dl | Uses Stanford's GloVe to vectorize text, and a simple CP-CP-CP neural network |
| Linear Regression | ./lr | Uses tf-idf text encoding with lasso, ridge regression, and elastic net |
| Multiple Linear Regression | ./mrl | Naive multiple linear regression with simple ad hoc variables |
| Neural Network | ./nn | Uses tf-idf text encoding and a simple neural network with one hidden layer |
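Both ./lr and ./nn start from a tf-idf encoding of the review text. As a rough illustration of what that encoding computes, here is a minimal pure-Python sketch; it is not the code in those directories. It assumes already-tokenized reviews, raw term counts for tf, and the smoothed idf `log((1 + N) / (1 + df)) + 1` (one common variant), followed by L2 normalization of each review vector.

```python
import math
from collections import Counter


def tfidf(docs):
    """Encode tokenized documents as L2-normalized tf-idf dicts.

    Assumed variant: tf = raw term count,
    idf = log((1 + N) / (1 + df)) + 1 (smoothed).
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    idf = {t: math.log((1 + n_docs) / (1 + d)) + 1 for t, d in df.items()}

    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vec = {t: c * idf[t] for t, c in counts.items()}
        # L2-normalize so long reviews do not dominate short ones.
        norm = math.sqrt(sum(v * v for v in vec.values()))
        vectors.append({t: v / norm for t, v in vec.items()} if norm else vec)
    return vectors


reviews = [
    "great food great service".split(),
    "terrible service".split(),
    "great pizza".split(),
]
for vec in tfidf(reviews):
    print({t: round(w, 3) for t, w in sorted(vec.items())})
```

Rare words such as "food" or "terrible" end up with larger weights than words shared across reviews such as "service", which is what lets a linear model key on distinctive vocabulary.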

Results

Our best model uses ridge regression with tf-idf text encoding. You can check out the self-explanatory Jupyter notebook here.
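Ridge regression adds an L2 penalty `alpha * ||w||^2` to least squares, which gives the closed form `w = (X^T X + alpha I)^-1 X^T y`. The sketch below solves that system in pure Python on a tiny dense toy matrix; it is only an illustration of the technique, not our notebook's code. With real tf-idf features (thousands of sparse columns) you would use a library solver such as scikit-learn's `Ridge` instead. The sketch fits no intercept, so the example centers features and targets first, which leaves the intercept unpenalized.

```python
def ridge_fit(X, y, alpha=1.0):
    """Solve (X^T X + alpha * I) w = X^T y for the ridge weights w.

    X is a list of dense rows, y a list of targets. No intercept is
    fitted, so center X and y beforehand.
    """
    p = len(X[0])
    # A = X^T X + alpha * I and b = X^T y.
    A = [[sum(row[i] * row[j] for row in X) + (alpha if i == j else 0.0)
          for j in range(p)] for i in range(p)]
    b = [sum(row[i] * t for row, t in zip(X, y)) for i in range(p)]

    # Gaussian elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, p):
            factor = A[r][col] / A[col][col]
            for c in range(col, p):
                A[r][c] -= factor * A[col][c]
            b[r] -= factor * b[col]

    # Back substitution on the upper-triangular system.
    w = [0.0] * p
    for i in reversed(range(p)):
        w[i] = (b[i] - sum(A[i][j] * w[j] for j in range(i + 1, p))) / A[i][i]
    return w


# Toy data: two features, three "reviews" with star ratings.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = [5.0, 1.0, 3.0]

# Center features and targets so the intercept is simply mean(y).
x_mean = [sum(col) / len(X) for col in zip(*X)]
y_mean = sum(y) / len(y)
Xc = [[v - m for v, m in zip(row, x_mean)] for row in X]
yc = [t - y_mean for t in y]

w = ridge_fit(Xc, yc, alpha=1.0)


def predict(x):
    return y_mean + sum((v - m) * wj for v, m, wj in zip(x, x_mean, w))


print(w, predict([1.0, 0.0]))
```

Increasing `alpha` shrinks the weights toward zero; `alpha` is the main hyperparameter to tune with cross-validation.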

Comments

  1. Feature engineering matters a great deal in NLP. We tried many different text encoding methods. We expected GloVe to work best, but tf-idf beat it in this project.
  2. We stemmed words and removed stop words. It turns out the stop-word cutoff (how aggressively stop words are removed) is well worth tuning.
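One way to make the stop-word level tunable is a document-frequency cutoff: any term that appears in more than a fraction `max_df` of the reviews is treated as a stop word. The sketch below is an assumption about how such a knob can work, not a description of our pipeline. `crude_stem` is a hypothetical toy suffix stripper standing in for a real stemmer such as Porter's, and `preprocess` is likewise a hypothetical helper.

```python
from collections import Counter


def crude_stem(word):
    """Toy suffix stripper; a stand-in for a real stemmer such as Porter's."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(reviews, max_df=0.5):
    """Stem tokens, then drop corpus-specific stop words.

    A term is treated as a stop word if it appears in more than a
    fraction max_df of the reviews; max_df is the knob to tune.
    Returns (filtered token lists, set of removed stop words).
    """
    docs = [[crude_stem(w) for w in r.lower().split()] for r in reviews]
    df = Counter(t for doc in docs for t in set(doc))
    limit = max_df * len(docs)
    stop = {t for t, d in df.items() if d > limit}
    return [[t for t in doc if t not in stop] for doc in docs], stop


reviews = [
    "the pizza was amazing",
    "the service was slow",
    "loved the pizza",
    "the waiters were rude",
]
for max_df in (1.0, 0.5, 0.4):
    docs, stop = preprocess(reviews, max_df)
    print(max_df, sorted(stop), docs[0])
```

Lowering `max_df` removes more words: at 0.5 only "the" is dropped, while at 0.4 "pizza" and "was" go too, and "pizza" may well carry rating signal. That trade-off is why the cutoff is worth tuning rather than fixing up front.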

See our presentation for more details.

About

Madison restaurant Yelp rating prediction based on review text
