GitHub - yi-ye-zhi-qiu/liamometer: Flask WebApp runs a 5-fold CV Ridge Linear regression on web-scraped IMDB ratings

The Liam-o-meter 🥭

Linear regression analysis of IMDb Movie Ratings as an interpretive model for audience score based on genre & distributor

Metis data-science bootcamp project 2, Jan. 11-22 2021

Summary: FlaskApp of movies rated using a k- (5-) fold cross-validation lasso multiple linear regression model of IMDB scores (n=2316) scraped off the web, where we chose to use three features: genre (Action, Adventure, etc.), genre-genre interactions (Horror Thriller, for example), and movie distributor (Disney, Paramount, Other defined as <=5 movies/year, etc.) to analyze timeframe. This model is interpretive and the use case is: "Can we train a critic to think about movie ratings as if they are a fan of small, internationally-successful studios?"

Contributors:

Liam
Liam's Dad (helped run code after I got IP-banned from RottenTomatoes)

Requirements to run locally:

The scrapy spider & data analysis:

Python 3.6 or greater
jupyter notebook
scrapy (pip3 install scrapy)
scikit-learn==v0.22.0<=0.23.2 (necessary for yellowbrick utils._safe_indexing dependency)
other modules: pandas matplotlib seaborn numpy json regex fuzzywuzzy pprint yellowbrick
~8 hours of time start to finish
Strong willingness to get IP-banned from RottenTomatoes (just for a few days)

The web-scraping:

For a tutorial on web-scraping using Scrapy you can see my blogpost here.

The WebApp:

The FlaskApp is running on Ubuntu on an AWS AmazonLightsail server.

Note: we do not focus here on or include the code for deployment of a FlaskApp onto AWS, let alone the html/css/javascript used to display the data. This is because the FlaskApp is a part of my personal portfolio, and including all of the code for that here seems tangential to the point at hand: web scraping and linear regression. If you are interested, to view the code used to create the app see here.

How to run locally:

follow directions in spider.py to change a few lines of code to match your local path, not mine

In your terminal:

cd boxoffice_scrapy
scrapy crawl mojo_spider -L WARN
scrapy crawl tomato_spider -L WARN
scrapy crawl imdb_spider -L WARN
scrapy crawl heirloom_spider -L WARN
scrapy crawl metacritic_spider -L WARN

Output

in "boxoffice_scrapy": heirloom.csv, imdb.csv, metacritic.csv, mojo.csv, tomatoes.csv

How it works

After webscraping, you can follow along using the five notebooks. They are:

Step I: EDA (exploring review sources)
Step II: Data cleaning (removing MPAA Rating, Budget)
Step III: Data modeling (one-hot encoding genre, genre-genre interactions, distributor) in the form of linear regression and degree 2 polynomial regression; metric used: R^2
Step IV: Comparing our regularized and non-regularized linear and polynomial models via residual plots and Q-Q plots
Step V: Making an html dataframe to use in a flask webapp (see liamometer.py, app.py and /templates)

Name		Name	Last commit message	Last commit date
Latest commit History 106 Commits
boxoffice_scrapy		boxoffice_scrapy
data		data
images		images
models		models
static		static
templates		templates
README.md		README.md
StepI EDA.ipynb		StepI EDA.ipynb
StepII Data cleaning.ipynb		StepII Data cleaning.ipynb
StepIII Data modeling.ipynb		StepIII Data modeling.ipynb
StepIV Ridge Lasso CV.ipynb		StepIV Ridge Lasso CV.ipynb
StepV html.ipynb		StepV html.ipynb
app.py		app.py
final_presentation.pdf		final_presentation.pdf

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

The Liam-o-meter 🥭

Linear regression analysis of IMDb Movie Ratings as an interpretive model for audience score based on genre & distributor

How it works

About

Releases

Packages

Contributors 2

Languages

yi-ye-zhi-qiu/liamometer

Folders and files

Latest commit

History

Repository files navigation

The Liam-o-meter 🥭

Linear regression analysis of IMDb Movie Ratings as an interpretive model for audience score based on genre & distributor

How it works

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages