This Ale's For You!
An exploration of the DC/Maryland/Virginia (DMV) beer scene and a beer recommender system
Imagine you're sitting at a bar trying to figure out what to drink next. The bartender is too busy dealing with the mob of people at the other end of the bar, but you could really use some help choosing a beer. As a homebrewer, I'm often the person giving suggestions, so can I build a recommender system to save me time and energy?
- Explore the DC/MD/VA (DMV) beer scene
- Build beer recommender(s)
  - Content: based on styles and other attributes
  - Content: based on textual analysis of reviews
  - Collaborative: based on user reviews
  - Content: textual and able to accept general queries
- Can we classify good beers? Or beer style categories?
The Universe of Beer Reviews and Beer-Related Databases
There is a bevy of beer review data on the Internet. Some of the main beer review websites and applications include:
Additionally, there are more specialized beer-focused publications that include reviews, typically covering a smaller selection of esoteric or limited-release beers:
General brewery databases:
Both RateBeer and Untappd have developer APIs, but both require explicit approval for API keys. Untappd in particular places restrictions on API access for pure research and analytics purposes. Given the turnaround time for API key access, I'm relying heavily on web scraping, which is not explicitly prohibited by either website's Robots Exclusion Protocol (robots.txt) file. I have avoided scraping BeerAdvocate given reports of the owner's proclivity towards legal action against prior web scrapers.
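As an illustration of the robots.txt check described above, here is a minimal sketch using Python's standard-library `urllib.robotparser`. The robots.txt content, the `example.com` URLs, and the `beer-research-bot` user agent are all made up for the example; they are not the rules of any real site, and this is not necessarily how the scraper in this project performed the check.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only. A real check would
# read the live file from the site being scraped.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Crawl-delay: 2
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_allowed(parser: RobotFileParser, url: str,
               user_agent: str = "beer-research-bot") -> bool:
    """Return True if the rules permit this user agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)


parser = build_parser(SAMPLE_ROBOTS_TXT)
print(is_allowed(parser, "https://example.com/beer/12345/"))
print(is_allowed(parser, "https://example.com/admin/users"))
# Requested pause between requests (in seconds), if the file declares one.
print(parser.crawl_delay("beer-research-bot"))
```

Against a live site, `RobotFileParser.set_url()` followed by `read()` would fetch the file instead of parsing a string. Honoring any declared crawl delay between requests is a reasonable courtesy when scraping at this scale.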
Beer Style Guidelines
I also consulted the leading beer style guidelines. At first, I thought they might be a source of features for the model, but given the diversity of beers even within a single style category, the style guides themselves were not helpful except as an aid in manually mapping beers to broader categories.
As is the adage in data science, data gathering took up a majority of my time on this project to date. Given that the data had to be scraped rather than called via an API, this took a little bit of persistence and creativity. RateBeer's website uses a fair bit of
Conclusions / Further Improvements
Scraping over 100,000 beer reviews is probably what I'm most proud of in this project.
Additionally, I was able to create multiple variations of recommender systems, each with its own pros and cons:
- Content - style / ABV / review counts etc.
  - This gives you very similar beers (e.g. pale lagers with pale lagers)
- Review Content - NLP analysis of review text
  - My favorite version, though it also tilts towards similarity rather than interesting suggestions
- Collaborative - Based on user ratings
  - Not terribly interesting; despite pulling a large body of reviews, many users have rated only a handful of the beers in the dataset
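To make the sparsity problem in the collaborative approach concrete, here is a minimal sketch of item-based collaborative filtering in plain Python. The users, beers, and scores are invented for illustration, and the `min_overlap` parameter is an assumption of this sketch rather than something taken from the project code. It uses raw cosine similarity over co-rated scores, which is the simplest variant; mean-centering each user's ratings first would usually give more discriminating scores.

```python
from math import sqrt

# Hypothetical ratings: user -> {beer: score}. Names and scores are made up.
RATINGS = {
    "ann": {"Pale Lager A": 3.0, "IPA B": 4.5, "Stout C": 4.0},
    "bob": {"Pale Lager A": 2.5, "IPA B": 4.0, "Stout C": 4.5},
    "cat": {"IPA B": 5.0, "Stout C": 3.0, "Sour D": 4.0},
    "dan": {"Sour D": 4.5},  # a one-review user contributes no beer pairs
}


def co_ratings(ratings, beer_a, beer_b):
    """Return (score_a, score_b) pairs from users who rated both beers."""
    return [(r[beer_a], r[beer_b])
            for r in ratings.values()
            if beer_a in r and beer_b in r]


def cosine(pairs):
    """Cosine similarity between the two score columns in `pairs`."""
    dot = sum(a * b for a, b in pairs)
    norm_a = sqrt(sum(a * a for a, _ in pairs))
    norm_b = sqrt(sum(b * b for _, b in pairs))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similar_beers(ratings, target, min_overlap=2):
    """Rank other beers by similarity to `target`.

    Beers sharing fewer than `min_overlap` raters with the target are
    skipped, because a similarity built on one shared user is not meaningful.
    """
    beers = {beer for r in ratings.values() for beer in r}
    scored = []
    for other in beers - {target}:
        pairs = co_ratings(ratings, target, other)
        if len(pairs) >= min_overlap:
            scored.append((other, cosine(pairs)))
    return sorted(scored, key=lambda item: item[1], reverse=True)


for beer, score in similar_beers(RATINGS, "IPA B"):
    print(f"{beer}: {score:.3f}")
```

In this toy data, "Sour D" shares only one rater with "IPA B" and is dropped. With many low-activity users, most beer pairs fall below any sensible overlap threshold, which is one plausible reason the collaborative results were less interesting.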
Unfortunately, I hit a technical snafu at the last minute, and in my rush to get a working demo, some of the organization and functionality broke. I'm working to restore the basic functionality ASAP.
Additionally, I'd like to explore further the possibility of using Doc2Vec on the review text to make more general recommendations (without requiring a starting beer as a comparison).
I'd like to deploy the model and dataset as a web app with Flask. Hopefully the web application can also include cool functionality, including links to the breweries, directions, a graph of key words, and more filtering options.
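To show what a recommendation without a starting beer could look like, here is a small sketch that ranks beers against a free-text query. It deliberately swaps in plain TF-IDF with cosine similarity as a stand-in for Doc2Vec, so that it runs with only the standard library; a Doc2Vec version would replace the vectors but keep the same query-then-rank flow. The review snippets and all function names are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical review snippets keyed by beer name; made up for illustration.
REVIEWS = {
    "Hazy IPA": "juicy tropical hops with citrus and a soft hazy body",
    "Imperial Stout": "roasty chocolate and coffee notes with a thick body",
    "Berliner Weisse": "tart lemony sour wheat beer light and refreshing",
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf(tokens, idf):
    """Weight term counts by IDF, ignoring terms unseen in the corpus."""
    counts = Counter(t for t in tokens if t in idf)
    return {term: count * idf[term] for term, count in counts.items()}


def build_index(docs):
    """Compute smoothed IDF weights and one TF-IDF vector per beer."""
    tokens = {name: tokenize(text) for name, text in docs.items()}
    n_docs = len(tokens)
    doc_freq = Counter(t for toks in tokens.values() for t in set(toks))
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0
           for t, df in doc_freq.items()}
    vectors = {name: tfidf(toks, idf) for name, toks in tokens.items()}
    return idf, vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def recommend(query, idf, vectors, top_n=3):
    """Rank beers by similarity between the query and their review text."""
    query_vec = tfidf(tokenize(query), idf)
    ranked = sorted(((name, cosine(query_vec, vec))
                     for name, vec in vectors.items()),
                    key=lambda item: item[1], reverse=True)
    return ranked[:top_n]


IDF, VECTORS = build_index(REVIEWS)
for name, score in recommend("something tart and refreshing", IDF, VECTORS):
    print(f"{name}: {score:.3f}")
```

One limitation of this stand-in is that it only matches exact words, so a query for "sour" would not find a review that only says "tart". Learned embeddings such as Doc2Vec are appealing precisely because they can capture that kind of similarity.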
At this stage the beer journey continues ...