Topic model analysis of Berkshire Hathaway annual letters (Completed Capstone Project #2)

tomhalloin/Springboard-Berkshire

Springboard Capstone Project II: Topic Model Analysis of Berkshire Hathaway's Shareholder Letters

This project analyzes Berkshire Hathaway's annual shareholder letters using Natural Language Processing in Python. It applies three extractive summarization methods (LexRank, TextRank, and Latent Semantic Analysis) and topic modeling with both the Mallet wrapper from Gensim and a Java version of Mallet LDA.
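
To give a feel for what extractive summarization does, here is a minimal TextRank-style sketch in plain Python. It is not the code used in the notebook: the function names, the naive sentence splitter, and the sample text are all assumptions made for illustration. Sentences are scored by word overlap, ranked with a PageRank-style iteration, and the top ones are returned in their original order.

```python
import math
import re
from itertools import combinations


def split_sentences(text):
    """Naive splitter (an assumption): break after ., ! or ? plus whitespace."""
    return [p for p in re.split(r"(?<=[.!?])\s+", text.strip()) if p]


def tokenize(sentence):
    """Lowercased set of word tokens in a sentence."""
    return set(re.findall(r"[a-z']+", sentence.lower()))


def similarity(a, b):
    """Shared-word count normalized by the log lengths of both sentences."""
    if not a or not b:
        return 0.0
    denom = math.log(len(a)) + math.log(len(b))
    return len(a & b) / denom if denom > 0 else 0.0


def textrank_summary(text, n_sentences=2, damping=0.85, iterations=50):
    """Return the n_sentences highest-ranked sentences, in document order."""
    sentences = split_sentences(text)
    n = len(sentences)
    if n <= n_sentences:
        return sentences
    tokens = [tokenize(s) for s in sentences]

    # Build a symmetric weighted graph: one node per sentence.
    weights = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        weights[i][j] = weights[j][i] = similarity(tokens[i], tokens[j])
    out_sum = [sum(row) for row in weights]

    # Weighted PageRank by fixed-count power iteration.
    scores = [1.0 / n] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) / n
            + damping * sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n)
                if out_sum[j] > 0
            )
            for i in range(n)
        ]

    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    return [sentences[i] for i in sorted(ranked[:n_sentences])]


if __name__ == "__main__":
    sample = (
        "Berkshire's insurance float grew again this year. "
        "Insurance float funds our investments in stocks. "
        "The weather in Omaha was pleasant. "
        "Our insurance operations and float remain strong."
    )
    for line in textrank_summary(sample, n_sentences=2):
        print(line)
```

A real pipeline would normally remove stop words and use a proper sentence tokenizer before ranking; both are left out here to keep the sketch short.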

If you plan to run this code, set the Mallet file locations and shortcuts in the notebook to the matching paths on your computer; otherwise, the code will not run. I recommend skipping the scraping notebook and using the letters included with the repository instead, because Berkshire's website usually denies me access when I scrape multiple letters at once.
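
One way to avoid hard-coded paths is to read the Mallet location from an environment variable and fail early with a clear message. The sketch below is an assumption, not the notebook's code: the helper name `resolve_mallet_path` is hypothetical, and it assumes `MALLET_HOME` points at an unpacked Mallet directory containing `bin/mallet` (or `bin/mallet.bat` on Windows).

```python
import os
from pathlib import Path


def resolve_mallet_path(env=None):
    """Return the path to the Mallet launcher, or raise FileNotFoundError.

    Assumes MALLET_HOME names the unpacked Mallet directory (hypothetical
    convention for this sketch; adjust to your own setup).
    """
    env = os.environ if env is None else env
    home = env.get("MALLET_HOME")
    if not home:
        raise FileNotFoundError(
            "Set MALLET_HOME to your Mallet install directory."
        )

    # Mallet ships a shell launcher and a Windows batch launcher in bin/.
    launcher = "mallet.bat" if os.name == "nt" else "mallet"
    exe = Path(home) / "bin" / launcher
    if not exe.is_file():
        raise FileNotFoundError(f"No Mallet executable found at {exe}")
    return exe
```

The returned path can then be converted with `str(...)` and passed wherever the notebook expects the Mallet location, for example as the `mallet_path` argument of Gensim's `LdaMallet` wrapper (available in Gensim 3.x; the wrappers were removed in Gensim 4.0).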

Also note that the final topics change from run to run, even with the same random seed, so the final topic distributions in the notebook differ slightly from those in the report.

Notebook to Scrape the Letters

Notebook for Everything Else

Final Writeup

Final Presentation
