Building a pipeline in Python to clean and pre-process song lyrics for analysis. Eminem's lyrics were scraped from Genius.com on December 22, 2020, using lyricsgenius, a Python library by Johnwmillr.
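Below is a minimal sketch of the kind of cleaning step such a pipeline performs. It assumes the raw lyrics contain Genius-style section tags such as "[Chorus]". The regexes, the `clean_lyrics` function name and the tiny `STOPWORDS` set are illustrative assumptions, not the code in TransformLoad.ipynb. A fuller version would more likely use NLTK's stopword list (`nltk.corpus.stopwords.words("english")`).

```python
import re

# Hypothetical, deliberately tiny stopword list for illustration only;
# a real pipeline would more likely use nltk.corpus.stopwords.words("english").
STOPWORDS = {"a", "an", "and", "i", "in", "is", "it", "my", "of", "the", "to", "you"}


def clean_lyrics(raw: str) -> list[str]:
    """Return lowercase word tokens with section tags, punctuation and stopwords removed."""
    # Drop Genius-style section markers such as "[Chorus]" or "[Verse 1: Eminem]".
    text = re.sub(r"\[[^\]]*\]", " ", raw)
    # Lowercase, then keep only runs of letters and apostrophes (so "i'm" stays one token).
    tokens = re.findall(r"[a-z']+", text.lower())
    # Trim apostrophes at token edges, e.g. "'cause" -> "cause".
    tokens = [t.strip("'") for t in tokens]
    return [t for t in tokens if t and t not in STOPWORDS]


if __name__ == "__main__":
    # Made-up sample text, not real lyrics.
    sample = "[Verse 1]\nI'm up all night, writing rhymes in my notebook\n[Chorus]\nAnd the beat goes on"
    print(clean_lyrics(sample))
```

Keeping apostrophes inside tokens preserves contractions common in rap lyrics; whether to expand or drop them is a design choice the notebooks may handle differently.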
- WordCloud (EminemWC.png)
- Text Analysis
- Sentiment Analysis
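As a hedged sketch of how word counts for the word cloud and text analysis might be computed, assuming each song has already been cleaned into a list of tokens, Python's `collections.Counter` is enough. The `word_frequencies` function and the sample token lists are hypothetical.

```python
from collections import Counter


def word_frequencies(songs: list[list[str]]) -> Counter:
    """Count how often each token appears across all cleaned songs."""
    counts = Counter()
    for tokens in songs:
        counts.update(tokens)
    return counts


# Hypothetical pre-cleaned token lists standing in for two songs.
songs = [
    ["beat", "rhymes", "beat"],
    ["rhymes", "night", "beat"],
]
freqs = word_frequencies(songs)
print(freqs.most_common(2))  # [('beat', 3), ('rhymes', 2)]
```

The resulting `Counter` is a dict of word to count, which is the form `WordCloud.generate_from_frequencies` in the wordcloud library accepts, so something like `WordCloud().generate_from_frequencies(freqs)` could render an image similar to EminemWC.png. Whether NLTKAnalysis.ipynb builds its cloud this way or directly from raw text with `WordCloud.generate` is not stated in this README.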
- lyricsgenius - Library for scraping data from Genius.com
- Pandas - Library for DataFrame manipulation
- json - JSON processing library (Python standard library)
- NumPy - Library for numerical computing with large, multi-dimensional arrays
- langdetect - Language detection library
- NLTK - Natural Language Toolkit
- PIL - Image processing library
- WordCloud - Library for generating word clouds
- Matplotlib - Library used for visualizations
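NLTK includes VADER, a lexicon- and rule-based sentiment analyzer exposed as `nltk.sentiment.SentimentIntensityAnalyzer` (it needs `nltk.download("vader_lexicon")` first). This README does not say which sentiment method the notebooks use. The sketch below is a toy, standard-library-only illustration of the underlying lexicon-lookup idea with a made-up lexicon; it is not VADER and ignores negation, intensifiers and punctuation. `TOY_LEXICON`, `lexicon_score` and `label` are hypothetical names.

```python
# Hypothetical toy lexicon: word -> polarity in [-1, 1]. Real lexicons (e.g. VADER's)
# contain thousands of entries and are combined with rules for negation and emphasis.
TOY_LEXICON = {"love": 1.0, "win": 0.8, "happy": 0.9, "hate": -1.0, "lose": -0.7, "pain": -0.8}


def lexicon_score(tokens: list[str]) -> float:
    """Average polarity of the tokens found in the lexicon; 0.0 if none match."""
    hits = [TOY_LEXICON[t] for t in tokens if t in TOY_LEXICON]
    return sum(hits) / len(hits) if hits else 0.0


def label(score: float, threshold: float = 0.05) -> str:
    """Map a score to a coarse label; the +/-0.05 cut-off borrows the one commonly used with VADER's compound score."""
    if score >= threshold:
        return "positive"
    if score <= -threshold:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    score = lexicon_score(["love", "pain", "beat"])
    print(round(score, 2), label(score))  # 0.1 positive
```

Averaging only the matched words keeps songs of different lengths comparable, at the cost of ignoring how many sentiment-bearing words a song contains.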
You can clone the whole repo and run the .ipynb files in Jupyter Notebook or Google Colab. The notebooks also contain comments and detailed explanations of the code.
- Extract.ipynb : Code used to scrape Eminem's lyrics
- TransformLoad.ipynb : Code for data cleaning and pre-processing
- Test.db : Database that stores the data produced by TransformLoad.ipynb
- NLTKAnalysis.ipynb : Code for WordCloud and NLP analysis
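Test.db is listed above without details on its schema. As a sketch, assuming Test.db is an SQLite file and that the scraped lyrics are saved as JSON with a `songs` list of `title`/`lyrics` objects (a hypothetical shape), Python's built-in `json` and `sqlite3` modules are enough to load them into a table. The `songs` table, its columns and `load_songs` are illustrative names only.

```python
import json
import sqlite3

# Hypothetical JSON shape for scraped songs; the real file written by
# Extract.ipynb may be structured differently.
RAW_JSON = '''
{"songs": [
  {"title": "Song A", "lyrics": "[Chorus]\\nFirst line"},
  {"title": "Song B", "lyrics": "Second line"}
]}
'''


def load_songs(conn: sqlite3.Connection, raw_json: str) -> int:
    """Create a (hypothetical) songs table, insert one row per song, and return the row count."""
    songs = json.loads(raw_json)["songs"]
    conn.execute(
        "CREATE TABLE IF NOT EXISTS songs "
        "(id INTEGER PRIMARY KEY, title TEXT NOT NULL, lyrics TEXT)"
    )
    rows = [(s["title"], s["lyrics"]) for s in songs]
    # Parameterized inserts avoid quoting problems with apostrophes in lyrics.
    conn.executemany("INSERT INTO songs (title, lyrics) VALUES (?, ?)", rows)
    conn.commit()
    return len(rows)


if __name__ == "__main__":
    # ":memory:" keeps the demo self-contained; a path such as "Test.db" would persist it.
    conn = sqlite3.connect(":memory:")
    print(load_songs(conn, RAW_JSON))  # 2
    print(conn.execute("SELECT title FROM songs ORDER BY id").fetchall())  # [('Song A',), ('Song B',)]
    conn.close()
```

If the cleaned data is held in a pandas DataFrame instead, `DataFrame.to_sql("songs", conn, index=False)` writes it to an sqlite3 connection in one call.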
NOTE: Running the notebooks above saves their output files to the working directory. In this repo, however, all output files have been moved to the files folder to keep things organized.