W266 Group Project: Lyric Mood Classification

UC Berkeley Master of Information & Data Science

W266 Natural Language Processing with Deep Learning Group Project

Team: Cyprian Gascoigne, Jack Workman, Yuchen Zhang

Table of Contents

Project Proposal
Python Environment Setup
Project Procedure & Walkthrough
The Lyric Mood Classification Pipeline

Project Proposal

Live Session Instructor: Daniel Cer
Group Members: Jack Workman, Yuchen Zhang, Cyprian Gascoigne

We plan to compare the accuracy of deep learning vs. traditional machine learning approaches for classifying the mood of songs, using lyrics as features. We will use mood categories derived from Russell's model of affect (from psychology, where mood is represented by a vector in a 2D valence-arousal space) and also calculate a valence-happiness rating. Mood categories will likely be happiness, anger, fear, sadness, and love (perhaps surprise and disgust).
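To make the valence-arousal idea concrete, here is a minimal sketch (not project code) of mapping a point in Russell's 2D space to a mood quadrant. The quadrant labels below are illustrative assumptions; "calm" in particular stands in for the low-arousal, high-valence quadrant.

```python
def mood_quadrant(valence: float, arousal: float) -> str:
    """Map a (valence, arousal) pair in [-1, 1]^2 to a mood quadrant.

    Quadrant labels are illustrative: happiness (+v, +a), calm (+v, -a),
    anger (-v, +a), sadness (-v, -a).
    """
    if valence >= 0:
        return "happiness" if arousal >= 0 else "calm"
    return "anger" if arousal >= 0 else "sadness"

print(mood_quadrant(0.8, 0.6))    # high valence, high arousal -> happiness
print(mood_quadrant(-0.5, -0.7))  # low valence, low arousal -> sadness
```
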

We will test the extensibility of our deep learning model through genre classification, song quality prediction / album ratings, and additional text features such as part-of-speech tags, number of unique and repeated words and lines, lines ending with same words, etc.

Classifying mood (or "sentiment") using textual features has been studied less than classifying it from musical features. One reason may be the difficulty of obtaining a large dataset legally (as lyrics are copyrighted material). The dataset we will use is the Million Song Dataset (MSD), a freely available dataset of one million contemporary music tracks. From MSD, we will use the Last.fm and musiXmatch companion datasets for song tags and lyrics. We will also use language detection to focus on English lyrics.

In addition to an RNN, we are considering algorithms such as Naive Bayes, KNN, binary SVM, and n-gram models.
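As a flavor of the simplest of these baselines, here is a minimal bag-of-words Naive Bayes sketch in pure Python. This is illustrative only, not the project's implementation: an add-one smoothed multinomial NB over unigram counts.

```python
import math
from collections import Counter, defaultdict

def train_nb(docs):
    """docs: list of (tokens, label). Returns counts needed for prediction."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in docs:
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab

def predict_nb(model, tokens):
    """Pick the label maximizing log P(label) + sum log P(token | label)."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best, best_lp = None, float("-inf")
    for label, n_docs in class_counts.items():
        lp = math.log(n_docs / total_docs)
        denom = sum(word_counts[label].values()) + len(vocab)  # add-one smoothing
        for tok in tokens:
            lp += math.log((word_counts[label][tok] + 1) / denom)
        if lp > best_lp:
            best, best_lp = label, lp
    return best

# Toy training data (made up for illustration):
model = train_nb([
    ("i feel so happy and alive".split(), "happiness"),
    ("tears fall and i cry alone".split(), "sadness"),
])
print(predict_nb(model, "happy alive".split()))  # -> happiness
```
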

Previous work reached contradictory conclusions and employed smaller datasets and simpler methodologies such as tf-idf weighted bag-of-words unigrams, fuzzy clustering, and linear regression. Our proposed model approach (RNN) should have better accuracy.

Mood classification of lyrics can help in the creation of automatic playlists, a music search engine, labeling for digital music libraries, and other recommendation systems.

Paper References:

  • Bandyopadhyay, Sivaji, Das, Dipankar, & Patra, Braja Gopal. (2015). Mood Classification of Hindi Songs based on Lyrics.
  • Becker, Maria, Frank, Anette, Nastase, Vivi, Palmer, Alexis, & Staniek, Michael. (2017). Classifying Semantic Clause Types: Modeling Context and Genre Characteristics with Recurrent Neural Networks and Attention.
  • Corona, Humberto & O'Mahony, Michael. (2015). An Exploration of Mood Classification in the Million Songs Dataset.
  • Danforth, Christopher M. & Dodds, Peter Sheridan. (2009). Measuring the Happiness of Large-Scale Written Expression: Songs, Blogs, and Presidents.
  • Fell, Michael & Sporleder, Caroline. (2014). Lyrics-based Analysis and Classification of Music.
  • Lee, Won-Sook & Yang, Dan. (2010). Music Emotion Identification from Lyrics.
  • Mihalcea, Rada & Strapparava, Carlo. (2012). Lyrics, Music, and Emotions.

Update: We used a Convolutional Neural Network (CNN) with word2vec-generated embeddings as our mood classification model instead of a Recurrent Neural Network (or its variants) because of the size of the input. We had hoped to acquire song lyrics labeled with moods line by line, but we adjusted our plans when we discovered that the song-to-mood mapping in the MSD+Last.fm dataset is one-to-one, and we were advised that training an RNN on such long input sequences would prove impractical. We baselined our classifier with most-common-class, Naive Bayes (NB), and Support Vector Machine (SVM) models, then spent the majority of our time tuning our CNN's hyperparameters and the word2vec embeddings, along with considerable effort invested in grooming and cleaning the data.


For our project, we made use of the Million Song Dataset (MSD) and its companion datasets from Last.fm and musiXmatch. We also scraped and downloaded lyrics from the internet, as musiXmatch only provides lyrics in a bag-of-words format and we needed the sequential ordering of the lyrics.

Here are the datasets:

To unzip the .tar.bz2 files in the data dir, use tar xvjf <file.tar.bz2>


| Model    | Unbalanced Mood | Unbalanced Mood Quadrants | Balanced Mood Quadrants |
|----------|-----------------|---------------------------|-------------------------|
| MCC      | 39.81%          | 43.61%                    | 25.34%                  |
| NB       | 39.93%          | 46.78%                    | 55.19%                  |
| SVM      | 44.88%          | 50.95%                    | 54.07%                  |
| CNN w2v0 | 56.79%          | 62.15%                    | 77.08%                  |
| CNN w2v1 | 54.33%          | 63.53%                    | 75.45%                  |

Our most performant mood classifier was the CNN w2v0 model trained on the Balanced Mood Quadrants dataset. Its accuracy of 77.08% is higher than that of other known lyric-based classifiers at the time of this writing.

For a detailed discussion of our results and review of available literature, please take a look at our paper: lyric-mood-classification-with-deep-learning.pdf

To learn more about this repository and how to reproduce our findings, please continue reading.

Python Environment Setup

First, make sure you have Python 3.6+ installed.

Then, run through the following commands:

  • python -m venv .venv_w266_project
  • Windows: .venv_w266_project\Scripts\activate.bat
  • Linux: source .venv_w266_project/bin/activate
  • pip install -r requirements.txt - this will install all required packages and might take several minutes

Jupyter Notebooks

Before interacting with Jupyter Notebooks in this repo, please first run the setup_jupyter.bat script. This script installs this repo's virtualenv as a kernel available to jupyter. Then, when using a notebook, click on Kernel -> Change Kernel -> .venv_w266_project to begin using our virtualenv's python and its packages.

To set up jupyter notebook on Ubuntu, use this guide.

Project Procedure & Walkthrough

The following sections explain and provide more information on each of the steps taken to build our dataset and classifier. You can treat these sections like a guide and follow along to reproduce our work.

Please note that Step 1 can take quite a while. Downloading the original datasets and scraping for lyrics took us several days with 5+ computers. The scripts come with the option to only download or process lyrics for all artists beginning with a specific letter (for example, 'a'). Use that option to speed things up but be warned that your classifier's accuracy will likely be much lower due to the smaller dataset.


  1. Data Downloading
  2. Scraping Lyrics
  3. Indexing Lyrics
  4. Labeling Lyrics
  5. Word Embeddings
  6. Mood Classification

Recommended: Skip steps 2 and 3 by decompressing data/labeled_lyrics.tar.bz2.

And don't forget to review the Appendix for some useful tips and helpful resources.

Data Downloading

The original MSD and companion datasets are quite large. Too large, in fact, to be stored in the GitHub repo. To download the data, please run script

Run python

For more information, please see

This will download the data into the data directory and will take several minutes.

NOTE: Only the sqlite db and the musixmatch matches file are required. The remaining sources downloadable via download_data are optional and are made available in case one wishes to explore the data further.

Scraping Lyrics

Important: We use the python package lyricsgenius to retrieve lyrics. The package interfaces with the Genius API for lyric access. To use the package, you'll need to create an account and get an API token. This requires providing an "app name" and "app url" to Genius. Once you've done so, save your API key to data/api.txt.

To start our project, we attempt to download lyrics for all of the songs in MusiXmatch with use of the lyricsgenius python package. For each song, we try all combinations of the MSD song title, MSD artist name, MXM song title, and MXM artist name until we get a successful download. For many songs, no lyrics were found.
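The fallback strategy over title/artist combinations can be sketched as follows. This is illustrative, not the project's code: the fetch function is a stand-in stub, where the real pipeline would wrap lyricsgenius (which requires the API token described above).

```python
from itertools import product

def scrape_lyrics(msd_title, msd_artist, mxm_title, mxm_artist, fetch):
    """Try every (title, artist) combination until one returns lyrics."""
    titles = [t for t in (msd_title, mxm_title) if t]
    artists = [a for a in (msd_artist, mxm_artist) if a]
    for title, artist in product(titles, artists):
        lyrics = fetch(title, artist)
        if lyrics:
            return lyrics
    return None  # no lyrics found for this song

# Stubbed fetch that only recognizes one artist spelling:
def fake_fetch(title, artist):
    return "la la la" if (title, artist) == ("Song", "Artist B") else None

print(scrape_lyrics("Song", "Artist A", "Song", "Artist B", fake_fetch))
```
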

Run python

For more information, please see

The output of this stage is the directories data/lyrics/json and data/lyrics/txt populated with files containing just lyrics (if .txt) and lyrics plus additional metadata (if .json).

USEFUL TIP: Run python -t a & python -t b to run two scrapes in parallel (for artists starting with the letter a or b). Use fg to switch between processes so you can quit with ^C.

Indexing Lyrics

After scraping and downloading lyrics into txt files, we next index the files and perform basic checks on the validity of each. The checks include:

  1. Are the lyrics in English?
  2. Does a downloaded lyric text file exist?
  3. What is the total word count?
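The checks above can be sketched like this. This is illustrative rather than the project's code; in particular, looks_english is a crude stand-in for a real language detector.

```python
import os

def looks_english(text, common=("the", "and", "you", "i", "a")):
    # Crude stand-in for a real language-detection library: treat text
    # as English if common English function words are frequent enough.
    words = text.lower().split()
    if not words:
        return False
    hits = sum(1 for w in words if w in common)
    return hits / len(words) > 0.05

def index_lyric_file(path):
    if not os.path.exists(path):            # check 2: file exists?
        return {"path": path, "valid": False}
    with open(path, encoding="utf-8") as f:
        text = f.read()
    return {
        "path": path,
        "is_english": looks_english(text),  # check 1: English?
        "word_count": len(text.split()),    # check 3: total word count
        "valid": True,
    }
```
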

Run python

For more information, please see script

The output of this stage is a csv (commonly referred to as indexed_lyrics.csv) with track id, track name, track artist, path to track lyrics file in repo, and additional metadata. A pregenerated version of this csv is available at data/indexed_lyrics.tar.bz2.

Labeling Lyrics

Once we have a nice index built, we match the lyrics to the mood tags from the Last.fm dataset. To do this, we iterate over each row of the index, query the sqlite database for all associated tags, and then attempt to match the tags against our mood categories.

Our mood categories come in two different forms: the original mood categories (the MOOD_CATEGORIES dict) and the expanded mood categories (viewable at mood_categories_expanded.json). When using the expanded categories, substring matching and subsequent filtering are used to match moods. We explain this process more in depth in our paper.
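The substring-matching step can be sketched as below. The categories here are a tiny made-up subset for illustration, not the project's full mood_categories_expanded.json, and the matching policy (first hit wins) is an assumption.

```python
# Illustrative subset of expanded mood categories (not the real file):
MOODS = {
    "happiness": ["happy", "cheerful", "joy"],
    "sadness": ["sad", "melancholy", "gloomy"],
    "anger": ["angry", "aggressive"],
}

def match_mood(tags):
    """Return the first mood whose synonym appears as a substring of a tag."""
    for tag in tags:
        tag = tag.lower()
        for mood, synonyms in MOODS.items():
            if any(s in tag for s in synonyms):
                return mood
    return None

print(match_mood(["90s", "melancholy pop", "female vocalist"]))  # -> sadness
```
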

Run python --expanded-moods

For more information, please see script

The output of this stage is a csv (commonly referred to as labeled_lyrics.csv) very similar to indexed_lyrics.csv but with an additional column: mood. A pregenerated version of this csv is available at data/labeled_lyrics.tar.bz2.

Word Embeddings

As input to our deep learning CNN classifier, we use the word2vec model as defined by Mikolov et al. and the implementation provided by TensorFlow to generate our word embeddings.
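For intuition, the skip-gram variant of word2vec trains on (target, context) pairs drawn from a sliding window over the text. This small sketch (not project code) shows how those pairs are generated:

```python
def skipgram_pairs(tokens, window=2):
    """Generate (target, context) training pairs with a sliding window."""
    pairs = []
    for i, target in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # context words only, never the target itself
                pairs.append((target, tokens[j]))
    return pairs

print(skipgram_pairs("hello darkness my old friend".split(), window=1))
```
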

The script contains a lyrics2vec python class that saves its embeddings and data as python pickle files. The pickle files can be reused later by classifiers as needed.

An example of the lyrics2vec implementation can be seen in the script's main function as well as in the word_embeddings.ipynb notebook.

The lyrics2vec class is later reused in our mood classification workflow to generate embeddings and pass them to our classifier.

Additionally, the lyrics2vec embeddings and associated data are saved in logs/tf/lyrics2vec_expanded under directories named like lyrics2vec_V-10000_Wt-1, where 10000 is your vocab size and Wt-1 identifies your word tokenizer (see the word_tokenizers dict).

Mood Classification

For the final act, we build and train a Convolutional Neural Network to predict the moods of the songs we labeled, using the lyrics we downloaded. Our CNN is modeled after Yoon Kim's CNN for Sentence Classification with help from Denny Britz's useful CNN walkthrough.

For model implementation details, please see

When executing the model, models, summaries, and other outputs are saved in the logs/tf directory. They are identified by a unique key generated by the model parameters and an additional name provided by you. For example, a model with embedding_size=128, filter_sizes=[3, 4, 5], num_filters=128, dropout=0.5, L2=0.01, batch_size=64, and num_epochs=10 will be saved in directory logs/tf/runs/Em-128_FS-3-4-5_NF-128_D-0.5_L2-0.01_B-64_Ep-10/.
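The run key above could be generated from the hyperparameters roughly as follows (an illustrative sketch; the parameter names are assumptions based on the example):

```python
def run_key(embedding_size, filter_sizes, num_filters,
            dropout, l2, batch_size, num_epochs):
    """Build a unique run-directory key from model hyperparameters."""
    return "Em-{}_FS-{}_NF-{}_D-{}_L2-{}_B-{}_Ep-{}".format(
        embedding_size, "-".join(map(str, filter_sizes)), num_filters,
        dropout, l2, batch_size, num_epochs)

print(run_key(128, [3, 4, 5], 128, 0.5, 0.01, 64, 10))
# Em-128_FS-3-4-5_NF-128_D-0.5_L2-0.01_B-64_Ep-10
```
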

The Lyric Mood Classification Pipeline

Since much of the fun in NLP and Deep Learning comes from fiddling with and manipulating your data, we've constructed a configurable pipeline that consists of

  • Importing, filtering, and preprocessing the lyrics
  • Training word embeddings with lyrics2vec
  • Vectorizing and splitting the dataset
  • Training the CNN

The pipeline runs end to end: with one command, you can generate word embeddings and train a CNN model on our lyrics dataset! The configuration options are numerous; please review the script's documentation for details.

Run python

Note that you will first need to scrape, index, and label the lyrics (see: Project Procedure & Walkthrough).

Our pipeline can also automatically launch tensorboard during model training. Watch the logs for the tensorboard url (or try http://your_ip:6006/).


Reviewing lyrics2vec Results

Within your lyrics2vec model's output directory, you will find model checkpoints, pickled data for reuse, and an embeddings.png file (if the plotting function is used). This png is a t-SNE representation of your embeddings and can be useful (and insightful!) when reviewing your embeddings.

Reviewing CNN Results

We provide several means to review the output of a trained model.

First, you can use TensorFlow's tensorboard. The script can autogenerate a tensorboard command for you, or you can build your own with the following:

tensorboard --logdir logs/tf/runs/<model>/summaries/

To compare multiple models in tensorboard try:

tensorboard --logdir <name1>:logs/tf/runs/<model1>/summaries,<name2>:logs/tf/runs/<model2>/summaries

Note that the training script saves model summaries and model checkpoints, and that is what tensorboard uses to generate its visualizations.

Second, you can view the step_data.csv file generated during training. It contains a row-by-row log of each train, dev, and test step along with the timestamp, step id, loss, and accuracy.

Example: examples/step_data.csv.
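A file like this is easy to analyze after the fact. Here is a sketch that picks the best dev-set step; the column names are assumptions based on the description above, not the file's actual header.

```python
import csv
import io

def best_dev_step(csv_text):
    """Return (step, accuracy) of the best dev-set evaluation."""
    rows = csv.DictReader(io.StringIO(csv_text))
    dev = [r for r in rows if r["split"] == "dev"]
    best = max(dev, key=lambda r: float(r["accuracy"]))
    return int(best["step"]), float(best["accuracy"])

# Synthetic example data (column names are assumptions):
sample = """timestamp,split,step,loss,accuracy
1543000000,train,100,1.21,0.55
1543000060,dev,100,1.30,0.52
1543000120,train,200,0.98,0.63
1543000180,dev,200,1.10,0.61
"""
print(best_dev_step(sample))  # -> (200, 0.61)
```
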

Third, you can view the model's confusion matrices. These are generated every evaluate_every steps (default: 100) and are named like 200_confusion.csv for step 200. For enhanced viewing, use the visualizations notebook, which generates a visually appealing, color-coded confusion matrix with seaborn.

Useful Links

Python code for interacting with lastfm sqlite db
Python code for interacting with musixmatch lyrics
Scraping song lyrics from

