"Working with Corpora" final project - lyrics-based Japanese music classification.


Lyric-based Music Classification

How to use

  1. Scrape the data: run python3 scrape_lyrics.py
  2. Preprocess the data: copy the generated folders and run python3 pool.py [folder] [tokenizer], where tokenizer is either "jumanpp" or "janome". Janome is recommended, since Juman++ is still flaky and quite slow. You need to have Janome and Juman++ installed, as well as pyknp, Juman++'s Python interface.
  3. To train the neural network: run python3 train.py training_config.json. Again, you need to copy the generated folders over. Training is quick if you have a GPU (even a home-use GPU): 100 epochs take just a few minutes, although 500 or more epochs seem to give better results. You will see evaluation results at the end. You need TensorFlow, NumPy and Pandas. You also need a word2vec embedding for Japanese; you can find instructions on how to train one here.
  4. To train the naive Bayes classifier: run python3 classify.py. This is very fast and takes just a few seconds. You will see evaluation results at the end. You need scikit-learn and nothing else.
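
To give a rough idea of what step 4 does conceptually, here is a minimal, dependency-free sketch of multinomial naive Bayes over pre-tokenized lyrics. It is not the code in classify.py (which uses scikit-learn); the function names, the toy lyrics and the "ballad"/"pop" labels are all hypothetical and only serve as illustration. It assumes each song has already been split into tokens, as a tokenizer such as Janome would do in step 2.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(docs, labels):
    """Count documents per class and tokens per class (hypothetical helper)."""
    class_counts = Counter(labels)
    token_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in zip(docs, labels):
        token_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, token_counts, vocab


def predict(model, tokens):
    """Return the class with the highest log posterior for a token list."""
    class_counts, token_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        # Log prior: share of training documents belonging to this class.
        score = math.log(n_docs / total_docs)
        total_tokens = sum(token_counts[label].values())
        for tok in tokens:
            if tok not in vocab:
                continue  # ignore tokens never seen during training
            # Add-one (Laplace) smoothing avoids zero probabilities.
            count = token_counts[label][tok]
            score += math.log((count + 1) / (total_tokens + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy data, invented for this example: one token list per song.
train_docs = [
    "恋 涙 夜 君".split(),
    "涙 別れ 夜 雨".split(),
    "走る 夢 未来 空".split(),
    "夢 光 空 飛ぶ".split(),
]
train_labels = ["ballad", "ballad", "pop", "pop"]

model = train_naive_bayes(train_docs, train_labels)
prediction = predict(model, "涙 夜 雨".split())
print(prediction)
```

With real data you would replace the toy lists with the tokenized output of step 2 and hold out part of it for evaluation.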