Lyric-based Music Classification
How to use
- Scrape the data: run
- Preprocess the data: copy the generated folders and run
python3 pool.py [folder] [tokenizer], where tokenizer is either "jumanpp" or "janome". Janome is recommended, since Juman++ is still flaky and quite slow. You need to have Janome and Juman++ installed, as well as
pyknp, Juman++'s Python interface.
- To train the neural network: run
python3 train.py training_config.json. Again, you need to copy the generated folders over. Training is quick if you have a GPU, even a home-use one: 100 epochs take just a few minutes, although 500 or more epochs seem to give better results. Evaluation results are printed at the end. You need TensorFlow, NumPy and pandas. You also need a word2vec embedding for Japanese; you can find instructions on how to train one here.
- To train the naive Bayes classifier: run
python3 classify.py. This is very fast and takes just a few seconds. Evaluation results are printed at the end. You need scikit-learn and nothing else.
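
For readers unfamiliar with the naive Bayes step, the idea can be sketched with scikit-learn as below. This is not the code in classify.py. The toy lyrics, the labels, and the choice of a bag-of-words MultinomialNB pipeline are assumptions made purely for illustration, and the lyrics are assumed to be already split into space-separated tokens, as a tokenizer such as Janome might produce.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical toy data: pre-tokenized lyrics (tokens separated by spaces).
# Both the lyrics and the labels are invented for this sketch.
train_lyrics = [
    "愛 し てる 君 を 愛 し てる",
    "君 の 笑顔 が 好き 愛",
    "夜 の 街 を 走る 車",
    "街 の 灯り 車 の 音 夜",
]
train_labels = ["love", "love", "city", "city"]

# analyzer=str.split treats every whitespace-separated token as a feature,
# so single-character Japanese tokens are kept rather than filtered out.
model = make_pipeline(
    CountVectorizer(analyzer=str.split),
    MultinomialNB(),
)
model.fit(train_lyrics, train_labels)

# Classify two unseen, pre-tokenized snippets.
predictions = model.predict(["君 を 愛 し てる", "夜 の 車"])
print(predictions.tolist())
```

Passing a whitespace splitter as the analyzer matters for Japanese text: the default word pattern in CountVectorizer ignores one-character tokens, which would discard many particles and kanji.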