Sample-level CNN Architectures for Music Auto-tagging Using Raw Waveforms

A TensorFlow+Keras implementation of "Sample-level CNN Architectures for Music Auto-tagging Using Raw Waveforms", including a Jupyter notebook for excitation analysis.

[Image: ICASSP 2018 poster]

Table of contents

  • Citation
  • Prerequisites
  • Preparing MagnaTagATune (MTT) dataset
  • Preprocessing the MTT dataset
  • Training a model from scratch
  • Training a model with options
  • Downloading pre-trained models
  • Evaluating a model
  • Excitation Analysis
  • Issues & Questions

Citation

@inproceedings{kim2018sample,
  title={Sample-level CNN Architectures for Music Auto-tagging Using Raw Waveforms},
  author={Kim, Taejun and Lee, Jongpil and Nam, Juhan},
  booktitle={International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  year={2018},
  organization={IEEE}
}

Prerequisites

  • Python 3.5 and the required packages
  • ffmpeg (required for madmom)

Installing required Python packages

pip install -r requirements.txt
pip install madmom

The madmom package has an install-time dependency, so it should be installed after the packages in requirements.txt.

Installing ffmpeg

ffmpeg is required for madmom.

macOS (with Homebrew):

brew install ffmpeg

Ubuntu:

add-apt-repository ppa:mc3man/trusty-media
apt-get update
apt-get dist-upgrade
apt-get install ffmpeg

CentOS:

yum install epel-release
rpm --import http://li.nux.ro/download/nux/RPM-GPG-KEY-nux.ro
rpm -Uvh http://li.nux.ro/download/nux/dextop/el ... noarch.rpm
yum install ffmpeg

Preparing MagnaTagATune (MTT) dataset

Download the audio data and tag annotations from the MagnaTagATune dataset page. You should then have three .zip files and one .csv file:

mp3.zip.001
mp3.zip.002
mp3.zip.003
annotations_final.csv

Because the archive is split into parts, concatenate them and unzip the merged file:

cat mp3.zip.* > mp3_all.zip
unzip mp3_all.zip

You should see 16 directories named 0 to f. Typically, 0 to b are used for training, c for validation, and d to f for testing.

To make your life easier, create a directory named dataset and place the files in it as shown below:

mkdir dataset

Your directory structure should be like:

dataset
├── annotations_final.csv
└── mp3
    ├── 0
    ├── 1
    ├── ...
    └── f

The MTT dataset preparation is now done!

Preprocessing the MTT dataset

This section describes the required preprocessing for the MTT dataset. Note that it needs about 48 GB of storage space and uses multiprocessing.

The preprocessing does the following (a rough sketch of the resampling and segmentation step follows the list):

  • Select top 50 tags in annotations_final.csv
  • Split dataset into training, validation, and test sets
  • Resample the raw audio files to 22050 Hz
  • Segment the resampled audio into 59049-sample excerpts
  • Convert the segments to TFRecord format
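
For intuition, the resampling and segmentation step can be sketched as follows. This is a simplified illustration, assuming librosa for audio loading and a placeholder file path; build_mtt.py performs these steps (plus tag selection, dataset splitting, and TFRecord writing) over the whole dataset with multiprocessing.

import numpy as np
import librosa

SAMPLE_RATE = 22050   # target sampling rate
SEGMENT_LEN = 59049   # samples per segment (about 2.7 s at 22050 Hz)

# Placeholder path; any MP3 under dataset/mp3/ would do.
audio, _ = librosa.load("dataset/mp3/0/example.mp3", sr=SAMPLE_RATE, mono=True)

# Drop the trailing remainder and cut the waveform into fixed-length segments.
num_segments = len(audio) // SEGMENT_LEN
segments = np.reshape(audio[:num_segments * SEGMENT_LEN], (num_segments, SEGMENT_LEN))
print(segments.shape)  # (num_segments, 59049)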

To run the preprocessing:

python build_mtt.py

After preprocessing, your dataset directory should look like this:

dataset
├── annotations_final.csv
├── mp3
│   ├── 0
│   ├── ...
│   └── f
└── tfrecord
    ├── test-0000-of-0043.seq.tfrecord
    ├── ...
    ├── test-0042-of-0043.seq.tfrecord
    ├── train-0000-of-0152.tfrecord
    ├── ...
    ├── train-0151-of-0152.tfrecord
    ├── val-0000-of-0015.tfrecord
    ├── ...
    └── val-0014-of-0015.tfrecord

18 directories, 211 files
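
To sanity-check the preprocessing output, you can count the serialized examples in the shards without knowing their feature layout. This short snippet is just a convenience check, not part of the repository:

import glob
import tensorflow as tf

# Count the serialized examples across the generated training shards.
files = sorted(glob.glob("dataset/tfrecord/train-*.tfrecord"))
num_examples = sum(1 for _ in tf.data.TFRecordDataset(files))
print(f"{len(files)} training shards, {num_examples} examples")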

Training a model from scratch

To train a model from scratch, run the code:

python train.py

The trained model and logs will be saved under the directory log.

Training a model with options

train.py trains a model using the SE block by default. To see the configurable options, run python train.py -h. You will see:

usage: train.py [-h] [--data-dir PATH] [--train-dir PATH]
                [--block {se,rese,res,basic}] [--no-multi] [--alpha A]
                [--batch-size N] [--momentum M] [--lr LR] [--lr-decay DC]
                [--dropout DO] [--weight-decay WD] [--initial-stage N]
                [--patience N] [--num-lr-decays N] [--num-audios-per-shard N]
                [--num-segments-per-audio N] [--num-read-threads N]

Sample-level CNN Architectures for Music Auto-tagging.

optional arguments:
  -h, --help            show this help message and exit
  --data-dir PATH
  --train-dir PATH      Directory where to write event logs and checkpoints.
  --block {se,rese,res,basic}
                        Block to build a model: {se|rese|res|basic} (default:
                        se).
  --no-multi            Disables multi-level feature aggregation.
  --alpha A             Amplifying ratio of SE block.
  --batch-size N        Mini-batch size.
  --momentum M          Momentum for SGD.
  --lr LR               Learning rate.
  --lr-decay DC         Learning rate decay rate.
  --dropout DO          Dropout rate.
  --weight-decay WD     Weight decay.
  --initial-stage N     Stage to start training.
  --patience N          Stop training stage after #patiences.
  --num-lr-decays N     Number of learning rate decays.
  --num-audios-per-shard N
                        Number of audios per shard.
  --num-segments-per-audio N
                        Number of segments per audio.
  --num-read-threads N  Number of TFRecord readers.

For example, to train a model with the Res block and without multi-level feature aggregation:

python train.py --block res --no-multi
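
For reference, the channel-wise gating that an SE block applies to 1-D convolutional feature maps can be sketched in Keras roughly as below. This is a generic illustration: the bottleneck here uses a plain reduction ratio, whereas train.py exposes an amplifying ratio (--alpha), and the actual block definitions in this repository may be wired differently.

import tensorflow as tf
from tensorflow.keras import layers

def se_gate(x, ratio=16):
    """Generic squeeze-and-excitation gate for (batch, time, channels) feature maps."""
    channels = x.shape[-1]
    s = layers.GlobalAveragePooling1D()(x)                # squeeze over the time axis
    s = layers.Dense(max(1, channels // ratio), activation="relu")(s)
    s = layers.Dense(channels, activation="sigmoid")(s)   # per-channel gates in [0, 1]
    s = layers.Reshape((1, channels))(s)                  # broadcast back over time
    return layers.Multiply()([x, s])                      # excite: rescale each channel

Roughly speaking, the ReSE variant combines such a gate with a residual connection around the convolutional layers; see the paper for the exact layouts.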

Downloading pre-trained models

You can download the two best models from the paper:

  • SE+multi (AUC 0.9111): a model using SE blocks and multi-level feature aggregation
  • ReSE+multi (AUC 0.9113): a model using ReSE blocks and multi-level feature aggregation

To download them from command line:

# SE+multi
curl -L -o se-multi-auc_0.9111-tfrmodel.hdf5 https://www.dropbox.com/s/r8qlxbol2p4ods5/se-multi-auc_0.9111-tfrmodel.hdf5?dl=1

# ReSE+multi
curl -L -o rese-multi-auc_0.9113-tfrmodel.hdf5 https://www.dropbox.com/s/fr3y1o3hyha0n2m/rese-multi-auc_0.9113-tfrmodel.hdf5?dl=1

Evaluating a model

To evaluate a model, run:

python eval.py <MODEL_PATH>

For example, if you want to evaluate the downloaded SE+multi model:

python eval.py se-multi-auc_0.9111-tfrmodel.hdf5
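
eval.py takes care of the TFRecord input pipeline and metric computation. Conceptually, though, evaluation boils down to something like the sketch below; the random segments and labels are stand-ins for the real test set, and the segment-to-clip averaging and model-loading details may differ from what eval.py actually does.

import numpy as np
import tensorflow as tf
from sklearn.metrics import roc_auc_score

# compile=False avoids restoring the training-time optimizer and loss.
model = tf.keras.models.load_model("se-multi-auc_0.9111-tfrmodel.hdf5", compile=False)

# Random stand-ins for the real test TFRecords (shapes taken from the model).
num_clips, segs_per_clip = 100, 2
segments = np.random.randn(num_clips * segs_per_clip, *model.input_shape[1:]).astype("float32")
labels = np.random.randint(0, 2, size=(num_clips, model.output_shape[-1]))

pred = model.predict(segments, batch_size=64)                         # per-segment tag scores
clip_pred = pred.reshape(num_clips, segs_per_clip, -1).mean(axis=1)   # average per clip

# Tag-wise ROC AUC averaged over the 50 tags.
print("Macro AUC:", roc_auc_score(labels, clip_pred, average="macro"))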

Excitation Analysis

  • If you just want to see the code and plots, open excitation_analysis.ipynb.
  • If you want to analyze the excitations yourself, follow the steps below.

1. Extract the excitations.

python extract_excitations.py <MODEL_PATH>

# For example, to extract excitations from the downloaded `SE+multi` model:
python extract_excitations.py se-multi-auc_0.9111-tfrmodel.hdf5

This will extract excitations from the model and save them as a Pandas DataFrame. The saved file name is excitations.pkl by default.
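
For example, the saved DataFrame can be loaded and inspected directly with pandas (the column layout depends on extract_excitations.py):

import pandas as pd

# excitations.pkl is the default output path of extract_excitations.py.
excitations = pd.read_pickle("excitations.pkl")
print(excitations.shape)
print(excitations.head())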

2. Analyze the extracted excitations.

Run Jupyter notebook:

jupyter notebook

Then open excitation_analysis.ipynb in the Jupyter interface and run it to explore the excitations yourself.

Issues & Questions

If you have any issues or questions, please post them on the issue tracker so that other people can benefit from them too :) Thanks!
