
Tutorial: Deep Learning on Music Information Retrieval

(c) 2017 by Thomas Lidy, TU Wien

This is a set of tutorials showing how to use Deep Learning algorithms for music analysis and retrieval problems. More specifically, we use Convolutional Neural Networks to classify (categorize) music into genres.

The tutorials are written in Python 2.7 and use the popular Keras Deep Learning library with Theano as its computation backend.


For the tutorials, we use IPython / Jupyter notebook, which allows you to write and execute Python code interactively in the browser.

Viewing Only

If you do not want to install anything, you can simply view the tutorials' content in your browser by clicking on the tutorial filenames in the Git file listing above.

The tutorial will open in your browser for viewing.

Interactive Coding

If you want to follow the tutorials by actually executing the code on your computer, please first install the prerequisites described below.

After that, to run the tutorials, go into the DL_MIR_Tutorial folder and run from the command line:

ipython notebook

or

jupyter notebook

Your web browser will open showing a list of files. Start the tutorials one after another by clicking on the following:

This tutorial shows how to categorize music into 1 of 10 music genres using the GTZAN music collection (see below). It covers audio and data preprocessing for Deep Learning, as well as creating and training Convolutional Neural Networks with different architectures and parameters. It also introduces techniques such as Batch Normalization, ReLU activation and Dropout.
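In the tutorial, Batch Normalization, ReLU and Dropout are used as Keras layers. As a library-free illustration of what each one computes, here is a minimal pure-Python sketch on plain lists of floats; the function names are hypothetical and not part of the tutorial code. Dropout is shown in its common "inverted" form, which rescales the kept values during training so nothing changes at test time.

```python
import math
import random


def relu(xs):
    """ReLU activation: max(0, x) applied element-wise."""
    return [max(0.0, x) for x in xs]


def dropout(xs, rate, rng=None, training=True):
    """Inverted dropout: during training, zero each value with probability
    `rate` and scale the surviving values by 1 / (1 - rate).
    At test time (training=False) the input is returned unchanged."""
    if not training or rate == 0.0:
        return list(xs)
    rng = rng or random.Random()
    keep = 1.0 - rate
    return [x / keep if rng.random() < keep else 0.0 for x in xs]


def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Batch Normalization of one feature over a mini-batch: subtract the
    batch mean, divide by the batch standard deviation (eps avoids division
    by zero), then scale by gamma and shift by beta, which a real network
    learns during training."""
    n = float(len(batch))
    mean = sum(batch) / n
    var = sum((x - mean) ** 2 for x in batch) / n
    return [gamma * (x - mean) / math.sqrt(var + eps) + beta for x in batch]
```

For example, relu([-1.5, 0.0, 2.0]) returns [0.0, 0.0, 2.0], and batch_norm maps any mini-batch to values with (approximately) zero mean and unit variance before gamma and beta are applied.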

Installation of Pre-requisites

Install Python 2.7

Note: On most Mac and Linux systems, Python is already pre-installed. Check whether you have Python 2.7.x installed by running python --version on the command line.

Otherwise install Python 2.7 from
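The same check can also be done from inside Python. This is a small sketch (is_python27 is a hypothetical helper name, not part of the tutorials) that runs under both Python 2 and Python 3:

```python
import sys


def is_python27(version_info=None):
    """Return True if the given (or the running) interpreter version is 2.7.x."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) == (2, 7)


# Report the running interpreter's version and whether it matches.
print("Python %d.%d.%d" % tuple(sys.version_info[:3]))
print("Python 2.7.x found" if is_python27() else "This is not Python 2.7.x")
```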

Install Python libraries:

Mac, Linux or Windows

(on Windows, leave out sudo)

sudo pip install ipython

Check whether you can run

ipython notebook

from the command line. If not, try installing:

sudo pip install jupyter

Then download or clone the tutorials from this Git repository:

git clone

or download the repository as a ZIP file, unzip it and rename the extracted folder to DL_MIR_Tutorial.
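If you downloaded the ZIP file, the unzip-and-rename step can also be scripted. This is a minimal sketch assuming the archive contains a single top-level folder; the ZIP file name you pass in and the unzip_and_rename helper are hypothetical:

```python
import os
import zipfile


def unzip_and_rename(zip_path, dest_dir, target_name="DL_MIR_Tutorial"):
    """Extract zip_path into dest_dir and rename its single top-level
    folder to target_name. Returns the path of the renamed folder."""
    with zipfile.ZipFile(zip_path) as archive:
        # ZIP entry names always use "/" as separator; collect the
        # distinct first path components to find the top-level folder.
        top_level = set(name.split("/")[0] for name in archive.namelist())
        if len(top_level) != 1:
            raise ValueError("expected one top-level folder, found %r" % sorted(top_level))
        archive.extractall(dest_dir)
    extracted = os.path.join(dest_dir, top_level.pop())
    target = os.path.join(dest_dir, target_name)
    os.rename(extracted, target)  # fails if target already exists
    return target
```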

Install the remaining Python libraries needed:

Either by:

sudo pip install Keras==1.2.1 Theano==0.8.2 "scikit-learn>=0.17" pandas librosa

or, if you downloaded or cloned this repository, by:

cd DL_MIR_Tutorial
sudo pip install -r requirements.txt
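To confirm that the libraries are importable and to see which versions were installed, a sketch like the following can be used. It assumes each package exposes a __version__ attribute, which Keras, Theano, scikit-learn, pandas and librosa are expected to do; installed_versions is a hypothetical helper, and scikit-learn is imported under its module name sklearn.

```python
import importlib


def installed_versions(module_names):
    """Map each module name to its __version__ string, or to None if the
    module cannot be imported."""
    versions = {}
    for name in module_names:
        try:
            module = importlib.import_module(name)
        except Exception:  # ImportError, or an error raised while the package initializes
            versions[name] = None
            continue
        versions[name] = getattr(module, "__version__", "unknown")
    return versions


# Import names of the packages installed above (scikit-learn imports as sklearn).
TUTORIAL_MODULES = ["keras", "theano", "sklearn", "pandas", "librosa"]

# Usage:
# for name, version in sorted(installed_versions(TUTORIAL_MODULES).items()):
#     print("%s: %s" % (name, version))
```

A value of None means the package is missing (or failed to import) and should be reinstalled.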

Install MP3 Decoder

If you want to use audio formats other than .wav files (e.g. .mp3, .flac, .au, .mp4), you have to install FFMPEG on your computer:

Make sure that the executable is in a directory listed in your system's PATH.
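To verify this from Python, the sketch below searches the PATH directories for an executable. Python 3.3+ offers shutil.which for the same purpose; this hand-written find_executable helper (a hypothetical name) also works under Python 2.7. It assumes the executable is called ffmpeg (ffmpeg.exe on Windows).

```python
import os
import sys


def find_executable(name, path=None):
    """Return the full path of `name` if it is found as an executable file in
    one of the PATH directories (or in `path`, if given), else None.
    On Windows, '.exe' is appended to the name."""
    if path is None:
        path = os.environ.get("PATH", "")
    if sys.platform.startswith("win") and not name.lower().endswith(".exe"):
        name += ".exe"
    for directory in path.split(os.pathsep):
        if not directory:
            continue  # skip empty PATH entries
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


print(find_executable("ffmpeg") or "ffmpeg not found on PATH")
```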

Configure Keras to use Theano

Keras is configured to use TensorFlow by default, but since we use Theano as the Deep Learning computation backend, we have to change this in the keras.json configuration file, located in the .keras folder of the user's home directory.

Copy the keras.json included in the DL_MIR_Tutorial to one of the following target directories (you can overwrite an existing file):

  • Windows: C:\Users\<user>\.keras\
  • Mac: /Users/<user>/.keras
  • Linux: /home/<user>/.keras

Alternatively, change these 2 lines in your keras.json file to the following:

    "image_dim_ordering": "th",
    "backend": "theano"

See for details or for a step by step guide.
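If you prefer to script the change, here is a minimal sketch that edits keras.json in place and keeps any other settings already stored there. It assumes the default location ~/.keras/keras.json listed above; configure_keras_for_theano is a hypothetical helper, not part of the tutorials.

```python
import json
import os

# Default location of the Keras configuration file: ~/.keras/keras.json
KERAS_JSON = os.path.join(os.path.expanduser("~"), ".keras", "keras.json")


def configure_keras_for_theano(path=KERAS_JSON):
    """Set the Theano backend and 'th' image dimension ordering in keras.json,
    preserving all other keys. Creates the file (and folder) if missing.
    Returns the resulting configuration dict."""
    config = {}
    if os.path.exists(path):
        with open(path) as f:
            config = json.load(f)
    config["image_dim_ordering"] = "th"
    config["backend"] = "theano"
    directory = os.path.dirname(path)
    if directory and not os.path.isdir(directory):
        os.makedirs(directory)
    with open(path, "w") as f:
        json.dump(config, f, indent=4, sort_keys=True)
    return config
```

Calling configure_keras_for_theano() updates the file in your home directory; pass a different path to try it on a copy first.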

Optional for GPU computation

If you want to train your neural networks on your GPU, also install the following (not needed for the tutorials):

To permanently configure Keras/Theano to use the GPU, place a file named .theanorc in your home directory with the following content:

    [global]
    device = gpu
    floatX = float32
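The same file can also be written from Python. This sketch uses the standard configparser module (ConfigParser on Python 2.7); write_theanorc is a hypothetical helper that keeps any other sections already present in the file and writes option names with their original case, e.g. floatX.

```python
import os

try:
    import configparser  # Python 3
except ImportError:
    import ConfigParser as configparser  # Python 2.7


def write_theanorc(path=None, device="gpu", floatx="float32"):
    """Write device and floatX into the [global] section of a .theanorc
    file (default: ~/.theanorc). Returns the path written."""
    if path is None:
        path = os.path.join(os.path.expanduser("~"), ".theanorc")
    parser = configparser.RawConfigParser()
    parser.optionxform = str  # keep option names case-sensitive ("floatX")
    if os.path.exists(path):
        parser.read(path)
    if not parser.has_section("global"):
        parser.add_section("global")
    parser.set("global", "device", device)
    parser.set("global", "floatX", floatx)
    with open(path, "w") as f:
        parser.write(f)
    return path
```

For a single session, Theano also reads the same options from the THEANO_FLAGS environment variable, e.g. THEANO_FLAGS=device=gpu,floatX=float32.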

Check if installed correctly

To check whether Python, Keras and Theano were installed correctly, do:


If everything is installed correctly, it should print "Using Theano backend."
If the GPU is configured correctly, it should also print "Using gpu device 0: GeForce GTX 980 Ti" or similar.
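Besides reading the printed messages, the active configuration can also be queried from Python. This is a sketch assuming Keras's keras.backend.backend() function and Theano's theano.config.device and theano.config.floatX settings; describe_backend is a hypothetical helper. With the setup described above, it is expected to report theano as the Keras backend.

```python
def describe_backend():
    """Return (keras_backend, theano_device, theano_floatX).
    Each entry is None if the corresponding library cannot be imported."""
    keras_backend = theano_device = theano_floatx = None
    try:
        import keras  # importing Keras prints e.g. "Using Theano backend."
        keras_backend = keras.backend.backend()
    except Exception:
        pass
    try:
        import theano
        theano_device = theano.config.device
        theano_floatx = theano.config.floatX
    except Exception:
        pass
    return keras_backend, theano_device, theano_floatx


# Usage:
# print("Keras backend: %s, Theano device: %s, floatX: %s" % describe_backend())
```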

Source Credits

Python libraries

The following helper Python libraries are used in these tutorials:

  • and by Thomas Lidy and Alexander Schindler, taken from the RP_extract git repository
  • by Warren Weckesser

Data Sources

The data sets we use in the tutorials are from the following sources:

  • GTZAN music genre data set, by George Tzanetakis: 1000 audio files of 30 seconds each across 10 music genres (100 audio files per genre)

  • GTZAN music/speech data set (currently not used), by George Tzanetakis: collected for the purpose of music/speech discrimination. 128 tracks, each 30 seconds long. Each class (music or speech) has 64 examples in 22050 Hz mono 16-bit WAV audio format.

Both data sets are available from:
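To get a feel for the amount of data, the figures above can be turned into a quick calculation. The sample rate and 16-bit format are stated only for the music/speech set, so the byte count below applies to that set (raw PCM samples, ignoring file headers); for the genre set only the file count and total duration are computed.

```python
# Figures taken from the data set descriptions above.
GENRES, FILES_PER_GENRE, CLIP_SECONDS = 10, 100, 30
genre_files = GENRES * FILES_PER_GENRE              # 1000 files
genre_hours = genre_files * CLIP_SECONDS / 3600.0   # total duration in hours

SPEECH_TRACKS = 2 * 64                              # music + speech, 64 examples each = 128
SAMPLE_RATE, BYTES_PER_SAMPLE = 22050, 2            # 22050 Hz, 16-bit mono
samples_per_track = CLIP_SECONDS * SAMPLE_RATE      # 661500 samples per 30 s track
bytes_per_track = samples_per_track * BYTES_PER_SAMPLE  # 1323000 bytes
speech_total_mb = SPEECH_TRACKS * bytes_per_track / 1e6  # raw PCM only, no headers

print("GTZAN genre: %d files, %.1f hours of audio" % (genre_files, genre_hours))
print("GTZAN music/speech: %d samples per track, %.1f MB of raw PCM"
      % (samples_per_track, speech_total_mb))
```

At this sample rate, each 30-second track already contains 661500 samples, which is one reason the tutorial preprocesses the audio before feeding it to a Convolutional Neural Network.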