# Music genre prediction using Theano


In this lesson we will use the skills learned previously and scale them up to solve a real-life problem.

## Challenge

This time we want to solve a challenging classification problem. Given a 3-second, single-channel audio excerpt sampled at 400Hz, we want to tell whether it is a piece of classical or rock music.

For the purpose of this exercise I have prepared a dataset that consists of 10k audio excerpts. This is a subset of [Google AudioSet](https://research.google.com/audioset/). The process of gathering the data was simple:
* download YouTube videos tagged with a particular music genre and extract the audio stream
* cut out the segment of the audio indicated in the segments file (usually 10 seconds long)
* cut the middle 3 seconds of that segment and resample it at 400Hz

This produces audio excerpts of exactly 1200 samples (3 s × 400 Hz). Please note that a 400Hz sampling rate is quite low: it can only represent frequencies up to 200Hz (the Nyquist limit), so the audio contains only the lowest frequencies (deep, low and mid bass). Additionally, the quality of some of the videos is quite poor and some videos are mislabeled, all of which makes the challenge even more difficult.
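The last two preparation steps can be sketched in NumPy. This is only an illustration under assumptions: the helpers `middle_excerpt` and `fourier_resample` are hypothetical, the 8000Hz source rate is made up, and the resampling uses a simple FFT-based (Fourier) method rather than whatever tool was actually used to build the dataset. The toy segment mixes a 50Hz tone with a 3kHz tone to show that only content below the 200Hz Nyquist limit survives.

```python
import numpy as np

TARGET_RATE = 400      # Hz, sampling rate of the dataset
EXCERPT_SECONDS = 3    # length of the excerpt kept from each segment


def middle_excerpt(signal, rate, seconds=EXCERPT_SECONDS):
    """Return the middle `seconds` of a 1-D signal sampled at `rate` Hz."""
    length = int(round(seconds * rate))
    start = (len(signal) - length) // 2
    return signal[start:start + length]


def fourier_resample(signal, new_length):
    """Downsample a 1-D signal to `new_length` samples with the Fourier method.

    Frequency bins above the new Nyquist limit are discarded, which acts as
    an ideal low-pass filter before the rate change.
    """
    spectrum = np.fft.rfft(signal)
    kept = spectrum[: new_length // 2 + 1]
    # Rescale so that amplitudes are preserved after the length change.
    return np.fft.irfft(kept, n=new_length) * (new_length / len(signal))


def make_excerpt(segment, source_rate):
    """Hypothetical pipeline: middle 3 s of a segment, resampled to 400 Hz."""
    excerpt = middle_excerpt(segment, source_rate)
    return fourier_resample(excerpt, EXCERPT_SECONDS * TARGET_RATE)


if __name__ == "__main__":
    source_rate = 8000  # hypothetical original sampling rate
    t = np.arange(10 * source_rate) / source_rate  # a 10-second segment
    # 50 Hz bass tone plus a 3 kHz tone that cannot survive 400 Hz sampling
    segment = np.sin(2 * np.pi * 50 * t) + 0.5 * np.sin(2 * np.pi * 3000 * t)
    excerpt = make_excerpt(segment, source_rate)
    print(excerpt.shape)  # (1200,)
```

In practice a library routine such as `scipy.signal.resample_poly` would usually be preferred; the Fourier method implicitly treats the excerpt as periodic over its length.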

The 10k excerpts consist of exactly 5k Classical Music pieces and 5k Rock Music pieces. I have split them into training (80%) and testing (20%) subsets, so the training set consists of exactly 4k Classical and 4k Rock pieces and the test set contains 1k examples from each class. The dataset is available as a single \*.npz file of around 48MB, which should be placed in the same directory as this script. The link is provided in the top-level README of this tutorial.
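A quick way to check the downloaded file is to list the shapes of the arrays it contains. The sketch below assumes the array names `train` and `test`, which are the keys read by the `DataServer` class later in this lesson; `check_layout` is a hypothetical helper, and the expected shapes follow from the 80/20 split of 10k excerpts of 1200 samples each.

```python
import os
import tempfile

import numpy as np

# 80/20 split of 10k excerpts, 1200 samples each (array names are assumed).
EXPECTED_SHAPES = {"train": (8000, 1200), "test": (2000, 1200)}


def check_layout(path, expected=EXPECTED_SHAPES):
    """Compare the shapes of the arrays stored in an .npz file with the expected ones."""
    with np.load(path) as data:
        shapes = {name: data[name].shape for name in data.files}
    return shapes == expected, shapes


if __name__ == "__main__":
    # Toy file with the same layout but far fewer rows, so the sketch runs anywhere.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "toy.npz")
        np.savez(path, train=np.zeros((8, 1200), np.float32),
                 test=np.zeros((2, 1200), np.float32))
        print(check_layout(path, {"train": (8, 1200), "test": (2, 1200)}))
```

On the real file, `check_layout("./sample_3s_400Hz.npz")` should report `True` if the download is complete and the layout matches these assumptions.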

## Approach

To solve the challenge we will use the same approach as in the previous lesson, but the function we will be using (aka the model) needs to be much bigger. A common choice is a neural network, more specifically a [multilayer perceptron](https://en.wikipedia.org/wiki/Multilayer_perceptron).
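To make "much bigger" concrete, here is a minimal NumPy sketch of a multilayer perceptron with one hidden layer that maps a 1200-sample excerpt to the probability of one class. It is not necessarily the model built in this tutorial: the hidden size of 100, the tanh activation and the sigmoid output are assumptions chosen for illustration, and the weights are random (untrained).

```python
import numpy as np

SAMPLE_WIDTH = 1200   # inputs per excerpt (3 s at 400 Hz)
HIDDEN_UNITS = 100    # hypothetical hidden-layer size


def init_mlp(rng, n_in=SAMPLE_WIDTH, n_hidden=HIDDEN_UNITS):
    """Random weights for a one-hidden-layer perceptron with a single output."""
    return {
        "W1": rng.normal(0.0, 1.0 / np.sqrt(n_in), size=(n_in, n_hidden)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, 1.0 / np.sqrt(n_hidden), size=(n_hidden, 1)),
        "b2": np.zeros(1),
    }


def mlp_forward(params, inputs):
    """Map a (batch, SAMPLE_WIDTH) array to per-example probabilities of class 1."""
    hidden = np.tanh(inputs @ params["W1"] + params["b1"])
    logits = hidden @ params["W2"] + params["b2"]
    return 1.0 / (1.0 + np.exp(-logits[:, 0]))  # sigmoid


def count_params(params):
    """Total number of trainable values in the network."""
    return sum(p.size for p in params.values())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    params = init_mlp(rng)
    batch = rng.standard_normal((16, SAMPLE_WIDTH))
    print(mlp_forward(params, batch).shape)  # (16,)
    print(count_params(params))              # 120201
```

Even with only 100 hidden units, the network has 1200 × 100 + 100 + 100 + 1 = 120,201 parameters.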

## Implementation

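```python
import random
import theano
import numpy as np

IN_FILE = "./sample_3s_400Hz.npz"
SAMPLE_WIDTH = 1200   # samples per excerpt: 3 s at 400 Hz

BATCH_SIZE = 16       # examples per weight update
LEARNING_RATE = 0.01

NUM_UPDATES = 5000    # total number of weight updates
EVAL_STEP = 500       # evaluate on the test set every EVAL_STEP updates
```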

```
Using cuDNN version 7301 on context None
Mapped name None to device cuda: GeForce GTX 970 (0000:01:00.0)
```


There are some new meta-parameters in our algorithm; let's describe them here:

Batch size is the number of examples used in a single training step (a single weight update). We like to process examples in batches because that stabilizes the learning process: in short, it helps the weights move in the right direction rather than in a random one. Think of it this way: if a weight update improves results for 16 examples at the same time, it is far more likely to improve results for other examples as well. If we made an update based on a single example, the chance that the update is essentially random (does not generalize to other examples) would be far greater.
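The effect of batching can be checked numerically. The sketch below uses a toy logistic-regression problem (an assumption, not this lesson's model), computes the gradient of every example separately, and measures how far a mini-batch gradient typically lands from the gradient over all examples. The helper names are hypothetical. Averaging 16 independently drawn examples should shrink that squared distance by roughly a factor of 16.

```python
import numpy as np


def per_example_gradients(w, x, y):
    """Gradient of the logistic loss for every example separately (one row each)."""
    p = 1.0 / (1.0 + np.exp(-(x @ w)))
    return (p - y)[:, None] * x


def update_direction_spread(grads, batch_size, rng, trials=20000):
    """Mean squared distance between a mini-batch gradient and the full-data gradient."""
    full = grads.mean(axis=0)
    # Sample examples with replacement, like the random batches in this lesson.
    idx = rng.integers(0, len(grads), size=(trials, batch_size))
    batch_means = grads[idx].mean(axis=1)
    return np.mean(np.sum((batch_means - full) ** 2, axis=1))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((1000, 5))
    y = (x[:, 0] > 0).astype(float)  # toy labels
    grads = per_example_gradients(np.zeros(5), x, y)
    single = update_direction_spread(grads, 1, rng)
    batched = update_direction_spread(grads, 16, rng)
    print(batched / single)  # close to 1/16 = 0.0625
```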

Please note that we have reduced the learning rate significantly. Because the model has so many parameters, we want them to be updated slowly and steadily, smoothing out possible spikes in the gradient that might destabilize training.
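Assuming the weights are updated with plain mini-batch stochastic gradient descent, each step has the form below, where $\eta$ is `LEARNING_RATE`, $B$ is `BATCH_SIZE`, $f_w$ is the model and $\ell$ is the per-example loss. The step is proportional to $\eta$, so with $\eta = 0.01$ even an averaged gradient of norm 100 moves the weights by a distance of only 1.

$$
w \leftarrow w - \eta \, \frac{1}{B} \sum_{i=1}^{B} \nabla_w \, \ell\big(f_w(x_i),\, y_i\big)
$$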

Because the learning rate is much smaller, we have to do many more updates. With this particular number, each training example will be used 10 times on average: one pass over the 8000 training examples takes 8000 / 16 = 500 updates, and 5000 / 500 = 10.
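The same arithmetic in a few lines of Python. The constants are copied from the configuration cell; `NUM_TRAIN` is introduced here for illustration. The result is an average because batches are drawn at random with replacement, so an individual example may be used more or fewer than 10 times.

```python
BATCH_SIZE = 16
NUM_UPDATES = 5000
NUM_TRAIN = 8000  # 4k Classical + 4k Rock training excerpts

updates_per_pass = NUM_TRAIN // BATCH_SIZE       # updates needed to draw 8000 examples
passes = NUM_UPDATES * BATCH_SIZE / NUM_TRAIN    # examples drawn / training-set size
print(updates_per_pass, passes)  # 500 10.0
```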

To monitor the training process, we will evaluate on the test set every 500 updates (ten times during training).
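One way to implement this schedule is sketched below. `run_schedule`, `train_step` and `evaluate` are hypothetical placeholders, not functions defined in this lesson; the sketch only shows where the evaluation calls fall.

```python
NUM_UPDATES = 5000
EVAL_STEP = 500


def run_schedule(train_step, evaluate, num_updates=NUM_UPDATES, eval_step=EVAL_STEP):
    """Call train_step once per update and evaluate every eval_step updates.

    Returns the update counts at which evaluation happened.
    """
    eval_points = []
    for update in range(1, num_updates + 1):
        train_step()
        if update % eval_step == 0:
            evaluate()
            eval_points.append(update)
    return eval_points


if __name__ == "__main__":
    points = run_schedule(lambda: None, lambda: None)
    print(len(points), points[0], points[-1])  # 10 500 5000
```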

First, we will wrap data handling in a separate class:

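```python
class DataServer(object):
    """Loads the dataset and serves the test set and random training batches.

    The code assumes both arrays are ordered by class: the first half of the
    rows belongs to one genre (target 0.0), the second half to the other (1.0).
    """
    def __init__(self):
        data = np.load(IN_FILE)
        self.train = data['train'].astype(np.float32)
        self.test = data['test'].astype(np.float32)
        assert self.train.shape[1] == SAMPLE_WIDTH
        assert self.test.shape[1] == SAMPLE_WIDTH

        # Integer division (//) keeps the slice index an int under Python 3.
        self.test_targets = np.zeros(self.test.shape[0], dtype=np.float32)
        self.test_targets[self.test.shape[0] // 2:] = 1.0

    def get_test(self):
        return self.test, self.test_targets

    def get_train_sample(self, batch_size):
        # Draw batch_size random training examples (with replacement).
        inputs = np.empty((batch_size, self.train.shape[1]), dtype=np.float32)
        targets = np.zeros(batch_size, dtype=np.float32)

        for idx in range(batch_size):
            train_idx = random.randrange(self.train.shape[0])
            inputs[idx] = self.train[train_idx]
            if train_idx >= self.train.shape[0] // 2:
                targets[idx] = 1.0

        return inputs, targets
```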
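For reference, the batch sampling can also be written without the Python loop. This is a hypothetical alternative (`sample_batch` is not part of the lesson) that uses NumPy's `Generator.integers` instead of the `random` module, while keeping the same assumptions: sampling with replacement and target 1.0 for rows in the second half of the class-ordered array.

```python
import numpy as np


def sample_batch(train, batch_size, rng):
    """Draw a random mini-batch (with replacement) from a class-ordered array.

    Rows in the second half of `train` get target 1.0, rows in the first half 0.0.
    """
    idx = rng.integers(0, train.shape[0], size=batch_size)
    inputs = train[idx].astype(np.float32)
    targets = (idx >= train.shape[0] // 2).astype(np.float32)
    return inputs, targets


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    toy_train = np.repeat(np.arange(10)[:, None], 3, axis=1)  # row i is filled with i
    inputs, targets = sample_batch(toy_train, 4, rng)
    print(inputs[:, 0], targets)
```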

...WIP