# Music genre prediction using Keras


Before we improve our model, we first need to learn a new tool that offers more sophisticated features. In this lesson we will therefore rewrite the previous solution using [Keras](https://keras.io/).

Keras is a high-level library for training neural networks. In the previous lesson we wrote a lot of code to get a good understanding of all the math involved in training our classifier. This time we will do the same with minimal effort.


## Implementation

As usual we start with imports and definition of meta-parameters:

In [1]:
import numpy as np
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import SGD

IN_FILE = "./classical_vs_rock.npz"
SAMPLE_WIDTH = 1200

BATCH_SIZE = 16
LEARNING_RATE = 0.01

NUM_EPOCHS = 10

Using TensorFlow backend.


The modules imported from Keras will be explained shortly. The meta-parameters look familiar, except for NUM_EPOCHS, which replaces the number of updates that we used before.

The standard way of training is to loop over all the training data. One iteration, called an epoch, consists of going through all the training data once (preferably in random order). In the previous lesson, for the sake of implementation simplicity, we selected random audio excerpts to compose each training batch. This meant that some examples were selected multiple times while others might not have been selected at all.
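The difference between the two sampling schemes can be sketched with plain NumPy. This is only an illustration: the helper names `epoch_batches` and `random_batches` are hypothetical, and the random sampling here (independent draws with replacement) is an assumption about how the previous lesson picked its excerpts.

```python
import numpy as np

def epoch_batches(num_examples, batch_size, rng):
    """Yield index batches that together cover every example exactly once."""
    order = rng.permutation(num_examples)  # random order, no repeats
    for start in range(0, num_examples, batch_size):
        yield order[start:start + batch_size]

def random_batches(num_examples, batch_size, num_updates, rng):
    """Yield index batches drawn independently (assumed previous-lesson scheme)."""
    for _ in range(num_updates):
        yield rng.integers(0, num_examples, size=batch_size)

rng = np.random.default_rng(0)
epoch_seen = np.concatenate(list(epoch_batches(8000, 16, rng)))
random_seen = np.concatenate(list(random_batches(8000, 16, 500, rng)))

print(len(np.unique(epoch_seen)))   # 8000: every example is used exactly once
print(len(np.unique(random_seen)))  # fewer than 8000: some repeated, some never drawn
```

Both schemes perform the same number of updates; they differ only in which examples end up in the batches.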

Now, because we are using a high-level interface, we don't have to worry about that. The default behavior of Keras is to use epochs as defined above, so we will go with that. This is essentially the only difference in the whole procedure compared to the previous lesson and, as we shall soon see, it doesn't affect model performance (at least not in a noticeable way).

Keras evaluates the model after every epoch when validation data is passed to the training call. Please note that the number of epochs and evaluations matches the previous scripts: 500 updates with a batch size of 16 cover exactly 8000 examples, the total size of the training set, i.e. one epoch.
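The arithmetic behind this correspondence is short enough to check directly. The constant names below are only illustrative; the values come from the text above and from the meta-parameters.

```python
NUM_TRAIN_EXAMPLES = 8000  # total number of training examples
BATCH_SIZE = 16
NUM_EPOCHS = 10

updates_per_epoch = NUM_TRAIN_EXAMPLES // BATCH_SIZE
total_updates = updates_per_epoch * NUM_EPOCHS

print(updates_per_epoch)  # 500
print(total_updates)      # 5000
```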

Because Keras generates training and test batches by itself, the data preparation stage is greatly simplified. We only have to load the data and arrange it into four NumPy arrays: inputs and expected outputs (targets) for training and for testing:

In [2]:
data = np.load(IN_FILE)
x_train = data['train']
y_train = np.zeros((x_train.shape[0]))
y_train[int(x_train.shape[0]/2):] = 1.0
x_test = data['test']
y_test = np.zeros((x_test.shape[0]))
y_test[int(x_test.shape[0]/2):] = 1.0
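The label construction above appears to rely on the examples being stored sorted by class, with the first half of each array belonging to one genre and the second half to the other. A minimal sketch of that layout, using a hypothetical `make_labels` helper and a small random stand-in array instead of the real audio data:

```python
import numpy as np

def make_labels(num_examples):
    """Return 0.0 for the first half of the examples and 1.0 for the second half."""
    labels = np.zeros(num_examples)
    labels[num_examples // 2:] = 1.0
    return labels

# Stand-in inputs: 8 examples of width 1200 (random values, not real audio).
x_demo = np.random.rand(8, 1200).astype(np.float32)
y_demo = make_labels(x_demo.shape[0])

print(y_demo)         # [0. 0. 0. 0. 1. 1. 1. 1.]
print(y_demo.mean())  # 0.5, i.e. both classes are equally represented
```

If the input file stored the examples in a different order, the targets would have to be built differently.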

Data preparation got simpler, and so does the model definition. Please note that everything we defined manually in the previous lesson is now completely hidden from us; we only specify the type of each layer, the number of neurons, and the type of nonlinearity.

In [3]:
model = Sequential()
model.add(Dense(256, activation='relu'))
model.add(Dense(64, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
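Although the weight matrices are no longer written out by hand, their sizes follow directly from the layer widths. The sketch below counts the trainable parameters of the three fully connected layers, assuming the input width equals SAMPLE_WIDTH (1200); `dense_param_count` is a hypothetical helper, not a Keras function.

```python
def dense_param_count(num_inputs, num_units):
    """One weight per (input, unit) pair plus one bias per unit."""
    return num_inputs * num_units + num_units

layer_sizes = [1200, 256, 64, 1]  # input width followed by the three Dense layers
counts = [dense_param_count(n_in, n_out)
          for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]

print(counts)       # [307456, 16448, 65]
print(sum(counts))  # 323969
```

Almost all of the parameters sit in the first layer, because it connects every one of the 1200 inputs to each of the 256 neurons.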

Defining the cost function and the parameter updates also becomes trivial. We can even specify a metric to be used during the automatic evaluation of our model.

In [4]:
model.compile(loss='mean_squared_error',
              optimizer=SGD(lr=LEARNING_RATE),
              metrics=['accuracy'])
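As a reminder of what is now hidden behind these string names, here is a rough NumPy sketch of the mean squared error, a thresholded accuracy, and a plain gradient descent step. It illustrates the ideas only; it is not the code Keras runs internally, and the 0.5 threshold is an assumption.

```python
import numpy as np

def mean_squared_error(y_true, y_pred):
    """Average squared difference between targets and predictions."""
    return np.mean((y_true - y_pred) ** 2)

def binary_accuracy(y_true, y_pred, threshold=0.5):
    """Fraction of predictions on the correct side of the threshold."""
    return np.mean((y_pred > threshold) == (y_true > threshold))

def sgd_step(param, grad, learning_rate):
    """Plain gradient descent update: move against the gradient."""
    return param - learning_rate * grad

y_true = np.array([0.0, 0.0, 1.0, 1.0])
y_pred = np.array([0.1, 0.6, 0.8, 0.4])

print(mean_squared_error(y_true, y_pred))  # approximately 0.1925
print(binary_accuracy(y_true, y_pred))     # 0.5
print(sgd_step(np.array([1.0]), np.array([2.0]), 0.01))  # [0.98]
```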

As you might expect, the whole training loop also boils down to a single function call that does everything we need.

In [5]:
history = model.fit(x_train, y_train,
                    batch_size=BATCH_SIZE,
                    epochs=NUM_EPOCHS,
                    verbose=2,
                    validation_data=(x_test, y_test))

Train on 8000 samples, validate on 2000 samples
Epoch 1/10
 - 2s - loss: 0.2378 - acc: 0.6931 - val_loss: 0.2163 - val_acc: 0.8050
Epoch 2/10
 - 1s - loss: 0.1991 - acc: 0.8124 - val_loss: 0.1852 - val_acc: 0.8315
Epoch 3/10
 - 1s - loss: 0.1745 - acc: 0.8202 - val_loss: 0.1650 - val_acc: 0.8345
Epoch 4/10
 - 1s - loss: 0.1589 - acc: 0.8193 - val_loss: 0.1522 - val_acc: 0.8325
Epoch 5/10
 - 1s - loss: 0.1491 - acc: 0.8196 - val_loss: 0.1441 - val_acc: 0.8335
Epoch 6/10
 - 1s - loss: 0.1429 - acc: 0.8193 - val_loss: 0.1388 - val_acc: 0.8350
Epoch 7/10
 - 1s - loss: 0.1386 - acc: 0.8214 - val_loss: 0.1352 - val_acc: 0.8340
Epoch 8/10
 - 1s - loss: 0.1355 - acc: 0.8226 - val_loss: 0.1326 - val_acc: 0.8360
Epoch 9/10
 - 1s - loss: 0.1331 - acc: 0.8244 - val_loss: 0.1309 - val_acc: 0.8315
Epoch 10/10
 - 2s - loss: 0.1313 - acc: 0.8254 - val_loss: 0.1295 - val_acc: 0.8365
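The object returned by `fit` keeps the logged values in its `history` attribute, a dictionary mapping each metric name to a list with one value per epoch. The sketch below finds the best validation epoch from such a dictionary. The helper `best_epoch` is hypothetical, and because the accuracy key is spelled differently across Keras versions (`val_acc` here, `val_accuracy` in newer releases), it tries both.

```python
def best_epoch(history_dict, key_candidates=("val_acc", "val_accuracy")):
    """Return (1-based epoch, value) of the highest validation accuracy."""
    for key in key_candidates:
        if key in history_dict:
            values = history_dict[key]
            best = max(range(len(values)), key=values.__getitem__)
            return best + 1, values[best]
    raise KeyError("no validation accuracy found in history")

# Validation accuracies copied from the training log above.
logged = {"val_acc": [0.8050, 0.8315, 0.8345, 0.8325, 0.8335,
                      0.8350, 0.8340, 0.8360, 0.8315, 0.8365]}

print(best_epoch(logged))  # (10, 0.8365)
```

With the real training run, the same helper could be called as `best_epoch(history.history)`.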


Even though all parameters (weights) were hidden from us so far, we can easily access them to deploy our model somewhere else:

In [6]:
for layer_id, layer in enumerate(model.layers):
    weights = layer.get_weights()
    for param_id, param in enumerate(weights):
        print("Layer: {} parameter: {} type: {} shape: {}".format(
                layer_id, param_id, param.dtype, param.shape))

Layer: 0 parameter: 0 type: float32 shape: (1200, 256)
Layer: 0 parameter: 1 type: float32 shape: (256,)
Layer: 1 parameter: 0 type: float32 shape: (256, 64)
Layer: 1 parameter: 1 type: float32 shape: (64,)
Layer: 2 parameter: 0 type: float32 shape: (64, 1)
Layer: 2 parameter: 1 type: float32 shape: (1,)
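Once extracted, these six arrays are all that is needed to run the network without Keras. The following NumPy sketch reproduces the forward pass of the three layers under the assumption that the parameters are ordered as printed above (weights, then biases, layer by layer). It uses random stand-in values with the same shapes, not the trained weights.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict(params, x):
    """Forward pass; params is [W0, b0, W1, b1, W2, b2]."""
    w0, b0, w1, b1, w2, b2 = params
    hidden = relu(x @ w0 + b0)
    hidden = relu(hidden @ w1 + b1)
    return sigmoid(hidden @ w2 + b2)

# Random stand-in parameters with the shapes printed above.
rng = np.random.default_rng(0)
shapes = [(1200, 256), (256,), (256, 64), (64,), (64, 1), (1,)]
params = [rng.normal(scale=0.01, size=s).astype(np.float32) for s in shapes]

x = rng.normal(size=(4, 1200)).astype(np.float32)  # four fake input excerpts
out = predict(params, x)
print(out.shape)  # (4, 1)
```

For the trained model, the list could be collected with `[p for layer in model.layers for p in layer.get_weights()]`.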


And that's it. We have basically the same solution as before (I ran both scripts multiple times and the two implementations are identical in terms of average performance), but at least we understand everything that happens under the hood. Most importantly, we can now extend our model with some bells and whistles we were lacking before.

## Rock vs Hip-Hop

At this point I also encourage you to test our solution on a much more challenging task: rock vs hip-hop. You only need to use another input file; a link is available in the tutorial's readme file.

You will notice that with this data the performance of the current model is disappointingly low: only around 60% accuracy. In all fairness, distinguishing classical music from rock is not that difficult, because classical music rarely uses low frequencies and, even when it does, the rhythm is usually very different from that of contemporary music. Confronted with a real challenge, our multilayer perceptron failed miserably. It's pretty clear that we should be able to do better, and the next lesson will show you how.