# Music Genre Classification with Deep Learning

This tutorial shows how different Convolutional Neural Network architectures are used for the taks of music genre classification.

The data set used is the [GTZAN](http://marsyasweb.appspot.com/download/data_sets/) genre data set compiled by George Tzanetakis. It consists of 1000 tracks (30 second excerpts) from 10 genres, each with 100 examples.

The original tracks are 22050Hz Mono 16-bit audio files in .au format.

For a more compact download we provide a version in .mp3 format, also 22050 Hz.

This tutorial contains:
* Loading and Preprocessing of Audio files
* Loading class files from CSV and using Label Encoder
* Generating Mel spectrograms
* Standardization of Data
* Convolutional Neural Networks: single, stacked, parallel
* ReLU Activation
* Dropout
* Train/Test set split
* (Cross-validation - TODO)

You can execute the following code blocks by pressing SHIFT+Enter consecutively.

In [19]:
import os

# choosing between CPU and GPU
#device = 'cpu' # 'cpu' or 'gpu'
#os.environ['THEANO_FLAGS']='mode=FAST_RUN,device=' + device + ',floatX=float32'

import argparse
import csv
import datetime
import glob
import math
import sys
import time
import numpy as np
import pandas as pd # Pandas for reading CSV files and easier Data handling in preparation
from os.path import join

from theano import config

import keras
from keras.models import Sequential, Model
from keras.layers import Input, Convolution2D, MaxPooling2D, Dense, Dropout, Activation, Flatten, merge
from keras.layers.normalization import BatchNormalization

# local
import rp_extract as rp
from audiofile_read import audiofile_read

from sklearn import preprocessing
from sklearn.metrics import accuracy_score
from sklearn import __version__ as sklearn_version

if sklearn_version.startswith('0.17'):
    from sklearn.cross_validation import train_test_split
    from sklearn.cross_validation import StratifiedShuffleSplit
else: # >= 0.18
    from sklearn.model_selection import train_test_split
    from sklearn.model_selection import StratifiedShuffleSplit

## Set the Path to the Dataset

adjust this path to where the data set is stored on your computer:


In [1]:
# SET YOUR OWN PATH HERE
AUDIO_PATH = '../data/GTZAN_mp3'

## Load the Metadata

The tab-separated file contains pairs of filename TAB class category (i.e. genre).

In [5]:
csv_file = join(AUDIO_PATH,'filelist_GTZAN_mp3_wclasses.txt')
metadata = pd.read_csv(csv_file, index_col=0, header=None)
metadata.head(10)

Unnamed: 0_level_0,1
0,Unnamed: 1_level_1
./rock/rock.00053.mp3,rock
./rock/rock.00051.mp3,rock
./rock/rock.00076.mp3,rock
./rock/rock.00084.mp3,rock
./rock/rock.00052.mp3,rock
./rock/rock.00057.mp3,rock
./rock/rock.00028.mp3,rock
./rock/rock.00035.mp3,rock
./rock/rock.00095.mp3,rock
./rock/rock.00088.mp3,rock


In [6]:
# create list of filenames with associated classes
filelist = metadata.index.tolist()
classes = metadata[1].values.tolist()

## Encode Labels to Numbers

String labels need to be encoded as numbers. We use the LabelEncoder from the scikit-learn package.

In [10]:
classes[0:5]

['rock', 'rock', 'rock', 'rock', 'rock']

In [11]:
classes[99:105]

['rock', 'hiphop', 'hiphop', 'hiphop', 'hiphop', 'hiphop']

In [123]:
from sklearn.preprocessing import LabelEncoder

labelencoder = LabelEncoder()
labelencoder.fit(classes)

# we keep (and print) the number of classis
n_classes = len(labelencoder.classes_)
print n_classes, "classes:", ", ".join(list(labelencoder.classes_))

classes_num = labelencoder.transform(classes)

10 classes: blues, classical, country, disco, hiphop, jazz, metal, pop, reggae, rock


We check how the classes look like now numerically:

In [107]:
classes_num[0:5]

array([9, 9, 9, 9, 9])

In [108]:
classes_num[99:105]

array([9, 4, 4, 4, 4, 4])

Note: In order to correctly re-transform any predicted numbers into strings, we keep the labelencoder for later.

In [109]:
from sklearn.preprocessing import OneHotEncoder

# make a row vector a column vector, as needed by OneHotEncoder, using reshape(-1,1) 
classes_num_col = classes_num.reshape(-1, 1)

encoder = OneHotEncoder(sparse=False)
classes_num_1hot = encoder.fit_transform(classes_num_col)
classes_num_1hot

array([[ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.],
       ..., 
       [ 0.,  0.,  1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  1.,  0.,  0.,  0.,  0., 

In [110]:
classes_num_1hot.shape

(1000, 10)

## Load the Audio Files

In [20]:

list_spectrograms = [] # spectrograms are put into a list first

# desired output parameters
n_mel_bands = 40   # y axis
frames = 80        # x axis

# some FFT parameters
fft_window_size=512
fft_overlap = 0.5
hop_size = int(fft_window_size*(1-fft_overlap))
segment_size = fft_window_size + (frames-1) * hop_size # segment size for desired # frames

for filename in filelist:
    print ".", 
    filepath = os.path.join(AUDIO_PATH, filename)
    samplerate, samplewidth, wavedata = audiofile_read(filepath,verbose=False)
    sample_length = wavedata.shape[0]

    # make Mono (in case of multiple channels / stereo)
    if wavedata.ndim > 1:
        wavedata = np.mean(wavedata, 1)
        
    # take only a segment
    pos = 0 # start position
    wav_segment = wavedata[pos:pos+segment_size]

    # 1) FFT spectrogram 
    spectrogram = rp.calc_spectrogram(wav_segment,fft_window_size,fft_overlap)

    # 2) Transform to perceptual Mel scale (uses librosa.filters.mel)
    spectrogram = rp.transform2mel(spectrogram,samplerate,fft_window_size,n_mel_bands)
        
    # 3) Log 10 transform
    spectrogram = np.log10(spectrogram)
    
    list_spectrograms.append(spectrogram)
        
print "\nRead", len(filelist), "audio files"

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 

In [21]:
len(list_spectrograms)

1000

In [22]:
spectrogram.shape

(40, 80)

In [23]:
print "An audio segment is", round(float(segment_size) / samplerate, 2), "seconds long"

An audio segment is 0.94 seconds long


Note: For simplicity of this tutorial, here we load only 1 single segment of ~ 1 second length from each audio file.
In a real setting, one would create training instances of as many audio segments as possible to be fed to a Neural Network.

In [37]:
spectrogram[0:5,0:5]

array([[        -inf,         -inf, -16.42545363, -13.22244195, -12.26833528],
       [        -inf,         -inf, -16.22262668, -13.01742469, -11.62220539],
       [        -inf,         -inf, -16.03619263, -13.01102944, -11.58396978],
       [        -inf,         -inf, -15.88867874, -13.34190845, -11.91520938],
       [        -inf,         -inf, -15.77491789, -13.82499558, -11.90392084]])

In [81]:
# TODO plot spectrogram

## Make 1 big array of list of spectrograms

In [24]:
# a list of many 40x80 spectrograms is made into 1 big array
# config.floatX is from Theano configration to enforce float32 precision (needed for GPU computation)
data = np.array(list_spectrograms, dtype=config.floatX)
data.shape

(1000, 40, 80)

In [50]:
# check for Inf values

# np.log10(spectrogram) will produce -inf if a spectrogram value is 0. we replace -inf by 0 here

if np.any(np.isinf(data)):
    print "Warning: Data contains inf values. Replacying by 0."
    data[np.isinf(data)] = 0

## Standardization

<b>Always normalize / standardize</b> the data before feeding it into the Neural Network!

We use <b>attribute-wise standardization</b>, i.e. each feature (i.e. 'pixel' in the spectrogram) is standardized individually, as opposed to computing a single mean and single standard deviation of all values.

(Instead of 'attribute-wise, also 'flat' standardization would also be possible,computing the mean and standard deviation across all pixels).

One possibility is 'Min-Max normalization', i.e. scaling the values between 0 and 1.

Here we use <b>Zero-mean Unit-variance standardization</b> (also known as Z-score normalization).

We use the StandardScaler from the scikit-learn package for our purpose, which performs a Zero-mean Unit-variance standardization.

In [51]:
# Scalers and normalizers work on vectors. So we have to transfor the matrix of our spectrograms into vector data
# ('vectorize' or 'reshape' them).

# vectorize
N, ydim, xdim = data.shape
data = data.reshape(N, xdim*ydim)
data.shape

ValueError: need more than 2 values to unpack

In [52]:
data[0:1].shape

(1, 3200)

In [53]:
# standardize
scaler = preprocessing.StandardScaler()
data = scaler.fit_transform(data)

Now all the values are transformed into the 0-mean 1-variance space.

In [58]:
np.mean(data, axis=0)

array([ 0.        ,  0.        ,  0.00003291, -0.00000348, -0.00000135, -0.00000497, -0.0000019 ,  0.00000086, -0.00000034, -0.000001  , ..., -0.00000007,  0.00000116, -0.00000242, -0.00000083,
       -0.00000178,  0.00000348,  0.00000149,  0.00000025, -0.00000021,  0.00000247], dtype=float32)

In [59]:
np.std(data, axis=0)

array([ 0.        ,  0.        ,  1.00000322,  0.99999994,  0.99999982,  0.99999994,  1.        ,  1.        ,  1.        ,  1.00000024, ...,  1.00000012,  1.00000024,  1.00000024,  1.00000024,
        1.        ,  0.99999988,  0.99999952,  1.        ,  0.99999964,  1.00000012], dtype=float32)

In [54]:
# scaler stores the original values to be able to transform later again
# show mean and standard deviation: two vectors with same length as data.shape[1]
scaler.mean_, scaler.scale_

(array([  0.        ,   0.        , -15.83200645, -13.3323307 , -11.67720318,  -5.16446209,  -3.19460917,  -3.1655395 ,  -3.19477248,  -3.2418499 , ...,  -8.34243298,  -8.35699749,  -8.34202576,
         -8.32057381,  -8.31046009,  -8.28858852,  -8.29681206,  -8.29219341,  -8.30333328,  -8.30508804], dtype=float32),
 array([ 1.        ,  1.        ,  1.7693063 ,  1.36666048,  1.36915123,  1.21423578,  1.23309195,  1.27106905,  1.36281705,  1.50109375, ...,  2.0394578 ,  2.01078248,  2.03702593,  2.05707383,
         2.05797529,  2.05667782,  2.06773233,  2.05173779,  2.05352616,  2.06124711], dtype=float32))

# Creating Train & Test Set 

We split the original full data set into two parts: Train Set (75%) and Test Set (25%).

Here we compare Random Split vs. Stratified Split:

In [60]:
testset_size = 0.25 # % portion of whole data set to keep for testing, i.e. 75% is used for training

# RANDOM split of data set into 2 parts
# from sklearn.model_selection import train_test_split

train_set, test_set, train_classes, test_classes = train_test_split(data, classes_num, test_size=testset_size, random_state=0)

In [61]:
train_classes

array([3, 8, 9, 2, 9, 7, 8, 5, 2, 9, ..., 7, 9, 6, 7, 7, 0, 4, 8, 1, 8])

In [62]:
test_classes

array([2, 0, 3, 1, 8, 2, 9, 3, 6, 7, ..., 1, 5, 3, 6, 2, 5, 6, 9, 5, 8])

In [69]:
from collections import Counter
cnt = Counter(train_classes)

print "Number of files in each category in TRAIN set:"
for k in sorted(cnt.keys()):
    print k, ":", cnt[k]

Number of files in each category in TRAIN set:
0 : 80
1 : 74
2 : 70
3 : 70
4 : 84
5 : 72
6 : 74
7 : 74
8 : 73
9 : 79


In a Random Split, the number of files per class may be uneven or unbalanced.

The better way to do it is to use a <b>Stratified Split</b>:

In [111]:
# better: Stratified Split retains the class balance in both sets
# from sklearn.model_selection import StratifiedShuffleSplit

if sklearn_version.startswith('0.17'):
    splits = StratifiedShuffleSplit(classes_num, n_iter=1, test_size=testset_size, random_state=0)
else: # >= 0.18:
    splitter = StratifiedShuffleSplit(n_splits=1, test_size=testset_size, random_state=0)
    splits = splitter.split(data, classes_num)

for train_index, test_index in splits:
    print "TRAIN INDEX:", train_index
    print "TEST INDEX:", test_index
    
    # split the data
    train_set = data[train_index]
    test_set = data[test_index]
    
    # and the numeric classes (groundtruth)
    train_classes = classes_num[train_index]
    train_classes_1hot = classes_num_1hot[train_index]  # 1 hot we need for traning
    test_classes = classes_num[test_index]
# Note: this for loop is only executed once, if n_splits==1

print train_set.shape
print test_set.shape
# Note: we will reshape the data later back to matrix form 

TRAIN INDEX: [349 816 960 261 665 517 653 438 170 472 ..., 336 602 117  93 322 434 624  83 118  97]
TEST INDEX: [510 781 478 246 888 793 772 445 199 534 ..., 967 490 486 683 984 390 858 372 982 499]
(750, 3200)
(250, 3200)


In [112]:
cnt = Counter(train_classes)
print "Number of files in each category in TRAIN set:"
for k in sorted(cnt.keys()):
    print k, ":", cnt[k]

Number of files in each category in TRAIN set:
0 : 75
1 : 75
2 : 75
3 : 75
4 : 75
5 : 75
6 : 75
7 : 75
8 : 75
9 : 75


Now the number of files in each category in the training set is equal.

(It is equal because our full set had 100 files in each category; a Stratified Split preserves the relative distribution of instances per category, also if the dataset is unbalanced between the classess.)

# Convolutional Neural Networks

A Convolutional Neural Network (ConvNet or CNN) is a type of (deep) Neural Network that is well-suited for 2D axes data, such as images or spectrograms, as it is optimized for learning from spatial proximity. Its core elements are 2D filter kernels which essentially learn the weights of the Neural Network, and downscaling functions such as Max Pooling.

A CNN can have one or more Convolution layers, each of them having an arbitrary number of N filters (which define the depth of the CNN layer), following typically by a pooling step, which aggregates neighboring pixels together and thus reduces the image resolution by retaining only the maximum values of neighboring pixels.

## Preparing the Data

### Adding the channel

As CNNs were invented for image data (often having 3 color channels), we need to add a dimension for the color channel to the data. 

<b>Spectrograms, are considered like greyscale images, which only have 1 color channel. Still we add the extra dimension, defining just 1 channel.</b>

In Theano, traditionally the color channel is the <b>first</b> dimension in the image shape. 
In Tensorflow, the color channel is the <b>last</b> dimension in the image shape. 

This can be configured now in ~/.keras/keras.json: "image_dim_ordering": "th" or "tf" with "tf" (Tensorflow) being the default image ordering even though you use Theano. Depending on this, use one of the code lines below.

We created an 'if' statement here to check which dimension ordering to use:

In [114]:
n_channels = 1 # 1 for grey-scale, 3 for RGB (in this case usually already present in the data)

if keras.backend.image_dim_ordering() == 'th':
    # Theano ordering (~/.keras/keras.json: "image_dim_ordering": "th")
    train_set = train_set.reshape(train_set.shape[0], n_channels, ydim, xdim)
    test_set = test_set.reshape(test_set.shape[0], n_channels, ydim, xdim)
else:
    # Tensorflow ordering (~/.keras/keras.json: "image_dim_ordering": "tf")
    train_set = train_set.reshape(train_set.shape[0], ydim, xdim, n_channels)
    test_set = test_set.reshape(test_set.shape[0], ydim, xdim, n_channels)

In [115]:
keras.backend.image_dim_ordering()

'tf'

In [116]:
train_set.shape

(750, 40, 80, 1)

In [117]:
test_set.shape

(250, 40, 80, 1)

In [118]:
# we store the new shape of the images in the 'input_shape' variable.
# take all dimensions except the 0th one (which is the number of images)
input_shape = train_set.shape[1:]  
input_shape

(40, 80, 1)

# Creating Neural Network Models in Keras

## Sequential Models

In Keras, one can choose between a Sequential model and a Graph model. Sequential models are the standard case. Graph models are for parallel networks.

## Creating a Single Layer and a Two Layer CNN

Try: (comment/uncomment code in the following code block)
* 1 Layer
* 2 Layer
* more conv_filters
* Dropout

In [124]:
np.random.seed(0) # make results repeatable

model = Sequential()

conv_filters = 16   # number of convolution filters (= CNN depth)
#conv_filters = 32   # number of convolution filters (= CNN depth)

# Layer 1
model.add(Convolution2D(conv_filters, 3, 3, input_shape=input_shape))
model.add(MaxPooling2D(pool_size=(2, 2))) 
#model.add(Dropout(0.25)) 

# Layer 2
#model.add(Convolution2D(conv_filters, 3, 3))
#model.add(MaxPooling2D(pool_size=(2, 2))) 

# After Convolution, we have a 16*x*y matrix output
# In order to feed this to a Full(Dense) layer, we need to flatten all data
# Note: Keras does automatic shape inference, i.e. it knows how many (flat) input units the next layer will need,
# so no parameter is needed for the Flatten() layer.
model.add(Flatten()) 

# Full layer
model.add(Dense(256, activation='sigmoid')) 

# Output layer
# For binary/2-class problems use ONE sigmoid unit, 
# for multi-class/multi-label problems use n output units and activation='softmax!'
model.add(Dense(n_classes,activation='sigmoid'))

If you get OverflowError: Range exceeds valid bounds in the above box, check the correct Theano vs. Tensorflow ordering in the box before and your keras.json configuration file.

In [125]:
model.summary()

____________________________________________________________________________________________________
Layer (type)                     Output Shape          Param #     Connected to                     
convolution2d_3 (Convolution2D)  (None, 38, 78, 16)    160         convolution2d_input_3[0][0]      
____________________________________________________________________________________________________
maxpooling2d_3 (MaxPooling2D)    (None, 19, 39, 16)    0           convolution2d_3[0][0]            
____________________________________________________________________________________________________
flatten_3 (Flatten)              (None, 11856)         0           maxpooling2d_3[0][0]             
____________________________________________________________________________________________________
dense_5 (Dense)                  (None, 256)           3035392     flatten_3[0][0]                  
___________________________________________________________________________________________

## Training the CNN

In [126]:
# Define a loss function 
loss = 'categorical_crossentropy' 

# Note: for binary classification (2 classes) OR for multi-class problems use:
#loss = 'binary_crossentropy' 

# Optimizer = Stochastic Gradient Descent
optimizer = 'sgd' 

# Compiling the model
model.compile(loss=loss, optimizer=optimizer, metrics=['accuracy'])

In [None]:
# TRAINING the model

# for how many epochs (iterations) to train
epochs = 15

# for training we need the "1 hot encoded" numeric classes of the ground truth
history = model.fit(train_set, train_classes_1hot, batch_size=32, nb_epoch=epochs)

# we keep the history of accuracies on training set

#### Accuracy goes up pretty quickly for 1 layer on Train set! Also on Test set?

### Verifying Accuracy on Test Set

In [84]:
test_pred = model.predict_classes(test_set)



In [39]:
# 1 layer
accuracy_score(test_classes, test_pred)

0.71875

In [45]:
# 2 layer
accuracy_score(test_classes, test_pred)

0.78125

In [67]:
# 2 layer + 32 convolution filters
accuracy_score(test_classes, test_pred)

0.71875

In [85]:
# 2 layer + 32 convolution filters + ReLU + Dropout
accuracy_score(test_classes, test_pred)

0.71875

## Additional Parameters & Techniques

Try: (comment/uncomment code blocks below)
* Adding ReLU activation
* Adding Batch normalization
* Adding Dropout

In [None]:
model = Sequential()

conv_filters = 16   # number of convolution filters (= CNN depth)

# Layer 1
model.add(Convolution2D(conv_filters, 3, 3, border_mode='valid', input_shape=input_shape))
model.add(MaxPooling2D(pool_size=(2, 2))) 

# Layer 2
model.add(Convolution2D(conv_filters, 3, 3, border_mode='valid', input_shape=input_shape))


#model.add(BatchNormalization())
#model.add(Activation('relu')) 
model.add(MaxPooling2D(pool_size=(2, 2))) 

# After Convolution, we have a 16*x*y matrix output
# In order to feed this to a Full(Dense) layer, we need to flatten all data
# Note: Keras does automatic shape inference, i.e. it knows how many (flat) input units the next layer will need,
# so no parameter is needed for the Flatten() layer.
model.add(Flatten()) 

# Full layer
model.add(Dense(256))  
#model.add(Activation('relu'))
#model.add(Dropout(0.1))

# Output layer
# For binary/2-class problems use ONE sigmoid unit, 
# for multi-class/multi-label problems use n output units and activation='softmax!'
model.add(Dense(1,activation='sigmoid'))

## Parallel CNNs

To create parallel CNNs we need a "graph-based" model. In Keras 1.x this is realized via the functional API of the Model() class.
We use it to create two CNN layers that run in parallel to each other and are merged subsequently.

In [103]:
# Input only specifies the input shape
input = Input(input_shape)

# CNN layers
# specify desired number of filters
n_filters = 16 
# The functional API allows to specify the predecessor in (brackets) after the new Layer function call
conv_layer1 = Convolution2D(n_filters, 10, 2)(input)  # a vertical filter
conv_layer2 = Convolution2D(n_filters, 2, 10)(input)  # a horizontal filter

# possibly add Activation('relu') here

# Pooling layers
maxpool1 = MaxPooling2D(pool_size=(2,2))(conv_layer1)
maxpool2 = MaxPooling2D(pool_size=(2,2))(conv_layer2)

# we have to flatten the Pooling output in order to be concatenated
poolflat1 = Flatten()(maxpool1)
poolflat2 = Flatten()(maxpool2)

# Merge the 2
merged = merge([poolflat1, poolflat2], mode='concat')

full = Dense(256, activation='relu')(merged)
output_layer = Dense(1, activation='sigmoid')(full)

# finally create the model
model = Model(input=input, output=output_layer)

In [104]:
model.summary()

____________________________________________________________________________________________________
Layer (type)                     Output Shape          Param #     Connected to                     
input_5 (InputLayer)             (None, 1, 40, 80)     0                                            
____________________________________________________________________________________________________
convolution2d_29 (Convolution2D) (None, 16, 31, 79)    336         input_5[0][0]                    
____________________________________________________________________________________________________
convolution2d_30 (Convolution2D) (None, 16, 39, 71)    336         input_5[0][0]                    
____________________________________________________________________________________________________
maxpooling2d_24 (MaxPooling2D)   (None, 16, 15, 39)    0           convolution2d_29[0][0]           
___________________________________________________________________________________________

In [105]:
# Define a loss function 
loss = 'binary_crossentropy'  # 'categorical_crossentropy' for multi-class problems

# Optimizer = Stochastic Gradient Descent
optimizer = 'sgd' 

# Compiling the model
model.compile(loss=loss, optimizer=optimizer, metrics=['accuracy'])

In [106]:
# TRAINING the model
epochs = 15
history = model.fit(train_set, train_classes, batch_size=32, nb_epoch=epochs)

Epoch 1/15
Epoch 2/15
Epoch 3/15
Epoch 4/15
Epoch 5/15
Epoch 6/15
Epoch 7/15
Epoch 8/15
Epoch 9/15
Epoch 10/15
Epoch 11/15
Epoch 12/15
Epoch 13/15
Epoch 14/15
Epoch 15/15


In [None]:
# Possible Extension:
# other optimizers:
# from keras.optimizers import SGD, RMSprop, Adagrad