# Dependencies

There is a [known issue](https://github.com/biocore/songbird/issues/47) with the OpenMP library on OS X when using TensorFlow. Unfortunately, Jupyter gives no indication that something is wrong, but if you try to train from the command line you'll see the error.

Running the following command seems to fix things:

```bash
$ conda install nomkl
```

If you get the [following error](https://github.com/numpy/numpy/issues/12744):

```text
cannot import name '_validate_lengths'
```

then try updating scikit-image:

```bash
$ conda install -c conda-forge scikit-image
```
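
When diagnosing either problem it can help to see which package versions are actually installed in the active environment. The snippet below is a minimal sketch using the standard library's `importlib.metadata`; the helper name `installed_version` and the list of packages to check are illustrative assumptions, not part of the original notebook.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Package names are the distribution names, e.g. "scikit-image" rather than "skimage".
for name in ("numpy", "scikit-image", "tensorflow", "keras"):
    print(name, installed_version(name) or "not installed")
```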

# Build model

In [1]:
import numpy as np

import keras
from keras.models import Sequential
from keras.layers import Dense, Dropout

model = Sequential([
    Dense(512, input_dim=8, activation='relu'),
    Dropout(0.5),
    Dense(512, activation='relu'),
    Dropout(0.5),
    Dense(1, activation='sigmoid')
])

model.compile(loss='binary_crossentropy',
              optimizer='sgd',
              metrics=['accuracy'])

model.summary()

Using TensorFlow backend.


_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
dense_1 (Dense)              (None, 512)               4608
_________________________________________________________________
dropout_1 (Dropout)          (None, 512)               0
_________________________________________________________________
dense_2 (Dense)              (None, 512)               262656
_________________________________________________________________
dropout_2 (Dropout)          (None, 512)               0
_________________________________________________________________
dense_3 (Dense)              (None, 1)                 513
=================================================================
Total params: 267,777
Trainable params: 267,777
Non-trainable params: 0
_________________________________________________________________
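
The parameter counts in the summary can be checked by hand: a dense layer has one weight per input-unit pair plus one bias per unit, and dropout layers have no parameters. The sketch below redoes that arithmetic for the 8 → 512 → 512 → 1 architecture; `dense_params` is a hypothetical helper, not a Keras function.

```python
def dense_params(n_inputs, n_units):
    # One weight per (input, unit) pair plus one bias per unit.
    return n_inputs * n_units + n_units


# Input width followed by the width of each Dense layer.
layer_sizes = [8, 512, 512, 1]
counts = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]

print(counts)       # [4608, 262656, 513]
print(sum(counts))  # 267777
```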


# Load data

In [2]:
import glob
import pickle
import numpy as np

N_SAMPLES = 20000

# X is the training input, Y is the labels
X, Y = np.empty((0,8)), []

for path in sorted(glob.glob("*/data.p")):
    print("Processing", path, "...")
    
    # x is the training data for this sample, y is the labels
    with open(path, "rb") as f:
        x, y = pickle.load(f)
    
    # sample N_SAMPLES signal and N_SAMPLES background vectors
    for i in [1,0]:
        
        # x_i is the training data with the label i
        x_i = x[y == i]

        if len(x_i) > 0: 
            indices = np.random.choice(len(x_i), N_SAMPLES, replace=False)
            samples = x_i[indices, :]
            
            Y = np.append(Y, [i]*N_SAMPLES)
            X = np.append(X, samples, axis=0)

X = X.astype('float32') / 255
print("\nRead", len(X), "vectors and labels.")

Processing 50ug-SamariumNitrate-Not-C4/data.p ...
Processing C4 black car panel/data.p ...
Processing C4 chunk/data.p ...
Processing C4 metal panel/data.p ...
Processing C4 red glossy paper/data.p ...
Processing C4 white car panel/data.p ...
Processing C4 wood panel/data.p ...
Processing DSYP60-not-C4/data.p ...

Read 280000 vectors and labels.
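
Note that sampling without replacement raises a `ValueError` when a class has fewer rows than the requested sample size, and the `len(x_i) > 0` check in the cell above only skips classes that are empty. The following is a minimal sketch of a guarded sampler on synthetic data; the helper name `sample_rows` and the toy arrays are assumptions for illustration.

```python
import numpy as np


def sample_rows(x, n, rng):
    """Draw n distinct rows from x, or return None if x has fewer than n rows."""
    if len(x) < n:
        return None
    idx = rng.choice(len(x), size=n, replace=False)
    return x[idx]


rng = np.random.default_rng(0)

# Synthetic stand-in for one data.p file: 10 vectors of 4 features, 3 labelled 1.
x = np.arange(40).reshape(10, 4)
y = np.array([1, 1, 1, 0, 0, 0, 0, 0, 0, 0])

signal = sample_rows(x[y == 1], 3, rng)      # exactly enough rows
background = sample_rows(x[y == 0], 3, rng)  # more than enough rows
too_many = sample_rows(x[y == 1], 5, rng)    # not enough rows, so None
```

Returning `None` mirrors the notebook's behaviour of skipping a class rather than failing; raising an error would be an equally reasonable choice.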


# Prepare data

In [3]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, Y)

print('x_train shape:', X_train.shape)
print('x_test shape:', X_test.shape)

x_train shape: (210000, 8)
x_test shape: (70000, 8)
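
By default `train_test_split` holds out 25% of the rows, which is where the 210000/70000 split comes from. Because the training cell also passes this held-out set as `validation_data`, one option is to carve out a separate validation set so that the final score is measured on data that never influenced early stopping or checkpointing. The sketch below shows such a three-way split on row indices; the function name `three_way_split` and the fractions are assumptions, not the notebook's method.

```python
import numpy as np


def three_way_split(n_rows, val_fraction, test_fraction, seed):
    """Shuffle row indices and cut them into train, validation and test parts."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_rows)
    n_test = int(n_rows * test_fraction)
    n_val = int(n_rows * val_fraction)
    test_idx = order[:n_test]
    val_idx = order[n_test:n_test + n_val]
    train_idx = order[n_test + n_val:]
    return train_idx, val_idx, test_idx


# 280000 rows as in the notebook; the fractions are illustrative.
train_idx, val_idx, test_idx = three_way_split(280000, 0.125, 0.25, seed=0)
print(len(train_idx), len(val_idx), len(test_idx))
```

The index arrays can then be used as `X[train_idx]`, `Y[train_idx]` and so on.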


# Train

In [None]:
from keras.callbacks import EarlyStopping
from keras.callbacks import ModelCheckpoint

stop = EarlyStopping(monitor='val_loss', mode='min', verbose=1, patience=200)
# 'val_acc' matches the metric name in the log below; newer Keras releases call it 'val_accuracy'
checkpoint = ModelCheckpoint('models/checkpoint.h5', monitor='val_acc', mode='max',
                             verbose=1, save_best_only=True)

history = model.fit(
    X_train, y_train,
    verbose=2, shuffle=True,
    batch_size=128, epochs=4000,
    callbacks=[stop, checkpoint],
    # X_test doubles as the validation set, so it also drives early stopping and checkpointing
    validation_data=(X_test, y_test)
)

Train on 210000 samples, validate on 70000 samples
Epoch 1/4000
 - 50s - loss: 0.6484 - acc: 0.6058 - val_loss: 0.6077 - val_acc: 0.6803

Epoch 00001: val_acc improved from -inf to 0.68026, saving model to checkpoint.h5
Epoch 2/4000
 - 62s - loss: 0.5830 - acc: 0.7183 - val_loss: 0.5483 - val_acc: 0.7713

Epoch 00002: val_acc improved from 0.68026 to 0.77134, saving model to checkpoint.h5
Epoch 3/4000
 - 67s - loss: 0.5278 - acc: 0.7740 - val_loss: 0.4895 - val_acc: 0.7877

Epoch 00003: val_acc improved from 0.77134 to 0.78770, saving model to checkpoint.h5
Epoch 4/4000
 - 63s - loss: 0.4703 - acc: 0.7924 - val_loss: 0.4304 - val_acc: 0.8013

Epoch 00004: val_acc improved from 0.78770 to 0.80133, saving model to checkpoint.h5
Epoch 5/4000
 - 63s - loss: 0.4205 - acc: 0.8226 - val_loss: 0.3900 - val_acc: 0.8377

Epoch 00005: val_acc improved from 0.80133 to 0.83769, saving model to checkpoint.h5
Epoch 6/4000
 - 62s - loss: 0.3919 - acc: 0.8384 - val_loss: 0.3700 - val_acc: 0.8493

Epoch
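
The `patience=200` setting means training stops once `val_loss` has gone 200 consecutive epochs without reaching a new minimum. The sketch below is a simplified pure-Python illustration of that rule, not Keras's implementation; the function name `stopping_epoch` and the short loss list are assumptions.

```python
def stopping_epoch(val_losses, patience):
    """Return the 1-based epoch at which training would stop, or None if it never does."""
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            # New best validation loss: reset the counter.
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None


# Made-up losses: the best value is at epoch 2, then two epochs without improvement.
print(stopping_epoch([0.6, 0.5, 0.55, 0.52, 0.51], patience=2))  # 4

# The six val_loss values logged above keep improving, so nothing triggers.
logged = [0.6077, 0.5483, 0.4895, 0.4304, 0.3900, 0.3700]
print(stopping_epoch(logged, patience=200))  # None
```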

# Score

In [11]:
model = keras.models.load_model('models/checkpoint.h5')
loss, accuracy = model.evaluate(X_test, y_test, batch_size=128)

print('Test loss:', loss)
print('Test accuracy:', accuracy)

Test loss: 0.05271610697151234
Test accuracy: 0.9879072643951529
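
`model.evaluate` returns the loss and metric chosen in `model.compile`, here binary cross-entropy and accuracy. The sketch below computes both quantities by hand for a few made-up sigmoid outputs, assuming the usual 0.5 decision threshold; the probabilities and labels are illustrative and unrelated to the scores above.

```python
import math


def binary_accuracy(probs, labels, threshold=0.5):
    """Fraction of predictions on the correct side of the threshold."""
    hits = sum((p > threshold) == bool(t) for p, t in zip(probs, labels))
    return hits / len(labels)


def binary_crossentropy(probs, labels, eps=1e-7):
    """Mean negative log-likelihood of the labels under the predicted probabilities."""
    total = 0.0
    for p, t in zip(probs, labels):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total -= t * math.log(p) + (1 - t) * math.log(1 - p)
    return total / len(labels)


probs = [0.9, 0.2, 0.6, 0.4]   # hypothetical sigmoid outputs
labels = [1, 0, 0, 1]

print(binary_accuracy(probs, labels))      # 0.5
print(binary_crossentropy(probs, labels))  # about 0.540
```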


# Save model

In [13]:
model.save("models/c4-neural-network.h5")