# Kaggle Quora Question Pairs [competition](https://www.kaggle.com/c/quora-question-pairs/)

This notebook tackles the competition with the Small ConvNet described in Zhang, Zhao & LeCun's paper [Character-level Convolutional Networks for Text Classification](https://arxiv.org/pdf/1509.01626.pdf).

In [1]:
# Pre-requisites
import numpy as np
import pandas as pd
from collections import Counter
import os
from sys import getsizeof
#import cv2

# To clear print buffer
from IPython.display import clear_output

In [2]:
# Keras
from keras import backend as K
from keras.models import Model, Sequential
from keras.layers import Input, Conv1D, MaxPooling1D
from keras.layers import Flatten, Dense, Dropout, Lambda
from keras.layers.merge import Concatenate
from keras.layers.embeddings import Embedding
from keras.optimizers import SGD
from keras.initializers import RandomNormal
from keras.callbacks import LearningRateScheduler, ModelCheckpoint
from keras.utils import np_utils
from keras.engine.topology import Layer

Using TensorFlow backend.


In [45]:
# Loading saved variable
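# qsDict was saved as a pickled dict, so recent NumPy versions need allow_pickle=True
qsDict = np.load("qsDict.npy", allow_pickle=True).item()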
#charCorpus = np.load("charCorpus.npy")
#charCorpusCount = np.load("charCorpusCount.npy")
alphabet = np.load("alphabet.npy")
alphabet = [str(a) for a in alphabet]

# Load data

In [12]:
# Load training and test data
# Download train.csv and test.csv from https://www.kaggle.com/c/quora-question-pairs/
trainDf = pd.read_csv('kaggleQuoraTrain.csv', sep=',')
testDf = pd.read_csv('kaggleQuoraTest.csv', sep=',')

In [13]:
# Check for any null values
print(trainDf.isnull().sum())
print(testDf.isnull().sum())

# Replace null values with the string 'empty'
trainDf = trainDf.fillna('empty')
testDf = testDf.fillna('empty')

# Check again for any null values
print(trainDf.isnull().sum())
print(testDf.isnull().sum())

id              0
qid1            0
qid2            0
question1       0
question2       2
is_duplicate    0
dtype: int64
test_id      0
question1    2
question2    4
dtype: int64
id              0
qid1            0
qid2            0
question1       0
question2       0
is_duplicate    0
dtype: int64
test_id      0
question1    0
question2    0
dtype: int64


In [14]:
# Convert into np array
trainData = np.array(trainDf)
testData = np.array(testDf)

In [15]:
trainDf

Unnamed: 0,id,qid1,qid2,question1,question2,is_duplicate
0,0,1,2,What is the step by step guide to invest in sh...,What is the step by step guide to invest in sh...,0
1,1,3,4,What is the story of Kohinoor (Koh-i-Noor) Dia...,What would happen if the Indian government sto...,0
2,2,5,6,How can I increase the speed of my internet co...,How can Internet speed be increased by hacking...,0
3,3,7,8,Why am I mentally very lonely? How can I solve...,Find the remainder when [math]23^{24}[/math] i...,0
4,4,9,10,"Which one dissolve in water quikly sugar, salt...",Which fish would survive in salt water?,0
5,5,11,12,Astrology: I am a Capricorn Sun Cap moon and c...,"I'm a triple Capricorn (Sun, Moon and ascendan...",1
6,6,13,14,Should I buy tiago?,What keeps childern active and far from phone ...,0
7,7,15,16,How can I be a good geologist?,What should I do to be a great geologist?,1
8,8,17,18,When do you use シ instead of し?,"When do you use ""&"" instead of ""and""?",0
9,9,19,20,Motorola (company): Can I hack my Charter Moto...,How do I hack Motorola DCX3400 for free internet?,0


In [16]:
testDf

Unnamed: 0,test_id,question1,question2
0,0,How does the Surface Pro himself 4 compare wit...,Why did Microsoft choose core m3 and not core ...
1,1,Should I have a hair transplant at age 24? How...,How much cost does hair transplant require?
2,2,What but is the best way to send money from Ch...,What you send money to China?
3,3,Which food not emulsifiers?,What foods fibre?
4,4,"How ""aberystwyth"" start reading?",How their can I start reading?
5,5,How are the two wheeler insurance from Bharti ...,I admire I am considering of buying insurance ...
6,6,How can I reduce my belly fat through a diet?,How can I reduce my lower belly fat in one month?
7,7,"By scrapping the 500 and 1000 rupee notes, how...",How will the recent move to declare 500 and 10...
8,8,What are the how best books of all time?,What are some of the military history books of...
9,9,After 12th years old boy and I had sex with a ...,Can a 14 old guy date a 12 year old girl?


In [17]:
trainData.shape

(404290, 6)

In [18]:
testData.shape

(2345796, 3)

# Idea

The idea is to construct a character-level CNN, i.e. a CNN whose input is a sentence represented as a fixed-length frame of one-hot encoded characters.

We feed the two questions through two branches that share the same network, and then merge the two branches. After the merge, a sigmoid neuron classifies whether the two questions are duplicates or not.
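
As a rough illustration of the two-branch idea (not the network built later in this notebook), the sketch below passes two inputs through the same weights, concatenates the results, and applies a single sigmoid neuron. The names `sharedBranch`, `isDuplicateProbability`, `W`, `v` and the tiny dimensions are made up for this example.

```python
import numpy as np

rng = np.random.RandomState(0)

# Hypothetical toy sizes: 8 input features per question, 4 features per branch
W = rng.normal(0.0, 0.05, size=(8, 4))   # branch weights, shared by both questions
v = rng.normal(0.0, 0.05, size=(8,))     # weights of the final sigmoid neuron

def sharedBranch(x):
    # The same weights W are used for whichever question is passed in (ReLU activation)
    return np.maximum(x.dot(W), 0.0)

def isDuplicateProbability(q1, q2):
    # Merge the two branch outputs, then squash to (0, 1) with a sigmoid
    merged = np.concatenate([sharedBranch(q1), sharedBranch(q2)])
    return 1.0 / (1.0 + np.exp(-merged.dot(v)))

q1 = rng.rand(8)
q2 = rng.rand(8)
p = isDuplicateProbability(q1, q2)
print(p)
```

Because both questions go through `sharedBranch`, anything the branch learns about one question applies equally to the other.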

In [19]:
# Get list of questions in Question1 and Question2
trainQs1 = trainData[:, 3]
trainQs2 = trainData[:, 4]
testQs1 = testData[:, 1]
testQs2 = testData[:, 2]

In [46]:
# Output of network - whether the two questions are duplicate or not
duplicateOrNot = trainData[:, 5]

In [57]:
# Setting alphabet size
alphabetSize = 70

In [46]:
# Params
inputDim = alphabetSize #number of letters (characters) in alphabet
inputLength = 1014 #input feature length (the paper used 1014)

# Database of questions

To build an alphabet of the most frequent characters, let us first make a database of all the questions, keyed by their question IDs, to encode and use later.

In [20]:
# Get list of question IDs and questions in training data
qsDict = {}
for data in trainData:
    qsDict[data[1]] = data[3].lower()
    qsDict[data[2]] = data[4].lower()

In [21]:
# Save qsDict
np.save("qsDict", qsDict)

In [22]:
# Extract question IDs and questions
qIds = list(qsDict.keys())
questions = list(qsDict.values())

In [23]:
qsDict[1]

'what is the step by step guide to invest in share market in india?'

In [24]:
len(qsDict)

537933

In [25]:
questions

['what is the step by step guide to invest in share market in india?',
 'what is the step by step guide to invest in share market?',
 'what is the story of kohinoor (koh-i-noor) diamond?',
 'what would happen if the indian government stole the kohinoor (koh-i-noor) diamond back?',
 'how can i increase the speed of my internet connection while using a vpn?',
 'how can internet speed be increased by hacking through dns?',
 'why am i mentally very lonely? how can i solve it?',
 'find the remainder when [math]23^{24}[/math] is divided by 24,23?',
 'which one dissolve in water quikly sugar, salt, methane and carbon di oxide?',
 'which fish would survive in salt water?',
 'astrology: i am a capricorn sun cap moon and cap rising...what does that say about me?',
 "i'm a triple capricorn (sun, moon and ascendant in capricorn) what does this say about me?",
 'should i buy tiago?',
 'what keeps childern active and far from phone and video games?',
 'how can i be a good geologist?',
 'what should 

For curiosity's sake, let's check how many questions are actually unique, counting duplicates as the same question.

In [26]:
# Number of questions, counting duplicates as same
data = np.array(trainData)
data[data[:,5]==1, 2] = 0
uniqueQs = np.unique(np.array([[data[:, 1]], [data[:, 2]]]))[1:]
print(len(uniqueQs))

484549


# Alphabet

Let us make a corpus of characters in the questions database, find out the number of times each character occurs in the database, and choose only the most frequent characters as our alphabet.
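
A minimal sketch of this counting step on a toy corpus, assuming `collections.Counter` is an acceptable way to count. The names `toyQuestions`, `toyCounts` and `toyAlphabet`, and the alphabet size of 5, are illustrative only.

```python
from collections import Counter

# Toy stand-in for the questions database
toyQuestions = ['how are you?', 'who are you?']

# Count every character across all questions
toyCounts = Counter()
for question in toyQuestions:
    toyCounts.update(question)

# Drop the blank space, then keep the 5 most frequent characters
del toyCounts[' ']
toyAlphabet = [char for (char, count) in toyCounts.most_common(5)]
print(toyAlphabet)
```

Note that `most_common` returns characters in descending order of frequency, whereas the cells below sort in ascending order and slice from the end.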

In [27]:
# MAKE CORPUS OF CHARACTERS

# Append all characters from training data into list
charFullCorpus = []
for (q, question) in enumerate(questions):
    # Printing status (makes it slow)
    #clear_output(); print(str(q)+" of "+str(len(questions)))
    for char in list(question):
        charFullCorpus.append(char)

In [35]:
# DO NOT RUN THIS!!!!
# VERY SLOW. INSTEAD, USE THE NEXT ONE

# EXTRACT CHARACTER CORPUS

# Extract unique characters
charCorpus = np.unique(charFullCorpus)

# Save charCorpus
np.save("charCorpus", charCorpus)

print(charCorpus)
for c in charCorpus:
    print(c)

# Count the number of times each character occurs
charCorpusCount = [charFullCorpus.count(c) for c in charCorpus] 

# Save charCorpusCount
np.save("charCorpusCount", charCorpusCount)

charCorpusCount

# Sort charCorpus according to the number of occurrences
charCorpusCountSorted = sorted(charCorpusCount)
charCorpusSorted = [y for (x, y) in sorted(zip(charCorpusCount, charCorpus))]

In [51]:
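# EXTRACT CHARACTER CORPUS
# Count each character once, then sort (count, character) pairs in ascending order of count
charCounts = Counter(charFullCorpus)
charCorpusCountSorted, charCorpusSorted = map(list, zip(*sorted(zip(charCounts.values(), charCounts.keys()))))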

In [52]:
charCorpusCountSorted[-71:]

[257,
 266,
 301,
 425,
 560,
 565,
 643,
 655,
 791,
 864,
 962,
 1002,
 1504,
 1510,
 1653,
 1747,
 1840,
 1861,
 1908,
 2590,
 2592,
 2618,
 3259,
 4991,
 6617,
 6780,
 8386,
 10551,
 11219,
 12008,
 14437,
 16495,
 16858,
 20466,
 20576,
 22340,
 27846,
 28190,
 31848,
 33592,
 40395,
 50661,
 50781,
 58584,
 58906,
 61761,
 74589,
 229374,
 261834,
 419653,
 534900,
 543704,
 550207,
 567638,
 569042,
 688858,
 723708,
 836139,
 851452,
 969403,
 970142,
 1414386,
 1549469,
 1690899,
 1767482,
 2093135,
 2189449,
 2222609,
 2311653,
 3000176,
 5597515]

In [53]:
charCorpusSorted[-71:]

['@',
 '|',
 '₹',
 '…',
 ';',
 '#',
 '*',
 '_',
 '!',
 'é',
 '“',
 '”',
 '}',
 '{',
 '=',
 '\\',
 '^',
 '$',
 '%',
 ']',
 '[',
 '’',
 '&',
 '+',
 '9',
 '8',
 '7',
 ':',
 '4',
 '6',
 '3',
 '/',
 '5',
 '(',
 ')',
 'z',
 '-',
 '"',
 '2',
 'q',
 '1',
 '0',
 'j',
 '.',
 "'",
 'x',
 ',',
 'k',
 'v',
 'b',
 'y',
 'g',
 'f',
 'p',
 '?',
 'u',
 'm',
 'w',
 'c',
 'l',
 'd',
 'h',
 'r',
 's',
 'n',
 'i',
 'o',
 't',
 'a',
 'e',
 ' ']

[Character-level Convolutional Networks for Text Classification](https://arxiv.org/pdf/1509.01626.pdf) uses an alphabet of 70 characters, which excludes capital letters (all text is converted to lowercase) and blank spaces.

In [58]:
# Assign the most frequent #alphabetSize number of characters as the alphabet for the network
# Also, remove blank space (the most frequent character) from alphabet
alphabet = charCorpusSorted[-alphabetSize-1:-1]

In [43]:
# Save alphabet
np.save("alphabet", alphabet)

In [59]:
alphabet

['@',
 '|',
 '₹',
 '…',
 ';',
 '#',
 '*',
 '_',
 '!',
 'é',
 '“',
 '”',
 '}',
 '{',
 '=',
 '\\',
 '^',
 '$',
 '%',
 ']',
 '[',
 '’',
 '&',
 '+',
 '9',
 '8',
 '7',
 ':',
 '4',
 '6',
 '3',
 '/',
 '5',
 '(',
 ')',
 'z',
 '-',
 '"',
 '2',
 'q',
 '1',
 '0',
 'j',
 '.',
 "'",
 'x',
 ',',
 'k',
 'v',
 'b',
 'y',
 'g',
 'f',
 'p',
 '?',
 'u',
 'm',
 'w',
 'c',
 'l',
 'd',
 'h',
 'r',
 's',
 'n',
 'i',
 'o',
 't',
 'a',
 'e']

In [11]:
# DO NOT RUN THIS!!!
# NOT REQUIRED

# Making one-hot encoded alphabets
encodedAlphabet = np.eye(alphabetSize).astype('float32')

print(encodedAlphabet)

# Checking existing alphabet
char = 'f'
if char in alphabet:
    print(str(char)+" is at "+str(alphabet.index(char)))
    print(encodedAlphabet[alphabet.index(char)])
else:
    print(str(char)+" not found.")
    print(np.zeros((1, alphabetSize)))
char = '∂'
if char in alphabet:
    print(str(char)+" is at "+str(alphabet.index(char)))
    print(encodedAlphabet[alphabet.index(char)])
else:
    print(str(char)+" not found.")
    print(np.zeros((1, alphabetSize)))

## Encoding questions

Each question needs to be encoded as an array of $inputLength$ characters, each character itself being encoded as a $1{\times}alphabetSize$-dimensional vector.
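
To make the shapes concrete, here is a small sketch of that encoding for a single string. It is a simplified illustration: `oneHotEncode` and `toyAlphabet` are hypothetical names, and the sketch skips the character reversal used in the cells below.

```python
import numpy as np

def oneHotEncode(question, inputLength, alphabet):
    # One row per character position, one column per alphabet entry
    encoded = np.zeros((inputLength, len(alphabet)), dtype='float32')
    for (c, char) in enumerate(question[:inputLength]):
        if char in alphabet:
            encoded[c, alphabet.index(char)] = 1.0
        # Characters outside the alphabet, and unused positions, stay all-zero
    return encoded

toyAlphabet = ['a', 'b', 'c']
toyEncoded = oneHotEncode('cab!', 6, toyAlphabet)
print(toyEncoded.shape)
print(toyEncoded)
```

Here the result has 6 rows (the fixed length) and 3 columns (the alphabet size); the row for `'!'` and the two padding rows are all zeros.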

In [49]:
# Find max length of question (for first layer of CNN)
maxQLength = np.max([len(q) for q in questions])
print(maxQLength)

1169


The maximum question length ($maxQLength$) was found to be 1169. 

One way to encode the questions would be to fix their length at 1200, so that it exceeds the maximum length, and to set the unused positions of each question to zeros.

According to Zhang, Zhao & LeCun's paper, a length of 1014 is apparently enough, as it captures most of the information.
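
The effect of choosing a fixed length can be sketched on plain strings: longer questions are truncated and shorter ones are padded. The helper `fitToLength`, the padding character and the toy length of 10 are assumptions made for this illustration.

```python
def fitToLength(question, inputLength, padChar='\0'):
    # Truncate questions that are too long, pad the short ones on the right
    return question[:inputLength].ljust(inputLength, padChar)

toyQuestions = ['short', 'a much longer question']
fitted = [fitToLength(q, 10) for q in toyQuestions]
print([len(q) for q in fitted])

# Fraction of questions that fit without truncation
fraction = sum(len(q) <= 10 for q in toyQuestions) / float(len(toyQuestions))
print(fraction)
```

Running the same fraction computation on the real questions with a length of 1014 would show how many questions actually lose characters.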

In [None]:
# DO NOT RUN THIS!!!!

# ENCODE n QUESTIONS
n = len(questions)

# Initialize encoded questions array
encodedQs = np.zeros((n, inputLength, inputDim)).astype('float32')

# For each question
for (q, question) in enumerate(questions[:n]):
    # For each character in question, in reversed order (so latest character is first)
    for (c, char) in enumerate(reversed(question[:inputLength])):
        if char in alphabet:
            encodedQs[q][c] = encodedAlphabet[alphabet.index(char)]
        else:
            encodedQs[q][c] = np.zeros((alphabetSize))

In [None]:
# DO NOT RUN THIS!!!!
#np.save("encodedQs", encodedQs)

The above takes too long, and the resulting array takes up ~30GB of memory.

In [16]:
## DO NOT RUN THIS!!!!!

def oneHotEncodeQs(questions, inputLength, alphabet):
    alphabetSize = len(alphabet)
    # Initialize encoded questions array
    encodedQs = np.zeros((len(questions), inputLength, alphabetSize)).astype('float32')
    # For each question
    for (q, question) in enumerate(questions):
        # For each character in question, in reversed order (so latest character is first)
        for (c, char) in enumerate(reversed(question[:inputLength])):
            if char in alphabet:
                encodedQs[q][c] = encodedAlphabet[alphabet.index(char)].astype('float32')
            else:
                encodedQs[q][c] = np.zeros((alphabetSize)).astype('float32')
    return encodedQs

# Make encoded questions out of training questions 1 and 2
encodedQ1s = oneHotEncodeQs(trainQs1, inputLength, list(alphabet))
encodedQ2s = oneHotEncodeQs(trainQs2, inputLength, list(alphabet))

The above stops the kernel.

In [47]:
def encodeQs(questions, inputLength, alphabet):
    # Map each character to its index in the alphabet (faster than alphabet.index)
    charToIndex = {char: i for (i, char) in enumerate(alphabet)}
    # Initialize encoded questions array
    encodedQs = np.zeros((len(questions), inputLength))
    # For each question
    for (q, question) in enumerate(questions):
        # Lowercase first, since the alphabet only contains lowercase characters
        # For each character in question, in reversed order (so latest character is first)
        for (c, char) in enumerate(reversed(question.lower()[:inputLength])):
            # NOTE: characters outside the alphabet are encoded as 0,
            # the same index as alphabet[0] and as the zero padding
            encodedQs[q][c] = charToIndex.get(char, 0)
    return encodedQs
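
In `encodeQs`, index 0 stands for the first alphabet character, for unknown characters, and for padding alike. One possible variant, sketched below, shifts every alphabet index by one so that 0 is reserved for padding and unknown characters. This is an assumption, not what the notebook does: `encodeQsReserved` is a hypothetical name, and an `Embedding` layer fed with this encoding would need `input_dim=len(alphabet) + 1`.

```python
import numpy as np

def encodeQsReserved(questions, inputLength, alphabet):
    # Map each character to index + 1, so that 0 only ever means
    # "padding or unknown character"
    charToIndex = {char: i + 1 for (i, char) in enumerate(alphabet)}
    # uint8 is enough as long as the alphabet has fewer than 255 characters
    encodedQs = np.zeros((len(questions), inputLength), dtype='uint8')
    for (q, question) in enumerate(questions):
        # Lowercase, truncate, then reverse (latest character first)
        for (c, char) in enumerate(reversed(question.lower()[:inputLength])):
            encodedQs[q, c] = charToIndex.get(char, 0)
    return encodedQs

toyEncoded = encodeQsReserved(['Ab?', 'c'], 4, ['a', 'b', 'c'])
print(toyEncoded)
```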

In [48]:
# Make encoded questions out of training questions 1 and 2
encodedQ1s = encodeQs(trainQs1, inputLength, alphabet)
encodedQ2s = encodeQs(trainQs2, inputLength, alphabet)

In [51]:
getsizeof(encodedQ1s)

3881184112

Each of encodedQ1s and encodedQ2s takes around 3.6GB.
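
The `getsizeof` figure can be reproduced by simple arithmetic, assuming the array holds 404290 questions of length 1200 (the value that `len(encodedQ1s[0])` reports further down) stored as `float64`, NumPy's default for `np.zeros`. The sketch also shows what the same indices would take as `uint8`, which is a suggestion rather than something the notebook does; `arrayBytes` is a hypothetical helper.

```python
import numpy as np

nQuestions = 404290    # rows in the training data
length = 1200          # len(encodedQ1s[0]) reported in this notebook

def arrayBytes(shape, dtype):
    # Size of the raw data buffer, ignoring the small ndarray header
    nItems = 1
    for dim in shape:
        nItems *= dim
    return nItems * np.dtype(dtype).itemsize

indexFloat64 = arrayBytes((nQuestions, length), 'float64')
indexUint8 = arrayBytes((nQuestions, length), 'uint8')

for (name, nBytes) in [('index, float64', indexFloat64),
                       ('index, uint8', indexUint8)]:
    print(name + ": " + str(nBytes) + " bytes, "
          + str(round(nBytes / 2.0**30, 2)) + " GiB")
```

The `float64` buffer comes to 3881184000 bytes, 112 bytes short of the `getsizeof` output, which is consistent with a small fixed header on top of the data.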

In [55]:
np.save("encodedQ1s", encodedQ1s)
np.save("encodedQ2s", encodedQ2s)

In [52]:
len(encodedQ1s[0])

1200

# IGNORE

But this would build a very large $ndarray$ $encodedQs$. So instead, let's make a custom layer that does the encoding at run-time for each input.

In [53]:
# LAYER TO ENCODE QUESTIONS
class EncodeQuestions(Layer):
    
    def __init__(self, alphabet, input_length, **kwargs):
        self.alphabet = alphabet
        self.inputLength = input_length
        super(EncodeQuestions, self).__init__(**kwargs)
    
    def build(self, input_shape):
        self.alphabetSize = len(self.alphabet)
        self.encodedAlphabet = np.eye(self.alphabetSize)
        super(EncodeQuestions, self).build(input_shape)
    
    def call(self, question):
        encodedQ = np.zeros((self.inputLength, self.alphabetSize))
        # For each character in question, upto #inputLength number of characters,
        #in reversed order (so latest character is first)
        i = 0
        for (c, char) in enumerate(reversed(question)):
            if i == self.inputLength:
                break
            if char in self.alphabet:
                encodedQ[c] = self.encodedAlphabet[self.alphabet.index(char)]
            else:
                encodedQ[c] = np.zeros((self.alphabetSize))
            i += 1
        return encodedQ
    
    def compute_output_shape(self, input_shape):
        return (input_shape[0], self.inputLength, self.alphabetSize)
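
Another way to avoid holding the full one-hot array in memory, sketched here as an assumption rather than as part of this notebook, is to keep the integer indices and expand only one batch at a time. `oneHotBatches` is a hypothetical helper; unlike the layer above, it maps every index (including 0) to a one-hot row, so it has no all-zero row for unknown characters.

```python
import numpy as np

def oneHotBatches(encodedQs, alphabetSize, batchSize):
    # encodedQs: (nQuestions, inputLength) array of integer character indices
    identity = np.eye(alphabetSize, dtype='float32')
    for start in range(0, len(encodedQs), batchSize):
        batch = encodedQs[start:start + batchSize].astype('int64')
        # Indexing the identity matrix turns each index into its one-hot row
        yield identity[batch]

toyIndices = np.array([[0, 2, 1], [1, 1, 0], [2, 0, 0]])
batches = list(oneHotBatches(toyIndices, alphabetSize=3, batchSize=2))
print([b.shape for b in batches])
```

Only `batchSize * inputLength * alphabetSize` floats exist at any one time, instead of one such block per question.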

# MODEL

## IGNORE

In [108]:
# MODEL

# Model for Q1
modelQ1 = Sequential()
modelQ1.add(Conv1D(256, 7, strides=1, padding='valid', activation='relu', input_shape=(inputLength, dimOfEachInput), kernel_initializer=RandomNormal(mean=0.0, stddev=0.05, seed=None), bias_initializer=RandomNormal(mean=0.0, stddev=0.05, seed=None)))
modelQ1.add(MaxPooling1D(pool_size=3, strides=3))
modelQ1.add(Conv1D(256, 7, strides=1, padding='valid', activation='relu', kernel_initializer=RandomNormal(mean=0.0, stddev=0.05, seed=None), bias_initializer=RandomNormal(mean=0.0, stddev=0.05, seed=None)))
modelQ1.add(MaxPooling1D(pool_size=3, strides=3))
modelQ1.add(Conv1D(256, 3, strides=1, padding='valid', activation='relu', kernel_initializer=RandomNormal(mean=0.0, stddev=0.05, seed=None), bias_initializer=RandomNormal(mean=0.0, stddev=0.05, seed=None)))
modelQ1.add(Conv1D(256, 3, strides=1, padding='valid', activation='relu', kernel_initializer=RandomNormal(mean=0.0, stddev=0.05, seed=None), bias_initializer=RandomNormal(mean=0.0, stddev=0.05, seed=None)))
modelQ1.add(Conv1D(256, 3, strides=1, padding='valid', activation='relu', kernel_initializer=RandomNormal(mean=0.0, stddev=0.05, seed=None), bias_initializer=RandomNormal(mean=0.0, stddev=0.05, seed=None)))
modelQ1.add(Conv1D(256, 3, strides=1, padding='valid', activation='relu', kernel_initializer=RandomNormal(mean=0.0, stddev=0.05, seed=None), bias_initializer=RandomNormal(mean=0.0, stddev=0.05, seed=None)))
modelQ1.add(MaxPooling1D(pool_size=3, strides=3))
modelQ1.add(Flatten())
modelQ1.add(Dense(1024, activation='relu'))
modelQ1.add(Dropout(0.5))
modelQ1.add(Dense(1024, activation='relu'))
modelQ1.add(Dropout(0.5))

# Model for Q2
modelQ2 = Sequential()
modelQ2.add(Conv1D(256, 7, strides=1, padding='valid', activation='relu', input_shape=(inputLength, dimOfEachInput), kernel_initializer=RandomNormal(mean=0.0, stddev=0.05, seed=None), bias_initializer=RandomNormal(mean=0.0, stddev=0.05, seed=None)))
modelQ2.add(MaxPooling1D(pool_size=3, strides=3))
modelQ2.add(Conv1D(256, 7, strides=1, padding='valid', activation='relu', kernel_initializer=RandomNormal(mean=0.0, stddev=0.05, seed=None), bias_initializer=RandomNormal(mean=0.0, stddev=0.05, seed=None)))
modelQ2.add(MaxPooling1D(pool_size=3, strides=3))
modelQ2.add(Conv1D(256, 3, strides=1, padding='valid', activation='relu', kernel_initializer=RandomNormal(mean=0.0, stddev=0.05, seed=None), bias_initializer=RandomNormal(mean=0.0, stddev=0.05, seed=None)))
modelQ2.add(Conv1D(256, 3, strides=1, padding='valid', activation='relu', kernel_initializer=RandomNormal(mean=0.0, stddev=0.05, seed=None), bias_initializer=RandomNormal(mean=0.0, stddev=0.05, seed=None)))
modelQ2.add(Conv1D(256, 3, strides=1, padding='valid', activation='relu', kernel_initializer=RandomNormal(mean=0.0, stddev=0.05, seed=None), bias_initializer=RandomNormal(mean=0.0, stddev=0.05, seed=None)))
modelQ2.add(Conv1D(256, 3, strides=1, padding='valid', activation='relu', kernel_initializer=RandomNormal(mean=0.0, stddev=0.05, seed=None), bias_initializer=RandomNormal(mean=0.0, stddev=0.05, seed=None)))
modelQ2.add(MaxPooling1D(pool_size=3, strides=3))
modelQ2.add(Flatten())
modelQ2.add(Dense(1024, activation='relu'))
modelQ2.add(Dropout(0.5))
modelQ2.add(Dense(1024, activation='relu'))
modelQ2.add(Dropout(0.5))

# Merge 
model = Sequential()
model.add(Merge([modelQ1, modelQ2], mode = 'concat'))
model.add(Dense(1, activation = 'sigmoid'))



## CREATING BASE NETWORK

I shall make the Small and Large ConvNets described in Zhang, Zhao & LeCun's paper [Character-level Convolutional Networks for Text Classification](https://arxiv.org/pdf/1509.01626.pdf).

In [None]:
def createBaseNetworkSmall(inputDim, inputLength):
    baseNetwork = Sequential()
    baseNetwork.add(Embedding(input_dim=inputDim, output_dim=inputDim, input_length=inputLength))
    baseNetwork.add(Conv1D(256, 7, strides=1, padding='valid', activation='relu', kernel_initializer=RandomNormal(mean=0.0, stddev=0.05), bias_initializer=RandomNormal(mean=0.0, stddev=0.05)))
    baseNetwork.add(MaxPooling1D(pool_size=3, strides=3))
    baseNetwork.add(Conv1D(256, 7, strides=1, padding='valid', activation='relu', kernel_initializer=RandomNormal(mean=0.0, stddev=0.05), bias_initializer=RandomNormal(mean=0.0, stddev=0.05)))
    baseNetwork.add(MaxPooling1D(pool_size=3, strides=3))
    baseNetwork.add(Conv1D(256, 3, strides=1, padding='valid', activation='relu', kernel_initializer=RandomNormal(mean=0.0, stddev=0.05), bias_initializer=RandomNormal(mean=0.0, stddev=0.05)))
    baseNetwork.add(Conv1D(256, 3, strides=1, padding='valid', activation='relu', kernel_initializer=RandomNormal(mean=0.0, stddev=0.05), bias_initializer=RandomNormal(mean=0.0, stddev=0.05)))
    baseNetwork.add(Conv1D(256, 3, strides=1, padding='valid', activation='relu', kernel_initializer=RandomNormal(mean=0.0, stddev=0.05), bias_initializer=RandomNormal(mean=0.0, stddev=0.05)))
    baseNetwork.add(Conv1D(256, 3, strides=1, padding='valid', activation='relu', kernel_initializer=RandomNormal(mean=0.0, stddev=0.05), bias_initializer=RandomNormal(mean=0.0, stddev=0.05)))
    baseNetwork.add(MaxPooling1D(pool_size=3, strides=3))
    baseNetwork.add(Flatten())
    baseNetwork.add(Dense(1024, activation='relu'))
    baseNetwork.add(Dropout(0.5))
    baseNetwork.add(Dense(1024, activation='relu'))
    baseNetwork.add(Dropout(0.5))
    return baseNetwork

def createBaseNetworkLarge(inputDim, inputLength):
    baseNetwork = Sequential()
    baseNetwork.add(Embedding(input_dim=inputDim, output_dim=inputDim, input_length=inputLength))
    baseNetwork.add(Conv1D(1024, 7, strides=1, padding='valid', activation='relu', kernel_initializer=RandomNormal(mean=0.0, stddev=0.02), bias_initializer=RandomNormal(mean=0.0, stddev=0.02)))
    baseNetwork.add(MaxPooling1D(pool_size=3, strides=3))
    baseNetwork.add(Conv1D(1024, 7, strides=1, padding='valid', activation='relu', kernel_initializer=RandomNormal(mean=0.0, stddev=0.02), bias_initializer=RandomNormal(mean=0.0, stddev=0.02)))
    baseNetwork.add(MaxPooling1D(pool_size=3, strides=3))
    baseNetwork.add(Conv1D(1024, 3, strides=1, padding='valid', activation='relu', kernel_initializer=RandomNormal(mean=0.0, stddev=0.02), bias_initializer=RandomNormal(mean=0.0, stddev=0.02)))
    baseNetwork.add(Conv1D(1024, 3, strides=1, padding='valid', activation='relu', kernel_initializer=RandomNormal(mean=0.0, stddev=0.02), bias_initializer=RandomNormal(mean=0.0, stddev=0.02)))
    baseNetwork.add(Conv1D(1024, 3, strides=1, padding='valid', activation='relu', kernel_initializer=RandomNormal(mean=0.0, stddev=0.02), bias_initializer=RandomNormal(mean=0.0, stddev=0.02)))
    baseNetwork.add(Conv1D(1024, 3, strides=1, padding='valid', activation='relu', kernel_initializer=RandomNormal(mean=0.0, stddev=0.02), bias_initializer=RandomNormal(mean=0.0, stddev=0.02)))
    baseNetwork.add(MaxPooling1D(pool_size=3, strides=3))
    baseNetwork.add(Flatten())
    baseNetwork.add(Dense(2048, activation='relu'))
    baseNetwork.add(Dropout(0.5))
    baseNetwork.add(Dense(2048, activation='relu'))
    baseNetwork.add(Dropout(0.5))
    return baseNetwork
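
As a sanity check on the architecture, the number of frames left before `Flatten` can be worked out by hand: each `'valid'` convolution with stride 1 shortens the sequence by `kernel size - 1`, and each pooling layer of size 3 and stride 3 divides it by 3 (rounding down). The helper `featureLength` below is a hypothetical name for this arithmetic.

```python
def featureLength(inputLength):
    length = inputLength
    # (kernel size, followed by pooling?) for the six convolutional layers
    layers = [(7, True), (7, True), (3, False), (3, False), (3, False), (3, True)]
    for (kernelSize, pooled) in layers:
        length = length - kernelSize + 1      # 'valid' convolution, stride 1
        if pooled:
            length = length // 3              # max pooling, size 3, stride 3
    return length

frames = featureLength(1014)
print(frames)           # frames left before Flatten
print(frames * 256)     # flattened size for the small network
print(frames * 1024)    # flattened size for the large network
```

For an input length of 1014 this leaves 34 frames, so the first dense layer sees 34 × 256 = 8704 inputs in the small network and 34 × 1024 = 34816 in the large one.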

## SIAMESE NETWORK

Building a Siamese network from the [MNIST Siamese Network example](https://github.com/fchollet/keras/blob/master/examples/mnist_siamese_graph.py)

In [61]:
def euclidean_distance(vects):
    x, y = vects
    return K.sqrt(K.maximum(K.sum(K.square(x - y), axis=1, keepdims=True), K.epsilon()))

def eucl_dist_output_shape(shapes):
    shape1, shape2 = shapes
    return (shape1[0], 1)

def contrastive_loss(y_true, y_pred):
    '''Contrastive loss from Hadsell-et-al.'06
    http://yann.lecun.com/exdb/publis/pdf/hadsell-chopra-lecun-06.pdf
    '''
    margin = 1
    return K.mean(y_true * K.square(y_pred) +
                  (1 - y_true) * K.square(K.maximum(margin - y_pred, 0)))
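
To see what the contrastive loss rewards, here is a NumPy re-statement of the same formula on three hand-picked pairs. `contrastiveLossNumpy` and the toy values are illustrative only; the Keras version above is the one used for training.

```python
import numpy as np

def contrastiveLossNumpy(yTrue, distance, margin=1.0):
    yTrue = np.asarray(yTrue, dtype='float64')
    distance = np.asarray(distance, dtype='float64')
    # Duplicates (yTrue = 1) are penalised for being far apart,
    # non-duplicates (yTrue = 0) for being closer than the margin
    return np.mean(yTrue * np.square(distance) +
                   (1 - yTrue) * np.square(np.maximum(margin - distance, 0)))

labels = [1, 0, 0]
distances = [0.5, 0.5, 2.0]
loss = contrastiveLossNumpy(labels, distances)
print(loss)
```

The three pairs contribute 0.25 (duplicates 0.5 apart), 0.25 (non-duplicates only 0.5 apart) and 0 (non-duplicates beyond the margin), so the mean is 0.5 / 3.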

In [62]:
baseNetwork = createBaseNetworkSmall(inputDim, inputLength)

# Inputs
inputA = Input(shape=(inputLength,))
inputB = Input(shape=(inputLength,))

# because we re-use the same instance `baseNetwork`,
# the weights of the network will be shared across the two branches
processedA = baseNetwork(inputA)
processedB = baseNetwork(inputB)

distance = Lambda(euclidean_distance, output_shape=eucl_dist_output_shape)([processedA, processedB])

model = Model([inputA, inputB], distance)

In [63]:
# Compile
initLR = 0.01
momentum = 0.9
sgd = SGD(lr=initLR, momentum=momentum, decay=0, nesterov=False)
model.compile(loss=contrastive_loss, optimizer=sgd, metrics =['accuracy'])

In [64]:
# Halve the learning rate every 3 epochs
def stepDecay(epoch):
    initLR = 0.01
    newLR = float(initLR/np.power(2, (int(epoch/3))))
    print("stepDecay: Epoch "+str(epoch)+" ; lr: "+str(newLR))
    return newLR
lRate = LearningRateScheduler(stepDecay)
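
The schedule this produces can be listed without training anything. The sketch below applies the same rule as `stepDecay`, minus the printing; `stepDecaySchedule` is a hypothetical helper, and epochs are counted from 0 as Keras does when it calls the scheduler.

```python
def stepDecaySchedule(nEpochs, initLR=0.01, every=3):
    # Same rule as stepDecay: halve the rate once per block of `every` epochs
    return [initLR / 2 ** (epoch // every) for epoch in range(nEpochs)]

schedule = stepDecaySchedule(9)
print(schedule)
```

Epochs 0 to 2 use 0.01, epochs 3 to 5 use 0.005, and epochs 6 to 8 use 0.0025.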

In [None]:
# Checkpoint
filepath = "weights-{epoch:02d}-{val_acc:.2f}.hdf5"
checkpoint = ModelCheckpoint(filepath, monitor='val_acc', verbose=1,
                             save_best_only=True, mode='max')

In [None]:
# Fit
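callbacks = [lRate, checkpoint]
minibatchSize = 128
nEpochs = 1
# NOTE: validation_split is the fraction of the data held out for validation
validationSplit = 0.8
# Labels: 1 if the two questions are duplicates, else 0
outputs = duplicateOrNot.astype('float32')
model.fit([encodedQ1s, encodedQ2s], outputs, batch_size=minibatchSize,
          epochs=nEpochs, verbose=1, callbacks=callbacks,
          validation_split=validationSplit)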

## SIAMESE-PLUS NETWORK

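Building a network like the [MNIST Siamese Network example](https://github.com/fchollet/keras/blob/master/examples/mnist_siamese_graph.py), but concatenating the two branches and adding dense layers with a sigmoid output instead of computing a distance.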

In [56]:
baseNetwork = createBaseNetworkSmall(inputDim, inputLength)

# Inputs
inputA = Input(shape=(inputLength,))
inputB = Input(shape=(inputLength,))

# because we re-use the same instance `baseNetwork`,
# the weights of the network will be shared across the two branches
processedA = baseNetwork(inputA)
processedB = baseNetwork(inputB)

# Concatenate
conc = Concatenate()([processedA, processedB])

# Add more layers
x = Dense(1024, activation='relu')(conc)
x = Dropout(0.5)(x)
x = Dense(1024, activation='relu')(x)
x = Dropout(0.5)(x)
predictions = Dense(1, activation='sigmoid')(x)

# This creates a model that includes the Input and Dense layers
model = Model(inputs=[inputA, inputB], outputs=predictions)

In [57]:
# Compile
# model.compile(loss=contrastive_loss, optimizer=sgd, metrics=['accuracy'])
initLR = 0.01
momentum = 0.9
sgd = SGD(lr=initLR, momentum=momentum, decay=1e-5, nesterov=True)
model.compile(loss='binary_crossentropy', optimizer=sgd, metrics=['accuracy'])

In [68]:
# Halve the learning rate every 3 epochs
def stepDecay(epoch):
    initLR = 0.01
    newLR = float(initLR/np.power(2, (int(epoch/3))))
    print("stepDecay: Epoch "+str(epoch)+" ; lr: "+str(newLR))
    return newLR
lRate = LearningRateScheduler(stepDecay)
callbacks = [lRate]

In [None]:
# Checkpoint
filepath = "weights-{epoch:02d}-{val_acc:.2f}.hdf5"
checkpoint = ModelCheckpoint(filepath, monitor='val_acc', verbose=1,
                             save_best_only=True, mode='max')

In [None]:
# Fit
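callbacks = [checkpoint, lRate]
minibatchSize = 128
nEpochs = 1
# NOTE: validation_split is the fraction of the data held out for validation
validationSplit = 0.8
# Labels: 1 if the two questions are duplicates, else 0
outputs = duplicateOrNot.astype('float32')
model.fit([encodedQ1s, encodedQ2s], outputs, batch_size=minibatchSize,
          epochs=nEpochs, verbose=1, callbacks=callbacks,
          validation_split=validationSplit)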