#### This particular assignment focuses on text classification using CNN. It has been picking up pace over the past few years. So, I thought this would be a good exercise to try out. The dataset is provided to you and there will be specific instrucions on how to curate the data, split into train and validation and the like.  You will be using MXnet for this task.  The data comprises tweets pertaining to common causes of cancer. The objective is to classify the tweets as medically relevant or not.  The dataset is skewed with positive class or 'yes' being 6 times less frequent than the negative class or 'no'. (Total marks = 50). Individual marks to the sub-problems are given in bracket. 

In [1]:
# these are the modules you are allowed to work with. 

import nltk
import re
import numpy as np
import mxnet as mx
import sys, os
from collections import Counter, namedtuple
import itertools
import random
import math
import time

'''
First job is to clean and preprocess the social media text. (5)

1) Replace URLs and mentions (i.e strings which are preceeded with @)
2) Segment #hastags 
3) Remove emoticons and other unicode characters
'''

def preprocess_tweet(input_text):
    '''
    Input: The input string read directly from the file
    
    Output: Pre-processed tweet text
    '''
    
    cleaned_text = re.sub(r'(( |)@\S+)|(( |)http\S+)', '', input_text)
    
    hashpattern = re.compile(r'#(A-Z{1}a-z+)+')
    
    hashes = hashpattern.finditer(cleaned_text)
    
    for i in hashes:
      ids = i.span()
      h = cleaned_text[ids[0]:ids[1]]
      clean_h = ' '.join(re.findall('[A-Z][^A-Z]*', h[1:]))
      cleaned_text = cleaned_text.replace(h, clean_h)
        
    RE_EMOJI = re.compile('[\U00010000-\U0010ffff]', flags=re.UNICODE)
    cleaned_text = RE_EMOJI.sub(r'', cleaned_text)
            
    #cleaned_text = re.sub(r"[^A-Za-z0-9(),!?\'\`]", " ", cleaned_text)
    cleaned_text = re.sub(r"[^A-Za-z]", " ", cleaned_text)
    cleaned_text = cleaned_text.encode('ascii', 'ignore').decode("utf-8")
    
    return cleaned_text

# read the input file and create the set of positive examples and negative examples. 

file=open('cancer_data.tsv', encoding="utf8")
pos_data=[]
neg_data=[]

for line in file:
    line=line.strip().split('\t')
    text2= preprocess_tweet(line[0]).strip().split()
    if line[1]=='yes':
        pos_data.append(text2)
    if line[1]=='no':
        neg_data.append(text2)

print(len(pos_data), len(neg_data))     

sentences= list(pos_data)
sentences.extend(neg_data)
pos_labels= [1 for _ in pos_data]
neg_labels= [0 for _ in neg_data]
y=list(pos_labels)
y.extend(neg_labels)
y=np.array(y)

'''
After this you will obtain the following :

1) sentences =  List of sentences having the positive and negative examples with all the positive examples first
2) y = List of labels with the positive labels first.
'''

'''
Before running the CNN there are a few things one needs to take care of: (5)

1) Pad the sentences so that all of them are of the same length
2) Build a vocabulary comprising all unique words that occur in the corpus
3) Convert each sentence into a corresponding vector by replacing each word in the sentence with the index in the vocabulary. 

Example :
S1 = a b a c
S2 = d c a 

Step 1:  S1= a b a c, 
         S2 =d c a </s> 
         (Both sentences are of equal length). 

Step 2:  voc={a:1, b:2, c:3, d:4, </s>: 5}

Step 3:  S1= [1,2,1,3]
         S2= [4,3,1,5]

'''

def create_word_vectors(sentences):
    '''
    Input: List of sentences
    Output: List of word vectors corresponding to each sentence, vocabulary
    '''
    
    sequence_length = max(len(x) for x in sentences)
    padded_sentences = []
    for i in range(len(sentences)):
        sentence = sentences[i]
        num_padding = sequence_length - len(sentence)
        new_sentence = sentence + ["</s>"] * num_padding
        padded_sentences.append(new_sentence)
    
    # Build vocabulary
    word_counts = Counter(itertools.chain(*padded_sentences))
    
    # Mapping from index to word
    vocabulary_inv = [x[0] for x in word_counts.most_common()]
    
    # Mapping from word to index
    vocabulary = {x: i for i, x in enumerate(vocabulary_inv)}
    
    x = np.array([
            [vocabulary[word] for word in sentence]
            for sentence in padded_sentences])
    
    return x, vocabulary


x, vocabulary = create_word_vectors(sentences)


def create_shuffle(x,y):
    '''
    Create an equal distribution of the positive and negative examples. 
    Please do not change this particular shuffling method.
    '''
    pos_len= len(pos_data)
    neg_len= len(neg_data)
    pos_len_train= int(0.8*pos_len)
    neg_len_train= int(0.8*neg_len)
    train_data= [(x[i],y[i]) for i in range(0, pos_len_train)]
    train_data.extend([(x[i],y[i]) for i in range(pos_len, pos_len+ neg_len_train )])
    test_data=[(x[i],y[i]) for i in range(pos_len_train, pos_len)]
    test_data.extend([(x[i],y[i]) for i in range(pos_len+ neg_len_train, len(x) )])
    
    random.shuffle(train_data)
    x_train=[i[0] for i in train_data]
    y_train=[i[1] for i in train_data]
    random.shuffle(test_data)
    x_test=[i[0] for i in test_data]
    y_test=[i[1] for i in test_data]
    
    x_train=np.array(x_train)
    y_train=np.array(y_train)
    x_test= np.array(x_test)
    y_test= np.array(y_test)
    return x_train, y_train, x_test, y_test

x_train, y_train, x_test, y_test= create_shuffle(x,y)


208 1298


In [2]:
vocab_size = len(vocabulary)
sentence_size = x_train.shape[1]

print('Train/Test split: %d/%d' % (len(y_train), len(y_test)))
print('Train shape:', x_train.shape)
print('Test shape:', x_test.shape)
print('vocab_size', vocab_size)
print('sentence max words', sentence_size)

Train/Test split: 1204/302
Train shape: (1204, 118)
Test shape: (302, 118)
vocab_size 8845
sentence max words 118


In [0]:
'''
We now define the neural architecture of the CNN. The architecture is defined as : (10)

1) Embedding layer that converts the vector representation of the sentence from a one-hot encoding to a fixed sized word embedding
   (mx.sym.Embedding)
   
2) Convolution + activation + max pooling layer 
   (mx.sym.Convolution+ mx.sym.Activation+ mx.sym.Pooling)
   This procedure is to be followed for different sizes of filters (the filters corresponding to size 2 looks at the bigram distribution, 3 looks at trigram etc. 

3) Concat all the filters together (mx.sym.Concat)

4) Pass the results through a fully Connected layer of size 2 and then run softmax on it. 
   (mx.sym.FullyConnected, mx.sym.SoftmaxOutput)
   

We then initialize the intermediate layers of appropriate size and train the model using back prop. (10)
(Look up the mxnet tutorial if you have any doubt)

Run the classifier and for each epoch with a specified batch size observe the accuracy on the training set and test set (5)


Default parameters:

1) No of epochs = 10
2) Batch size = 20
3) Size of word embeddings = 200
4) Size of filters =[2,3,4,5]
5) Filter embedding= 100
6) Optimizer = rmsprop
7) learning rate = 0.005

'''

batch_size = 20

input_x = mx.sym.Variable('data')
input_y = mx.sym.Variable('softmax_label')

num_embed = 200

embed_layer = mx.sym.Embedding(data=input_x, input_dim=vocab_size, output_dim=num_embed, name='vocab_embed')

conv_input = mx.sym.Reshape(data=embed_layer, shape=(batch_size, 1, sentence_size, num_embed))

filter_list=[2,3,4,5]

num_filter=100
pooled_outputs = []

for filter_size in filter_list:
    convi = mx.sym.Convolution(data=conv_input, kernel=(filter_size, num_embed), num_filter=num_filter)
    relui = mx.sym.Activation(data=convi, act_type='relu')
    pooli = mx.sym.Pooling(data=relui, pool_type='max', kernel=(sentence_size - filter_size + 1, 1), stride=(1, 1))
    pooled_outputs.append(pooli)

total_filters = num_filter * len(filter_list)
concat = mx.sym.Concat(*pooled_outputs, dim=1)

h_pool = mx.sym.Reshape(data=concat, shape=(batch_size, total_filters))

# dropout layer
dropout = 0.5

if dropout > 0.0:
    h_drop = mx.sym.Dropout(data=h_pool, p=dropout)
else:
    h_drop = h_pool

num_label = 2

cls_weight = mx.sym.Variable('cls_weight')
cls_bias = mx.sym.Variable('cls_bias')

fc = mx.sym.FullyConnected(data=h_drop, weight=cls_weight, bias=cls_bias, num_hidden=num_label)

sm = mx.sym.SoftmaxOutput(data=fc, label=input_y, name='softmax')

cnn = sm

In [0]:
# Define the structure of our CNN Model (as a named tuple)
CNNModel = namedtuple("CNNModel", ['cnn_exec', 'symbol', 'data', 'label', 'param_blocks'])

# Define what device to train/test on, use GPU if available
ctx = mx.gpu() if mx.test_utils.list_gpus() else mx.cpu()

arg_names = cnn.list_arguments()

input_shapes = {}
input_shapes['data'] = (batch_size, sentence_size)

arg_shape, out_shape, aux_shape = cnn.infer_shape(**input_shapes)
arg_arrays = [mx.nd.zeros(s, ctx) for s in arg_shape]
args_grad = {}
for shape, name in zip(arg_shape, arg_names):
    if name in ['softmax_label', 'data']: # input, output
        continue
    args_grad[name] = mx.nd.zeros(shape, ctx)

cnn_exec = cnn.bind(ctx=ctx, args=arg_arrays, args_grad=args_grad, grad_req='add')

param_blocks = []
arg_dict = dict(zip(arg_names, cnn_exec.arg_arrays))
initializer = mx.initializer.Uniform(0.1)
for i, name in enumerate(arg_names):
    if name in ['softmax_label', 'data']: # input, output
        continue
    initializer(mx.init.InitDesc(name), arg_dict[name])

    param_blocks.append( (i, arg_dict[name], args_grad[name], name) )

data = cnn_exec.arg_dict['data']
label = cnn_exec.arg_dict['softmax_label']

cnn_model= CNNModel(cnn_exec=cnn_exec, symbol=cnn, data=data, label=label, param_blocks=param_blocks)

In [5]:
'''
Train the cnn_model using back prop
'''

optimizer = 'rmsprop'
max_grad_norm = 5.0
learning_rate = 0.005
epoch = 10

# create optimizer
opt = mx.optimizer.create(optimizer)
opt.lr = learning_rate

updater = mx.optimizer.get_updater(opt)

# For each training epoch
for iteration in range(epoch):
    tic = time.time()
    num_correct = 0
    num_total = 0

    # Over each batch of training data
    for begin in range(0, x_train.shape[0], batch_size):
        batchX = x_train[begin:begin+batch_size]
        batchY = y_train[begin:begin+batch_size]
        if batchX.shape[0] != batch_size:
            continue

        cnn_model.data[:] = batchX
        cnn_model.label[:] = batchY

        # forward
        cnn_model.cnn_exec.forward(is_train=True)

        # backward
        cnn_model.cnn_exec.backward()

        # eval on training data
        num_correct += sum(batchY == np.argmax(cnn_model.cnn_exec.outputs[0].asnumpy(), axis=1))
        num_total += len(batchY)

        # update weights
        norm = 0
        for idx, weight, grad, name in cnn_model.param_blocks:
            grad /= batch_size
            l2_norm = mx.nd.norm(grad).asscalar()
            norm += l2_norm * l2_norm

        norm = math.sqrt(norm)
        for idx, weight, grad, name in cnn_model.param_blocks:
            if norm > max_grad_norm:
                grad *= (max_grad_norm / norm)

            updater(idx, grad, weight)

            # reset gradient to zero
            grad[:] = 0.0

    # Decay learning rate for this epoch to ensure we are not "overshooting" optima
    if iteration % 50 == 0 and iteration > 0:
        opt.lr *= 0.5
        print('reset learning rate to %g' % opt.lr)

    # End of training loop for this epoch
    toc = time.time()
    train_time = toc - tic
    train_acc = num_correct * 100 / float(num_total)

    # Saving checkpoint to disk
    if (iteration + 1) % 10 == 0:
        prefix = 'cnn'
        cnn_model.symbol.save('./%s-symbol.json' % prefix)
        save_dict = {('arg:%s' % k) : v  for k, v in cnn_model.cnn_exec.arg_dict.items()}
        save_dict.update({('aux:%s' % k) : v for k, v in cnn_model.cnn_exec.aux_dict.items()})
        param_name = './%s-%04d.params' % (prefix, iteration)
        mx.nd.save(param_name, save_dict)
        print('Saved checkpoint to %s' % param_name)


    # Evaluate model after this epoch on test set
    num_correct = 0
    num_total = 0

    # For each test batch
    for begin in range(0, x_test.shape[0], batch_size):
        batchX = x_test[begin:begin+batch_size]
        batchY = y_test[begin:begin+batch_size]

        if batchX.shape[0] != batch_size:
            continue

        cnn_model.data[:] = batchX
        cnn_model.cnn_exec.forward(is_train=False)

        num_correct += sum(batchY == np.argmax(cnn_model.cnn_exec.outputs[0].asnumpy(), axis=1))
        num_total += len(batchY)

    test_acc = num_correct * 100 / float(num_total)
    print('Iter [%d] Train: Time: %.3fs, Training Accuracy: %.3f \
            --- Test Accuracy thus far: %.3f' % (iteration, train_time, train_acc, test_acc))

Iter [0] Train: Time: 1.130s, Training Accuracy: 86.917             --- Test Accuracy thus far: 88.000
Iter [1] Train: Time: 1.126s, Training Accuracy: 93.667             --- Test Accuracy thus far: 89.000
Iter [2] Train: Time: 1.125s, Training Accuracy: 98.333             --- Test Accuracy thus far: 91.333
Iter [3] Train: Time: 1.123s, Training Accuracy: 99.250             --- Test Accuracy thus far: 91.000
Iter [4] Train: Time: 1.128s, Training Accuracy: 99.500             --- Test Accuracy thus far: 90.667
Iter [5] Train: Time: 1.126s, Training Accuracy: 99.667             --- Test Accuracy thus far: 91.000
Iter [6] Train: Time: 1.135s, Training Accuracy: 99.500             --- Test Accuracy thus far: 90.667
Iter [7] Train: Time: 1.132s, Training Accuracy: 99.667             --- Test Accuracy thus far: 90.333
Iter [8] Train: Time: 1.127s, Training Accuracy: 99.833             --- Test Accuracy thus far: 90.000
Saved checkpoint to ./cnn-0009.params
Iter [9] Train: Time: 1.120s, Train

In [6]:
'''
So far, the assignment has been posed in a manner so that you can refer to directly the mxnet tutorial on the same problem. 

The final 15 marks is meant to carry out experimentations of your own and observe how the results change by experimentation. 

1) Would the results improve if instead of using the word embeddings that is based solely on frequency, if you have been able to incorporate sub-word information
   (In short run fasttext on the corpus and use the word embeddings generated by fastetxt). (8)
   
2) Accuracy might not be the best way to measure the performance of a skewed dataset. What other metrics would you use ? Why? 
   Experiment with different hyper-paramters to show the performance in terms of metric? 
   You can assume that we want to identify all the medically relevant tweets (i.e. tweets with 'yes' class more). (7)
    

Delivearbles:

The ipython notebook with the results to each part of the question. 


P.S: This assignment is part of a research question I am working on my free time. So if you have any insights, I'd love to hear them. 
Happy coding 

Ritam Dutt
14CS30041

'''





"\nSo far, the assignment has been posed in a manner so that you can refer to directly the mxnet tutorial on the same problem. \n\nThe final 15 marks is meant to carry out experimentations of your own and observe how the results change by experimentation. \n\n1) Would the results improve if instead of using the word embeddings that is based solely on frequency, if you have been able to incorporate sub-word information\n   (In short run fasttext on the corpus and use the word embeddings generated by fastetxt). (8)\n   \n2) Accuracy might not be the best way to measure the performance of a skewed dataset. What other metrics would you use ? Why? \n   Experiment with different hyper-paramters to show the performance in terms of metric? \n   You can assume that we want to identify all the medically relevant tweets (i.e. tweets with 'yes' class more). (7)\n    \n\nDelivearbles:\n\nThe ipython notebook with the results to each part of the question. \n\n\nP.S: This assignment is part of a rese


### Results of Parameter Tuning

#### I am getting better results by having Dropout of 0.2 with RMSProp Optimizer with Learning Rate 0.05 and decrease after 5 epochs

With Adam Optimizer there is no much change in accuracy

# Fast Text

In [7]:
from mxnet.contrib import text

def create_word_vectors(sentences):
    '''
    Input: List of sentences
    Output: List of word vectors corresponding to each sentence, vocabulary
    '''
    
    sequence_length = max(len(x) for x in sentences)
    padded_sentences = []
    for i in range(len(sentences)):
        sentence = sentences[i]
        num_padding = sequence_length - len(sentence)
        new_sentence = sentence + ["</s>"] * num_padding
        padded_sentences.append(' '.join(new_sentence))
    
    # Build counter
    counter  = text.utils.count_tokens_from_str('\n'.join(padded_sentences))
    
    # Build Vocabulary
    vocabulary = text.vocab.Vocabulary(counter)
    
    my_embedding = text.embedding.create('fasttext', pretrained_file_name='wiki.simple.vec', vocabulary=vocabulary)
    
    x = np.array([vocabulary.to_indices(sentence.split(' ')) for sentence in padded_sentences])
    
    return x, my_embedding


x, embedding = create_word_vectors(sentences)

def create_shuffle(x,y):
    '''
    Create an equal distribution of the positive and negative examples. 
    Please do not change this particular shuffling method.
    '''
    pos_len= len(pos_data)
    neg_len= len(neg_data)
    pos_len_train= int(0.8*pos_len)
    neg_len_train= int(0.8*neg_len)
    train_data= [(x[i],y[i]) for i in range(0, pos_len_train)]
    train_data.extend([(x[i],y[i]) for i in range(pos_len, pos_len+ neg_len_train )])
    test_data=[(x[i],y[i]) for i in range(pos_len_train, pos_len)]
    test_data.extend([(x[i],y[i]) for i in range(pos_len+ neg_len_train, len(x) )])
    
    random.shuffle(train_data)
    x_train=[i[0] for i in train_data]
    y_train=[i[1] for i in train_data]
    random.shuffle(test_data)
    x_test=[i[0] for i in test_data]
    y_test=[i[1] for i in test_data]
    
    x_train=np.array(x_train)
    y_train=np.array(y_train)
    x_test= np.array(x_test)
    y_test= np.array(y_test)
    return x_train, y_train, x_test, y_test

x_train, y_train, x_test, y_test= create_shuffle(x,y)

  'skipped.' % (line_num, token, elems))


In [8]:
vocab_size = len(embedding)
sentence_size = x_train.shape[1]

print('Train/Test split: %d/%d' % (len(y_train), len(y_test)))
print('Train shape:', x_train.shape)
print('Test shape:', x_test.shape)
print('vocab_size', vocab_size)
print('sentence max words', sentence_size)

Train/Test split: 1204/302
Train shape: (1204, 118)
Test shape: (302, 118)
vocab_size 8846
sentence max words 118


In [0]:
'''
Default parameters:

1) No of epochs = 10
2) Batch size = 20
3) Size of word embeddings = 200
4) Size of filters =[2,3,4,5]
5) Filter embedding= 100
6) Optimizer = rmsprop
7) learning rate = 0.005

'''

batch_size = 20

num_embed = embedding.vec_len

input_x = mx.sym.Variable('data')
input_y = mx.sym.Variable('softmax_label')


embed_layer = mx.sym.Embedding(data=input_x, input_dim=len(embedding), output_dim=embedding.vec_len, name='vocab_embed_fasttext')

# embed_layer.initialize()
# embed_layer.weight.set_data(embedding.idx_to_vec)

conv_input = mx.sym.Reshape(data=embed_layer, shape=(batch_size, 1, sentence_size, num_embed))

filter_list=[2,3,4,5]

num_filter=100
pooled_outputs = []

for filter_size in filter_list:
    convi = mx.sym.Convolution(data=conv_input, kernel=(filter_size, num_embed), num_filter=num_filter)
    relui = mx.sym.Activation(data=convi, act_type='relu')
    pooli = mx.sym.Pooling(data=relui, pool_type='max', kernel=(sentence_size - filter_size + 1, 1), stride=(1, 1))
    pooled_outputs.append(pooli)

total_filters = num_filter * len(filter_list)
concat = mx.sym.Concat(*pooled_outputs, dim=1)

h_pool = mx.sym.Reshape(data=concat, shape=(batch_size, total_filters))

# dropout layer
dropout = 0.5

if dropout > 0.0:
    h_drop = mx.sym.Dropout(data=h_pool, p=dropout)
else:
    h_drop = h_pool

num_label = 2

cls_weight = mx.sym.Variable('cls_weight')
cls_bias = mx.sym.Variable('cls_bias')

fc = mx.sym.FullyConnected(data=h_drop, weight=cls_weight, bias=cls_bias, num_hidden=num_label)

sm = mx.sym.SoftmaxOutput(data=fc, label=input_y, name='softmax')

cnn = sm

In [0]:
# Define the structure of our CNN Model (as a named tuple)
CNNModel = namedtuple("CNNModel", ['cnn_exec', 'symbol', 'data', 'label', 'param_blocks'])

# Define what device to train/test on, use GPU if available
ctx = mx.gpu() if mx.test_utils.list_gpus() else mx.cpu()

arg_names = cnn.list_arguments()

input_shapes = {}
input_shapes['data'] = (batch_size, sentence_size)

arg_shape, out_shape, aux_shape = cnn.infer_shape(**input_shapes)
arg_arrays = [mx.nd.zeros(s, ctx) for s in arg_shape]
args_grad = {}
for shape, name in zip(arg_shape, arg_names):
    if name in ['softmax_label', 'data']: # input, output
        continue
    args_grad[name] = mx.nd.zeros(shape, ctx)

cnn_exec = cnn.bind(ctx=ctx, args=arg_arrays, args_grad=args_grad, grad_req='add')

param_blocks = []
arg_dict = dict(zip(arg_names, cnn_exec.arg_arrays))
initializer = mx.initializer.Uniform(0.1)
for i, name in enumerate(arg_names):
    if name in ['softmax_label', 'data']: # input, output
        continue
    if name == 'vocab_embed_fasttext_weight':
        initializer(mx.init.InitDesc(name), embedding.idx_to_vec)
        continue
    initializer(mx.init.InitDesc(name), arg_dict[name])

    param_blocks.append( (i, arg_dict[name], args_grad[name], name) )

data = cnn_exec.arg_dict['data']
label = cnn_exec.arg_dict['softmax_label']

cnn_model= CNNModel(cnn_exec=cnn_exec, symbol=cnn, data=data, label=label, param_blocks=param_blocks)

In [11]:
'''
Train the cnn_model using back prop
'''

optimizer = 'rmsprop'
max_grad_norm = 5.0
learning_rate = 0.0005
epoch = 10

# create optimizer
opt = mx.optimizer.create(optimizer)
opt.lr = learning_rate

updater = mx.optimizer.get_updater(opt)

# For each training epoch
for iteration in range(epoch):
    tic = time.time()
    num_correct = 0
    num_total = 0

    # Over each batch of training data
    for begin in range(0, x_train.shape[0], batch_size):
        batchX = x_train[begin:begin+batch_size]
        batchY = y_train[begin:begin+batch_size]
        if batchX.shape[0] != batch_size:
            continue

        cnn_model.data[:] = batchX
        cnn_model.label[:] = batchY

        # forward
        cnn_model.cnn_exec.forward(is_train=True)

        # backward
        cnn_model.cnn_exec.backward()

        # eval on training data
        num_correct += sum(batchY == np.argmax(cnn_model.cnn_exec.outputs[0].asnumpy(), axis=1))
        num_total += len(batchY)

        # update weights
        norm = 0
        for idx, weight, grad, name in cnn_model.param_blocks:
            grad /= batch_size
            l2_norm = mx.nd.norm(grad).asscalar()
            norm += l2_norm * l2_norm

        norm = math.sqrt(norm)
        for idx, weight, grad, name in cnn_model.param_blocks:
            if norm > max_grad_norm:
                grad *= (max_grad_norm / norm)

            updater(idx, grad, weight)

            # reset gradient to zero
            grad[:] = 0.0

    # Decay learning rate for this epoch to ensure we are not "overshooting" optima
    if iteration % 50 == 0 and iteration > 0:
        opt.lr *= 0.5
        print('reset learning rate to %g' % opt.lr)

    # End of training loop for this epoch
    toc = time.time()
    train_time = toc - tic
    train_acc = num_correct * 100 / float(num_total)

    # Saving checkpoint to disk
    if (iteration + 1) % 10 == 0:
        prefix = 'cnn'
        cnn_model.symbol.save('./%s-symbol.json' % prefix)
        save_dict = {('arg:%s' % k) : v  for k, v in cnn_model.cnn_exec.arg_dict.items()}
        save_dict.update({('aux:%s' % k) : v for k, v in cnn_model.cnn_exec.aux_dict.items()})
        param_name = './%s-%04d.params' % (prefix, iteration)
        mx.nd.save(param_name, save_dict)
        print('Saved checkpoint to %s' % param_name)


    # Evaluate model after this epoch on test set
    num_correct = 0
    num_total = 0

    # For each test batch
    for begin in range(0, x_test.shape[0], batch_size):
        batchX = x_test[begin:begin+batch_size]
        batchY = y_test[begin:begin+batch_size]

        if batchX.shape[0] != batch_size:
            continue

        cnn_model.data[:] = batchX
        cnn_model.cnn_exec.forward(is_train=False)

        num_correct += sum(batchY == np.argmax(cnn_model.cnn_exec.outputs[0].asnumpy(), axis=1))
        num_total += len(batchY)

    test_acc = num_correct * 100 / float(num_total)
    print('Iter [%d] Train: Time: %.3fs, Training Accuracy: %.3f \
            --- Test Accuracy thus far: %.3f' % (iteration, train_time, train_acc, test_acc))

Iter [0] Train: Time: 1.263s, Training Accuracy: 86.167             --- Test Accuracy thus far: 86.000
Iter [1] Train: Time: 1.123s, Training Accuracy: 86.167             --- Test Accuracy thus far: 86.000
Iter [2] Train: Time: 1.122s, Training Accuracy: 86.167             --- Test Accuracy thus far: 86.000
Iter [3] Train: Time: 1.110s, Training Accuracy: 86.167             --- Test Accuracy thus far: 86.000
Iter [4] Train: Time: 1.115s, Training Accuracy: 86.167             --- Test Accuracy thus far: 86.000
Iter [5] Train: Time: 1.117s, Training Accuracy: 86.167             --- Test Accuracy thus far: 86.000
Iter [6] Train: Time: 1.113s, Training Accuracy: 86.167             --- Test Accuracy thus far: 86.000
Iter [7] Train: Time: 1.112s, Training Accuracy: 86.167             --- Test Accuracy thus far: 86.000
Iter [8] Train: Time: 1.119s, Training Accuracy: 86.167             --- Test Accuracy thus far: 86.000
Saved checkpoint to ./cnn-0009.params
Iter [9] Train: Time: 1.114s, Train

# Parameter Tuning

In [0]:
def create_word_vectors(sentences):
    '''
    Input: List of sentences
    Output: List of word vectors corresponding to each sentence, vocabulary
    '''
    
    sequence_length = max(len(x) for x in sentences)
    padded_sentences = []
    for i in range(len(sentences)):
        sentence = sentences[i]
        num_padding = sequence_length - len(sentence)
        new_sentence = sentence + ["</s>"] * num_padding
        padded_sentences.append(new_sentence)
    
    # Build vocabulary
    word_counts = Counter(itertools.chain(*padded_sentences))
    
    # Mapping from index to word
    vocabulary_inv = [x[0] for x in word_counts.most_common()]
    
    # Mapping from word to index
    vocabulary = {x: i for i, x in enumerate(vocabulary_inv)}
    
    x = np.array([
            [vocabulary[word] for word in sentence]
            for sentence in padded_sentences])
    
    return x, vocabulary


x, vocabulary = create_word_vectors(sentences)


def create_shuffle(x,y):
    '''
    Create an equal distribution of the positive and negative examples. 
    Please do not change this particular shuffling method.
    '''
    pos_len= len(pos_data)
    neg_len= len(neg_data)
    pos_len_train= int(0.8*pos_len)
    neg_len_train= int(0.8*neg_len)
    train_data= [(x[i],y[i]) for i in range(0, pos_len_train)]
    train_data.extend([(x[i],y[i]) for i in range(pos_len, pos_len+ neg_len_train )])
    test_data=[(x[i],y[i]) for i in range(pos_len_train, pos_len)]
    test_data.extend([(x[i],y[i]) for i in range(pos_len+ neg_len_train, len(x) )])
    
    random.shuffle(train_data)
    x_train=[i[0] for i in train_data]
    y_train=[i[1] for i in train_data]
    random.shuffle(test_data)
    x_test=[i[0] for i in test_data]
    y_test=[i[1] for i in test_data]
    
    x_train=np.array(x_train)
    y_train=np.array(y_train)
    x_test= np.array(x_test)
    y_test= np.array(y_test)
    return x_train, y_train, x_test, y_test

x_train, y_train, x_test, y_test= create_shuffle(x,y)


## No Dropout

In [13]:
batch_size = 20

input_x = mx.sym.Variable('data')
input_y = mx.sym.Variable('softmax_label')

num_embed = 200

embed_layer = mx.sym.Embedding(data=input_x, input_dim=vocab_size, output_dim=num_embed, name='vocab_embed')

conv_input = mx.sym.Reshape(data=embed_layer, shape=(batch_size, 1, sentence_size, num_embed))

filter_list=[2,3,4,5]

num_filter=100
pooled_outputs = []

for filter_size in filter_list:
    convi = mx.sym.Convolution(data=conv_input, kernel=(filter_size, num_embed), num_filter=num_filter)
    relui = mx.sym.Activation(data=convi, act_type='relu')
    pooli = mx.sym.Pooling(data=relui, pool_type='max', kernel=(sentence_size - filter_size + 1, 1), stride=(1, 1))
    pooled_outputs.append(pooli)

total_filters = num_filter * len(filter_list)
concat = mx.sym.Concat(*pooled_outputs, dim=1)

h_pool = mx.sym.Reshape(data=concat, shape=(batch_size, total_filters))

# dropout layer
dropout = 0

if dropout > 0.0:
    h_drop = mx.sym.Dropout(data=h_pool, p=dropout)
else:
    h_drop = h_pool

num_label = 2

cls_weight = mx.sym.Variable('cls_weight')
cls_bias = mx.sym.Variable('cls_bias')

fc = mx.sym.FullyConnected(data=h_drop, weight=cls_weight, bias=cls_bias, num_hidden=num_label)

sm = mx.sym.SoftmaxOutput(data=fc, label=input_y, name='softmax')

cnn = sm

################################################################################################################

# Define the structure of our CNN Model (as a named tuple)
CNNModel = namedtuple("CNNModel", ['cnn_exec', 'symbol', 'data', 'label', 'param_blocks'])

# Define what device to train/test on, use GPU if available
ctx = mx.gpu() if mx.test_utils.list_gpus() else mx.cpu()

arg_names = cnn.list_arguments()

input_shapes = {}
input_shapes['data'] = (batch_size, sentence_size)

arg_shape, out_shape, aux_shape = cnn.infer_shape(**input_shapes)
arg_arrays = [mx.nd.zeros(s, ctx) for s in arg_shape]
args_grad = {}
for shape, name in zip(arg_shape, arg_names):
    if name in ['softmax_label', 'data']: # input, output
        continue
    args_grad[name] = mx.nd.zeros(shape, ctx)

cnn_exec = cnn.bind(ctx=ctx, args=arg_arrays, args_grad=args_grad, grad_req='add')

param_blocks = []
arg_dict = dict(zip(arg_names, cnn_exec.arg_arrays))
initializer = mx.initializer.Uniform(0.1)
for i, name in enumerate(arg_names):
    if name in ['softmax_label', 'data']: # input, output
        continue
    initializer(mx.init.InitDesc(name), arg_dict[name])

    param_blocks.append( (i, arg_dict[name], args_grad[name], name) )

data = cnn_exec.arg_dict['data']
label = cnn_exec.arg_dict['softmax_label']

cnn_model= CNNModel(cnn_exec=cnn_exec, symbol=cnn, data=data, label=label, param_blocks=param_blocks)

########################################################################################################################

'''
Train the cnn_model using back prop
'''

optimizer = 'rmsprop'
max_grad_norm = 5.0
learning_rate = 0.005
epoch = 10

# create optimizer
opt = mx.optimizer.create(optimizer)
opt.lr = learning_rate

updater = mx.optimizer.get_updater(opt)

# For each training epoch
for iteration in range(epoch):
    tic = time.time()
    num_correct = 0
    num_total = 0

    # Over each batch of training data
    for begin in range(0, x_train.shape[0], batch_size):
        batchX = x_train[begin:begin+batch_size]
        batchY = y_train[begin:begin+batch_size]
        if batchX.shape[0] != batch_size:
            continue

        cnn_model.data[:] = batchX
        cnn_model.label[:] = batchY

        # forward
        cnn_model.cnn_exec.forward(is_train=True)

        # backward
        cnn_model.cnn_exec.backward()

        # eval on training data
        num_correct += sum(batchY == np.argmax(cnn_model.cnn_exec.outputs[0].asnumpy(), axis=1))
        num_total += len(batchY)

        # update weights
        norm = 0
        for idx, weight, grad, name in cnn_model.param_blocks:
            grad /= batch_size
            l2_norm = mx.nd.norm(grad).asscalar()
            norm += l2_norm * l2_norm

        norm = math.sqrt(norm)
        for idx, weight, grad, name in cnn_model.param_blocks:
            if norm > max_grad_norm:
                grad *= (max_grad_norm / norm)

            updater(idx, grad, weight)

            # reset gradient to zero
            grad[:] = 0.0

    # Decay learning rate for this epoch to ensure we are not "overshooting" optima
    if iteration % 50 == 0 and iteration > 0:
        opt.lr *= 0.5
        print('reset learning rate to %g' % opt.lr)

    # End of training loop for this epoch
    toc = time.time()
    train_time = toc - tic
    train_acc = num_correct * 100 / float(num_total)

    # Saving checkpoint to disk
    if (iteration + 1) % 10 == 0:
        prefix = 'cnn'
        cnn_model.symbol.save('./%s-symbol.json' % prefix)
        save_dict = {('arg:%s' % k) : v  for k, v in cnn_model.cnn_exec.arg_dict.items()}
        save_dict.update({('aux:%s' % k) : v for k, v in cnn_model.cnn_exec.aux_dict.items()})
        param_name = './%s-%04d.params' % (prefix, iteration)
        mx.nd.save(param_name, save_dict)
        print('Saved checkpoint to %s' % param_name)


    # Evaluate model after this epoch on test set
    num_correct = 0
    num_total = 0

    # For each test batch
    for begin in range(0, x_test.shape[0], batch_size):
        batchX = x_test[begin:begin+batch_size]
        batchY = y_test[begin:begin+batch_size]

        if batchX.shape[0] != batch_size:
            continue

        cnn_model.data[:] = batchX
        cnn_model.cnn_exec.forward(is_train=False)

        num_correct += sum(batchY == np.argmax(cnn_model.cnn_exec.outputs[0].asnumpy(), axis=1))
        num_total += len(batchY)

    test_acc = num_correct * 100 / float(num_total)
    print('Iter [%d] Train: Time: %.3fs, Training Accuracy: %.3f \
            --- Test Accuracy thus far: %.3f' % (iteration, train_time, train_acc, test_acc))

Iter [0] Train: Time: 1.118s, Training Accuracy: 86.750             --- Test Accuracy thus far: 88.667
Iter [1] Train: Time: 1.123s, Training Accuracy: 97.083             --- Test Accuracy thus far: 84.333
Iter [2] Train: Time: 1.114s, Training Accuracy: 98.500             --- Test Accuracy thus far: 91.667
Iter [3] Train: Time: 1.117s, Training Accuracy: 99.500             --- Test Accuracy thus far: 91.667
Iter [4] Train: Time: 1.119s, Training Accuracy: 99.583             --- Test Accuracy thus far: 90.000
Iter [5] Train: Time: 1.117s, Training Accuracy: 99.667             --- Test Accuracy thus far: 90.333
Iter [6] Train: Time: 1.119s, Training Accuracy: 99.833             --- Test Accuracy thus far: 90.000
Iter [7] Train: Time: 1.119s, Training Accuracy: 99.833             --- Test Accuracy thus far: 90.000
Iter [8] Train: Time: 1.127s, Training Accuracy: 99.833             --- Test Accuracy thus far: 90.333
Saved checkpoint to ./cnn-0009.params
Iter [9] Train: Time: 1.114s, Train

## Adam Optimizer

In [14]:
batch_size = 20

input_x = mx.sym.Variable('data')
input_y = mx.sym.Variable('softmax_label')

num_embed = 200

embed_layer = mx.sym.Embedding(data=input_x, input_dim=vocab_size, output_dim=num_embed, name='vocab_embed')

conv_input = mx.sym.Reshape(data=embed_layer, shape=(batch_size, 1, sentence_size, num_embed))

filter_list=[2,3,4,5]

num_filter=100
pooled_outputs = []

for filter_size in filter_list:
    convi = mx.sym.Convolution(data=conv_input, kernel=(filter_size, num_embed), num_filter=num_filter)
    relui = mx.sym.Activation(data=convi, act_type='relu')
    pooli = mx.sym.Pooling(data=relui, pool_type='max', kernel=(sentence_size - filter_size + 1, 1), stride=(1, 1))
    pooled_outputs.append(pooli)

total_filters = num_filter * len(filter_list)
concat = mx.sym.Concat(*pooled_outputs, dim=1)

h_pool = mx.sym.Reshape(data=concat, shape=(batch_size, total_filters))

# dropout layer
dropout = 0

if dropout > 0.0:
    h_drop = mx.sym.Dropout(data=h_pool, p=dropout)
else:
    h_drop = h_pool

num_label = 2

cls_weight = mx.sym.Variable('cls_weight')
cls_bias = mx.sym.Variable('cls_bias')

fc = mx.sym.FullyConnected(data=h_drop, weight=cls_weight, bias=cls_bias, num_hidden=num_label)

sm = mx.sym.SoftmaxOutput(data=fc, label=input_y, name='softmax')

cnn = sm

################################################################################################################

# Define the structure of our CNN Model (as a named tuple)
CNNModel = namedtuple("CNNModel", ['cnn_exec', 'symbol', 'data', 'label', 'param_blocks'])

# Define what device to train/test on, use GPU if available
ctx = mx.gpu() if mx.test_utils.list_gpus() else mx.cpu()

arg_names = cnn.list_arguments()

input_shapes = {}
input_shapes['data'] = (batch_size, sentence_size)

arg_shape, out_shape, aux_shape = cnn.infer_shape(**input_shapes)
arg_arrays = [mx.nd.zeros(s, ctx) for s in arg_shape]
args_grad = {}
for shape, name in zip(arg_shape, arg_names):
    if name in ['softmax_label', 'data']: # input, output
        continue
    args_grad[name] = mx.nd.zeros(shape, ctx)

cnn_exec = cnn.bind(ctx=ctx, args=arg_arrays, args_grad=args_grad, grad_req='add')

param_blocks = []
arg_dict = dict(zip(arg_names, cnn_exec.arg_arrays))
initializer = mx.initializer.Uniform(0.1)
for i, name in enumerate(arg_names):
    if name in ['softmax_label', 'data']: # input, output
        continue
    initializer(mx.init.InitDesc(name), arg_dict[name])

    param_blocks.append( (i, arg_dict[name], args_grad[name], name) )

data = cnn_exec.arg_dict['data']
label = cnn_exec.arg_dict['softmax_label']

cnn_model= CNNModel(cnn_exec=cnn_exec, symbol=cnn, data=data, label=label, param_blocks=param_blocks)

########################################################################################################################

'''
Train the cnn_model using back prop
'''

optimizer = 'adam'
max_grad_norm = 5.0
learning_rate = 0.005
epoch = 10

# create optimizer
opt = mx.optimizer.create(optimizer)
opt.lr = learning_rate

updater = mx.optimizer.get_updater(opt)

# For each training epoch
for iteration in range(epoch):
    tic = time.time()
    num_correct = 0
    num_total = 0

    # Over each batch of training data
    for begin in range(0, x_train.shape[0], batch_size):
        batchX = x_train[begin:begin+batch_size]
        batchY = y_train[begin:begin+batch_size]
        if batchX.shape[0] != batch_size:
            continue

        cnn_model.data[:] = batchX
        cnn_model.label[:] = batchY

        # forward
        cnn_model.cnn_exec.forward(is_train=True)

        # backward
        cnn_model.cnn_exec.backward()

        # eval on training data
        num_correct += sum(batchY == np.argmax(cnn_model.cnn_exec.outputs[0].asnumpy(), axis=1))
        num_total += len(batchY)

        # update weights
        norm = 0
        for idx, weight, grad, name in cnn_model.param_blocks:
            grad /= batch_size
            l2_norm = mx.nd.norm(grad).asscalar()
            norm += l2_norm * l2_norm

        norm = math.sqrt(norm)
        for idx, weight, grad, name in cnn_model.param_blocks:
            if norm > max_grad_norm:
                grad *= (max_grad_norm / norm)

            updater(idx, grad, weight)

            # reset gradient to zero
            grad[:] = 0.0

    # Decay learning rate for this epoch to ensure we are not "overshooting" optima
    if iteration % 50 == 0 and iteration > 0:
        opt.lr *= 0.5
        print('reset learning rate to %g' % opt.lr)

    # End of training loop for this epoch
    toc = time.time()
    train_time = toc - tic
    train_acc = num_correct * 100 / float(num_total)

    # Saving checkpoint to disk
    if (iteration + 1) % 10 == 0:
        prefix = 'cnn'
        cnn_model.symbol.save('./%s-symbol.json' % prefix)
        save_dict = {('arg:%s' % k) : v  for k, v in cnn_model.cnn_exec.arg_dict.items()}
        save_dict.update({('aux:%s' % k) : v for k, v in cnn_model.cnn_exec.aux_dict.items()})
        param_name = './%s-%04d.params' % (prefix, iteration)
        mx.nd.save(param_name, save_dict)
        print('Saved checkpoint to %s' % param_name)


    # Evaluate model after this epoch on test set
    num_correct = 0
    num_total = 0

    # For each test batch
    for begin in range(0, x_test.shape[0], batch_size):
        batchX = x_test[begin:begin+batch_size]
        batchY = y_test[begin:begin+batch_size]

        if batchX.shape[0] != batch_size:
            continue

        cnn_model.data[:] = batchX
        cnn_model.cnn_exec.forward(is_train=False)

        num_correct += sum(batchY == np.argmax(cnn_model.cnn_exec.outputs[0].asnumpy(), axis=1))
        num_total += len(batchY)

    test_acc = num_correct * 100 / float(num_total)
    print('Iter [%d] Train: Time: %.3fs, Training Accuracy: %.3f \
            --- Test Accuracy thus far: %.3f' % (iteration, train_time, train_acc, test_acc))

Iter [0] Train: Time: 1.143s, Training Accuracy: 87.500             --- Test Accuracy thus far: 86.333
Iter [1] Train: Time: 1.136s, Training Accuracy: 97.083             --- Test Accuracy thus far: 90.667
Iter [2] Train: Time: 1.130s, Training Accuracy: 99.167             --- Test Accuracy thus far: 91.667
Iter [3] Train: Time: 1.133s, Training Accuracy: 99.500             --- Test Accuracy thus far: 90.333
Iter [4] Train: Time: 1.128s, Training Accuracy: 99.750             --- Test Accuracy thus far: 91.000
Iter [5] Train: Time: 1.131s, Training Accuracy: 99.833             --- Test Accuracy thus far: 91.667
Iter [6] Train: Time: 1.127s, Training Accuracy: 99.833             --- Test Accuracy thus far: 92.000
Iter [7] Train: Time: 1.127s, Training Accuracy: 99.833             --- Test Accuracy thus far: 91.667
Iter [8] Train: Time: 1.142s, Training Accuracy: 99.833             --- Test Accuracy thus far: 91.667
Saved checkpoint to ./cnn-0009.params
Iter [9] Train: Time: 1.129s, Train

## batch size 32

In [15]:
batch_size = 32

input_x = mx.sym.Variable('data')
input_y = mx.sym.Variable('softmax_label')

num_embed = 200

embed_layer = mx.sym.Embedding(data=input_x, input_dim=vocab_size, output_dim=num_embed, name='vocab_embed')

conv_input = mx.sym.Reshape(data=embed_layer, shape=(batch_size, 1, sentence_size, num_embed))

filter_list=[2,3,4,5]

num_filter=100
pooled_outputs = []

for filter_size in filter_list:
    convi = mx.sym.Convolution(data=conv_input, kernel=(filter_size, num_embed), num_filter=num_filter)
    relui = mx.sym.Activation(data=convi, act_type='relu')
    pooli = mx.sym.Pooling(data=relui, pool_type='max', kernel=(sentence_size - filter_size + 1, 1), stride=(1, 1))
    pooled_outputs.append(pooli)

total_filters = num_filter * len(filter_list)
concat = mx.sym.Concat(*pooled_outputs, dim=1)

h_pool = mx.sym.Reshape(data=concat, shape=(batch_size, total_filters))

# dropout layer
dropout = 0

if dropout > 0.0:
    h_drop = mx.sym.Dropout(data=h_pool, p=dropout)
else:
    h_drop = h_pool

num_label = 2

cls_weight = mx.sym.Variable('cls_weight')
cls_bias = mx.sym.Variable('cls_bias')

fc = mx.sym.FullyConnected(data=h_drop, weight=cls_weight, bias=cls_bias, num_hidden=num_label)

sm = mx.sym.SoftmaxOutput(data=fc, label=input_y, name='softmax')

cnn = sm

################################################################################################################

# Define the structure of our CNN Model (as a named tuple)
CNNModel = namedtuple("CNNModel", ['cnn_exec', 'symbol', 'data', 'label', 'param_blocks'])

# Define what device to train/test on, use GPU if available
ctx = mx.gpu() if mx.test_utils.list_gpus() else mx.cpu()

arg_names = cnn.list_arguments()

input_shapes = {}
input_shapes['data'] = (batch_size, sentence_size)

arg_shape, out_shape, aux_shape = cnn.infer_shape(**input_shapes)
arg_arrays = [mx.nd.zeros(s, ctx) for s in arg_shape]
args_grad = {}
for shape, name in zip(arg_shape, arg_names):
    if name in ['softmax_label', 'data']: # input, output
        continue
    args_grad[name] = mx.nd.zeros(shape, ctx)

cnn_exec = cnn.bind(ctx=ctx, args=arg_arrays, args_grad=args_grad, grad_req='add')

param_blocks = []
arg_dict = dict(zip(arg_names, cnn_exec.arg_arrays))
initializer = mx.initializer.Uniform(0.1)
for i, name in enumerate(arg_names):
    if name in ['softmax_label', 'data']: # input, output
        continue
    initializer(mx.init.InitDesc(name), arg_dict[name])

    param_blocks.append( (i, arg_dict[name], args_grad[name], name) )

data = cnn_exec.arg_dict['data']
label = cnn_exec.arg_dict['softmax_label']

cnn_model= CNNModel(cnn_exec=cnn_exec, symbol=cnn, data=data, label=label, param_blocks=param_blocks)

########################################################################################################################

'''
Train the cnn_model using back prop
'''

optimizer = 'rmsprop'
max_grad_norm = 5.0
learning_rate = 0.005
epoch = 15

# create optimizer
opt = mx.optimizer.create(optimizer)
opt.lr = learning_rate

updater = mx.optimizer.get_updater(opt)

# For each training epoch
for iteration in range(epoch):
    tic = time.time()
    num_correct = 0
    num_total = 0

    # Over each batch of training data
    for begin in range(0, x_train.shape[0], batch_size):
        batchX = x_train[begin:begin+batch_size]
        batchY = y_train[begin:begin+batch_size]
        if batchX.shape[0] != batch_size:
            continue

        cnn_model.data[:] = batchX
        cnn_model.label[:] = batchY

        # forward
        cnn_model.cnn_exec.forward(is_train=True)

        # backward
        cnn_model.cnn_exec.backward()

        # eval on training data
        num_correct += sum(batchY == np.argmax(cnn_model.cnn_exec.outputs[0].asnumpy(), axis=1))
        num_total += len(batchY)

        # update weights
        norm = 0
        for idx, weight, grad, name in cnn_model.param_blocks:
            grad /= batch_size
            l2_norm = mx.nd.norm(grad).asscalar()
            norm += l2_norm * l2_norm

        norm = math.sqrt(norm)
        for idx, weight, grad, name in cnn_model.param_blocks:
            if norm > max_grad_norm:
                grad *= (max_grad_norm / norm)

            updater(idx, grad, weight)

            # reset gradient to zero
            grad[:] = 0.0

    # Decay learning rate for this epoch to ensure we are not "overshooting" optima
    if iteration % 50 == 0 and iteration > 0:
        opt.lr *= 0.5
        print('reset learning rate to %g' % opt.lr)

    # End of training loop for this epoch
    toc = time.time()
    train_time = toc - tic
    train_acc = num_correct * 100 / float(num_total)

    # Saving checkpoint to disk
    if (iteration + 1) % 10 == 0:
        prefix = 'cnn'
        cnn_model.symbol.save('./%s-symbol.json' % prefix)
        save_dict = {('arg:%s' % k) : v  for k, v in cnn_model.cnn_exec.arg_dict.items()}
        save_dict.update({('aux:%s' % k) : v for k, v in cnn_model.cnn_exec.aux_dict.items()})
        param_name = './%s-%04d.params' % (prefix, iteration)
        mx.nd.save(param_name, save_dict)
        print('Saved checkpoint to %s' % param_name)


    # Evaluate model after this epoch on test set
    num_correct = 0
    num_total = 0

    # For each test batch
    for begin in range(0, x_test.shape[0], batch_size):
        batchX = x_test[begin:begin+batch_size]
        batchY = y_test[begin:begin+batch_size]

        if batchX.shape[0] != batch_size:
            continue

        cnn_model.data[:] = batchX
        cnn_model.cnn_exec.forward(is_train=False)

        num_correct += sum(batchY == np.argmax(cnn_model.cnn_exec.outputs[0].asnumpy(), axis=1))
        num_total += len(batchY)

    test_acc = num_correct * 100 / float(num_total)
    print('Iter [%d] Train: Time: %.3fs, Training Accuracy: %.3f \
            --- Test Accuracy thus far: %.3f' % (iteration, train_time, train_acc, test_acc))

Iter [0] Train: Time: 0.889s, Training Accuracy: 86.655             --- Test Accuracy thus far: 87.500
Iter [1] Train: Time: 0.891s, Training Accuracy: 96.791             --- Test Accuracy thus far: 89.931
Iter [2] Train: Time: 0.882s, Training Accuracy: 98.902             --- Test Accuracy thus far: 92.014
Iter [3] Train: Time: 0.886s, Training Accuracy: 99.493             --- Test Accuracy thus far: 91.319
Iter [4] Train: Time: 0.888s, Training Accuracy: 99.831             --- Test Accuracy thus far: 91.667
Iter [5] Train: Time: 0.887s, Training Accuracy: 99.831             --- Test Accuracy thus far: 91.667
Iter [6] Train: Time: 0.898s, Training Accuracy: 99.831             --- Test Accuracy thus far: 91.319
Iter [7] Train: Time: 0.902s, Training Accuracy: 99.831             --- Test Accuracy thus far: 90.625
Iter [8] Train: Time: 0.902s, Training Accuracy: 99.831             --- Test Accuracy thus far: 90.625
Saved checkpoint to ./cnn-0009.params
Iter [9] Train: Time: 0.901s, Train

## droput 0.2

In [16]:
batch_size = 20

input_x = mx.sym.Variable('data')
input_y = mx.sym.Variable('softmax_label')

num_embed = 200

embed_layer = mx.sym.Embedding(data=input_x, input_dim=vocab_size, output_dim=num_embed, name='vocab_embed')

conv_input = mx.sym.Reshape(data=embed_layer, shape=(batch_size, 1, sentence_size, num_embed))

filter_list=[2,3,4,5]

num_filter=100
pooled_outputs = []

for filter_size in filter_list:
    convi = mx.sym.Convolution(data=conv_input, kernel=(filter_size, num_embed), num_filter=num_filter)
    relui = mx.sym.Activation(data=convi, act_type='relu')
    pooli = mx.sym.Pooling(data=relui, pool_type='max', kernel=(sentence_size - filter_size + 1, 1), stride=(1, 1))
    pooled_outputs.append(pooli)

total_filters = num_filter * len(filter_list)
concat = mx.sym.Concat(*pooled_outputs, dim=1)

h_pool = mx.sym.Reshape(data=concat, shape=(batch_size, total_filters))

# dropout layer
dropout = 0.2

if dropout > 0.0:
    h_drop = mx.sym.Dropout(data=h_pool, p=dropout)
else:
    h_drop = h_pool

num_label = 2

cls_weight = mx.sym.Variable('cls_weight')
cls_bias = mx.sym.Variable('cls_bias')

fc = mx.sym.FullyConnected(data=h_drop, weight=cls_weight, bias=cls_bias, num_hidden=num_label)

sm = mx.sym.SoftmaxOutput(data=fc, label=input_y, name='softmax')

cnn = sm

################################################################################################################

# Define the structure of our CNN Model (as a named tuple)
CNNModel = namedtuple("CNNModel", ['cnn_exec', 'symbol', 'data', 'label', 'param_blocks'])

# Define what device to train/test on, use GPU if available
ctx = mx.gpu() if mx.test_utils.list_gpus() else mx.cpu()

arg_names = cnn.list_arguments()

input_shapes = {}
input_shapes['data'] = (batch_size, sentence_size)

arg_shape, out_shape, aux_shape = cnn.infer_shape(**input_shapes)
arg_arrays = [mx.nd.zeros(s, ctx) for s in arg_shape]
args_grad = {}
for shape, name in zip(arg_shape, arg_names):
    if name in ['softmax_label', 'data']: # input, output
        continue
    args_grad[name] = mx.nd.zeros(shape, ctx)

cnn_exec = cnn.bind(ctx=ctx, args=arg_arrays, args_grad=args_grad, grad_req='add')

param_blocks = []
arg_dict = dict(zip(arg_names, cnn_exec.arg_arrays))
initializer = mx.initializer.Uniform(0.1)
for i, name in enumerate(arg_names):
    if name in ['softmax_label', 'data']: # input, output
        continue
    initializer(mx.init.InitDesc(name), arg_dict[name])

    param_blocks.append( (i, arg_dict[name], args_grad[name], name) )

data = cnn_exec.arg_dict['data']
label = cnn_exec.arg_dict['softmax_label']

cnn_model= CNNModel(cnn_exec=cnn_exec, symbol=cnn, data=data, label=label, param_blocks=param_blocks)

########################################################################################################################

'''
Train the cnn_model using back prop
'''

optimizer = 'rmsprop'
max_grad_norm = 50.0
learning_rate = 0.005
epoch = 10

# create optimizer
opt = mx.optimizer.create(optimizer)
opt.lr = learning_rate

updater = mx.optimizer.get_updater(opt)

# For each training epoch
for iteration in range(epoch):
    tic = time.time()
    num_correct = 0
    num_total = 0

    # Over each batch of training data
    for begin in range(0, x_train.shape[0], batch_size):
        batchX = x_train[begin:begin+batch_size]
        batchY = y_train[begin:begin+batch_size]
        if batchX.shape[0] != batch_size:
            continue

        cnn_model.data[:] = batchX
        cnn_model.label[:] = batchY

        # forward
        cnn_model.cnn_exec.forward(is_train=True)

        # backward
        cnn_model.cnn_exec.backward()

        # eval on training data
        num_correct += sum(batchY == np.argmax(cnn_model.cnn_exec.outputs[0].asnumpy(), axis=1))
        num_total += len(batchY)

        # update weights
        norm = 0
        for idx, weight, grad, name in cnn_model.param_blocks:
            grad /= batch_size
            l2_norm = mx.nd.norm(grad).asscalar()
            norm += l2_norm * l2_norm

        norm = math.sqrt(norm)
        for idx, weight, grad, name in cnn_model.param_blocks:
            if norm > max_grad_norm:
                grad *= (max_grad_norm / norm)

            updater(idx, grad, weight)

            # reset gradient to zero
            grad[:] = 0.0

    # Decay learning rate for this epoch to ensure we are not "overshooting" optima
    if iteration % 5 == 0 and iteration > 0:
        opt.lr *= 0.5
        print('reset learning rate to %g' % opt.lr)

    # End of training loop for this epoch
    toc = time.time()
    train_time = toc - tic
    train_acc = num_correct * 100 / float(num_total)

    # Saving checkpoint to disk
    if (iteration + 1) % 10 == 0:
        prefix = 'cnn'
        cnn_model.symbol.save('./%s-symbol.json' % prefix)
        save_dict = {('arg:%s' % k) : v  for k, v in cnn_model.cnn_exec.arg_dict.items()}
        save_dict.update({('aux:%s' % k) : v for k, v in cnn_model.cnn_exec.aux_dict.items()})
        param_name = './%s-%04d.params' % (prefix, iteration)
        mx.nd.save(param_name, save_dict)
        print('Saved checkpoint to %s' % param_name)


    # Evaluate model after this epoch on test set
    num_correct = 0
    num_total = 0

    # For each test batch
    for begin in range(0, x_test.shape[0], batch_size):
        batchX = x_test[begin:begin+batch_size]
        batchY = y_test[begin:begin+batch_size]

        if batchX.shape[0] != batch_size:
            continue

        cnn_model.data[:] = batchX
        cnn_model.cnn_exec.forward(is_train=False)

        num_correct += sum(batchY == np.argmax(cnn_model.cnn_exec.outputs[0].asnumpy(), axis=1))
        num_total += len(batchY)

    test_acc = num_correct * 100 / float(num_total)
    print('Iter [%d] Train: Time: %.3fs, Training Accuracy: %.3f \
            --- Test Accuracy thus far: %.3f' % (iteration, train_time, train_acc, test_acc))

Iter [0] Train: Time: 1.134s, Training Accuracy: 86.417             --- Test Accuracy thus far: 88.000
Iter [1] Train: Time: 1.132s, Training Accuracy: 96.917             --- Test Accuracy thus far: 91.667
Iter [2] Train: Time: 1.136s, Training Accuracy: 99.000             --- Test Accuracy thus far: 85.667
Iter [3] Train: Time: 1.123s, Training Accuracy: 99.417             --- Test Accuracy thus far: 91.333
Iter [4] Train: Time: 1.130s, Training Accuracy: 99.833             --- Test Accuracy thus far: 91.000
reset learning rate to 0.0025
Iter [5] Train: Time: 1.127s, Training Accuracy: 99.833             --- Test Accuracy thus far: 90.333
Iter [6] Train: Time: 1.131s, Training Accuracy: 99.833             --- Test Accuracy thus far: 91.000
Iter [7] Train: Time: 1.125s, Training Accuracy: 99.833             --- Test Accuracy thus far: 90.667
Iter [8] Train: Time: 1.126s, Training Accuracy: 99.833             --- Test Accuracy thus far: 91.000
Saved checkpoint to ./cnn-0009.params
Iter 

## Adam with droput 0.2

In [17]:
batch_size = 20

input_x = mx.sym.Variable('data')
input_y = mx.sym.Variable('softmax_label')

num_embed = 200

embed_layer = mx.sym.Embedding(data=input_x, input_dim=vocab_size, output_dim=num_embed, name='vocab_embed')

conv_input = mx.sym.Reshape(data=embed_layer, shape=(batch_size, 1, sentence_size, num_embed))

filter_list=[2,3,4,5]

num_filter=100
pooled_outputs = []

for filter_size in filter_list:
    convi = mx.sym.Convolution(data=conv_input, kernel=(filter_size, num_embed), num_filter=num_filter)
    relui = mx.sym.Activation(data=convi, act_type='relu')
    pooli = mx.sym.Pooling(data=relui, pool_type='max', kernel=(sentence_size - filter_size + 1, 1), stride=(1, 1))
    pooled_outputs.append(pooli)

total_filters = num_filter * len(filter_list)
concat = mx.sym.Concat(*pooled_outputs, dim=1)

h_pool = mx.sym.Reshape(data=concat, shape=(batch_size, total_filters))

# dropout layer
dropout = 0.2

if dropout > 0.0:
    h_drop = mx.sym.Dropout(data=h_pool, p=dropout)
else:
    h_drop = h_pool

num_label = 2

cls_weight = mx.sym.Variable('cls_weight')
cls_bias = mx.sym.Variable('cls_bias')

fc = mx.sym.FullyConnected(data=h_drop, weight=cls_weight, bias=cls_bias, num_hidden=num_label)

sm = mx.sym.SoftmaxOutput(data=fc, label=input_y, name='softmax')

cnn = sm

################################################################################################################

# Define the structure of our CNN Model (as a named tuple)
CNNModel = namedtuple("CNNModel", ['cnn_exec', 'symbol', 'data', 'label', 'param_blocks'])

# Define what device to train/test on, use GPU if available
ctx = mx.gpu() if mx.test_utils.list_gpus() else mx.cpu()

arg_names = cnn.list_arguments()

input_shapes = {}
input_shapes['data'] = (batch_size, sentence_size)

arg_shape, out_shape, aux_shape = cnn.infer_shape(**input_shapes)
arg_arrays = [mx.nd.zeros(s, ctx) for s in arg_shape]
args_grad = {}
for shape, name in zip(arg_shape, arg_names):
    if name in ['softmax_label', 'data']: # input, output
        continue
    args_grad[name] = mx.nd.zeros(shape, ctx)

cnn_exec = cnn.bind(ctx=ctx, args=arg_arrays, args_grad=args_grad, grad_req='add')

param_blocks = []
arg_dict = dict(zip(arg_names, cnn_exec.arg_arrays))
initializer = mx.initializer.Uniform(0.1)
for i, name in enumerate(arg_names):
    if name in ['softmax_label', 'data']: # input, output
        continue
    initializer(mx.init.InitDesc(name), arg_dict[name])

    param_blocks.append( (i, arg_dict[name], args_grad[name], name) )

data = cnn_exec.arg_dict['data']
label = cnn_exec.arg_dict['softmax_label']

cnn_model= CNNModel(cnn_exec=cnn_exec, symbol=cnn, data=data, label=label, param_blocks=param_blocks)

########################################################################################################################

'''
Train the cnn_model using back prop
'''

optimizer = 'adam'
max_grad_norm = 50.0
learning_rate = 0.005
epoch = 10

# create optimizer
opt = mx.optimizer.create(optimizer)
opt.lr = learning_rate

updater = mx.optimizer.get_updater(opt)

# For each training epoch
for iteration in range(epoch):
    tic = time.time()
    num_correct = 0
    num_total = 0

    # Over each batch of training data
    for begin in range(0, x_train.shape[0], batch_size):
        batchX = x_train[begin:begin+batch_size]
        batchY = y_train[begin:begin+batch_size]
        if batchX.shape[0] != batch_size:
            continue

        cnn_model.data[:] = batchX
        cnn_model.label[:] = batchY

        # forward
        cnn_model.cnn_exec.forward(is_train=True)

        # backward
        cnn_model.cnn_exec.backward()

        # eval on training data
        num_correct += sum(batchY == np.argmax(cnn_model.cnn_exec.outputs[0].asnumpy(), axis=1))
        num_total += len(batchY)

        # update weights
        norm = 0
        for idx, weight, grad, name in cnn_model.param_blocks:
            grad /= batch_size
            l2_norm = mx.nd.norm(grad).asscalar()
            norm += l2_norm * l2_norm

        norm = math.sqrt(norm)
        for idx, weight, grad, name in cnn_model.param_blocks:
            if norm > max_grad_norm:
                grad *= (max_grad_norm / norm)

            updater(idx, grad, weight)

            # reset gradient to zero
            grad[:] = 0.0

    # Decay learning rate for this epoch to ensure we are not "overshooting" optima
    if iteration % 5 == 0 and iteration > 0:
        opt.lr *= 0.5
        print('reset learning rate to %g' % opt.lr)

    # End of training loop for this epoch
    toc = time.time()
    train_time = toc - tic
    train_acc = num_correct * 100 / float(num_total)

    # Saving checkpoint to disk
    if (iteration + 1) % 10 == 0:
        prefix = 'cnn'
        cnn_model.symbol.save('./%s-symbol.json' % prefix)
        save_dict = {('arg:%s' % k) : v  for k, v in cnn_model.cnn_exec.arg_dict.items()}
        save_dict.update({('aux:%s' % k) : v for k, v in cnn_model.cnn_exec.aux_dict.items()})
        param_name = './%s-%04d.params' % (prefix, iteration)
        mx.nd.save(param_name, save_dict)
        print('Saved checkpoint to %s' % param_name)


    # Evaluate model after this epoch on test set
    num_correct = 0
    num_total = 0

    # For each test batch
    for begin in range(0, x_test.shape[0], batch_size):
        batchX = x_test[begin:begin+batch_size]
        batchY = y_test[begin:begin+batch_size]

        if batchX.shape[0] != batch_size:
            continue

        cnn_model.data[:] = batchX
        cnn_model.cnn_exec.forward(is_train=False)

        num_correct += sum(batchY == np.argmax(cnn_model.cnn_exec.outputs[0].asnumpy(), axis=1))
        num_total += len(batchY)

    test_acc = num_correct * 100 / float(num_total)
    print('Iter [%d] Train: Time: %.3fs, Training Accuracy: %.3f \
            --- Test Accuracy thus far: %.3f' % (iteration, train_time, train_acc, test_acc))

Iter [0] Train: Time: 1.262s, Training Accuracy: 87.917             --- Test Accuracy thus far: 81.667
Iter [1] Train: Time: 1.142s, Training Accuracy: 95.833             --- Test Accuracy thus far: 89.667
Iter [2] Train: Time: 1.135s, Training Accuracy: 98.750             --- Test Accuracy thus far: 92.000
Iter [3] Train: Time: 1.141s, Training Accuracy: 99.583             --- Test Accuracy thus far: 90.333
Iter [4] Train: Time: 1.138s, Training Accuracy: 99.750             --- Test Accuracy thus far: 91.000
reset learning rate to 0.0025
Iter [5] Train: Time: 1.144s, Training Accuracy: 99.750             --- Test Accuracy thus far: 91.000
Iter [6] Train: Time: 1.146s, Training Accuracy: 99.833             --- Test Accuracy thus far: 90.333
Iter [7] Train: Time: 1.146s, Training Accuracy: 99.833             --- Test Accuracy thus far: 90.333
Iter [8] Train: Time: 1.149s, Training Accuracy: 99.833             --- Test Accuracy thus far: 90.333
Saved checkpoint to ./cnn-0009.params
Iter 