# CNN-Sentence

This notebook loads, processes, and tests CNN-sentence related data.

In [1]:
from classifier.layers import *
from data.data_util import *
from vis.util import *

# for auto-reloading external modules
# see http://stackoverflow.com/questions/1907993/autoreload-of-modules-in-ipython
%load_ext autoreload
%autoreload 2

## Loading Data

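We use only part of the IMDB sentiment analysis dataset. We need to load both the data and an idxmap; both are preprocessed and stored on disk. `maxlen` caps the review length, which in turn controls how many reviews are kept (with `maxlen=100`, the arrays below are 99 columns wide). Watch out: the order of the data changes every time we load unless shuffling is disabled; the call below passes `permutation=False`, which presumably does so.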

In [28]:
prefix = "/Users/Aimingnie/Documents/School/Stanford/CS 224N/DeepLearning/dataset/"
datapath = "imdb_lstm.pkl"
idxmap = "imdb_lstm.idxmap.pkl"

data = load_data(prefix + datapath,
                 valid_portion=0.1,
                 maxlen=100,
                 permutation=False)
word_emb, word_idx_map, idx_word_map = load_idx_map(prefix + idxmap)

for k, v in data.iteritems():
  print '%s: ' % k, v.shape

print "="*20

print "word_emb: ", word_emb.shape
print "number of words: ", len(word_idx_map)

X_val:  (244, 99)
X_train:  (2200, 99)
X_test:  (2543, 99)
y_val:  (244,)
y_train:  (2200,)
y_test:  (2543,)
word_emb:  (78271, 300)
number of words:  78271
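
The shapes above are consistent with a length filter followed by a rounded train/validation split: 2200 + 244 = 2444 reviews survive, and 2444 × 0.9 rounds to 2200. The sketch below illustrates that pattern on toy data. It is a guess at the general idea, not the actual `load_data`; the helper name `filter_and_split` and the toy sequences are hypothetical.

```python
def filter_and_split(sequences, labels, maxlen, valid_portion):
    """Hypothetical helper: drop long sequences, then hold out a validation tail."""
    # Assumption: sequences with len >= maxlen are discarded (hence 99 columns).
    kept = [(s, y) for s, y in zip(sequences, labels) if len(s) < maxlen]
    # Assumption: the training size is the rounded (1 - valid_portion) share.
    n_train = int(round(len(kept) * (1.0 - valid_portion)))
    return kept[:n_train], kept[n_train:]

# Toy data: twelve "reviews" of varying length with alternating labels.
seqs = [[1] * n for n in (5, 99, 100, 150, 20, 60, 80, 99, 3, 40, 120, 7)]
labels = [i % 2 for i in range(len(seqs))]

train, valid = filter_and_split(seqs, labels, maxlen=100, valid_portion=0.1)
```

Here three of the twelve toy sequences are too long, so nine remain and are split eight to one.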


In [33]:
# Visualize some examples from the dataset.
# We show one training review together with its label.
print "IMDB Movie Review: "
print
print "label: ", data['y_train'][0]
print
print decode_sentences(data['X_train'][0], idx_word_map)

IMDB Movie Review: 

label:  1

i liked the film some of the action scenes were very interesting , tense and well done i especially liked the opening scene which had a semi truck in it a very tense action scene that seemed well done br br some of the transitional scenes were filmed in interesting ways such as time lapse photography , unusual colors , or interesting angles also the film is funny is several parts i also liked how the evil guy was portrayed too i 'd give the film an 8 out of 10
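
Decoding amounts to mapping each index in a row back to its word. The sketch below shows the idea on a toy vocabulary. It is not the notebook's `decode_sentences`; the function name `decode_indices`, the toy map, and the choice of index 0 as padding are all assumptions made for illustration.

```python
def decode_indices(indices, idx_to_word, pad_idx=0):
    """Hypothetical stand-in for decode_sentences: skip padding, join words."""
    # Assumption: pad_idx marks padding positions and has no word attached.
    words = [idx_to_word[i] for i in indices if i != pad_idx]
    return " ".join(words)

# Toy index-to-word map and a left-padded row of word indices.
toy_idx_word_map = {1: "i", 2: "liked", 3: "the", 4: "film"}
padded_row = [0, 0, 1, 2, 3, 4]

sentence = decode_indices(padded_row, toy_idx_word_map)
print(sentence)  # i liked the film
```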


## Conv2d in Theano

This section explores Theano's `conv2d`: what it does and how its `border_mode` argument pads the input. With stride 1, `border_mode='half'` keeps the output the same size as the input for odd-sized filters.

In [1]:
from theano.tensor.nnet import conv2d
import theano
import theano.tensor as T
import numpy as np

X = np.asarray(np.random.randn(5,1,15,15),dtype='float32') #N, C, H, W
# filter weight
w = theano.shared(
            value=np.zeros((3,1,3,3), dtype='float32'),
            name='filter1',
            borrow=True
        ) # num_filter = 3, prev-depth = 1, filter_size=3x3
x = T.tensor4('x',dtype='float32')
filtering = conv2d(x, w, border_mode='full', subsample=(1, 1)) #stride: 1,1
f = theano.function([x], filtering)

print "after applying 'full' border_mode padding, we get output size:"
print "input: ", X.shape
print "output: ", f(X).shape

filtering2 = conv2d(x, w, border_mode='valid', subsample=(1, 1)) #stride: 1,1
f2 = theano.function([x], filtering2)

print

print "after applying 'valid' border_mode padding, we get output size:"
print "input: ", X.shape
print "output: ", f2(X).shape

filtering3 = conv2d(x, w, border_mode='half', subsample=(1, 1)) #stride: 1,1
f3 = theano.function([x], filtering3)

print 

print "after applying 'half' border_mode padding, we get output size:"
print "input: ", X.shape
print "output: ", f3(X).shape

# custom padding: one row/column of zeros on each side
filtering4 = conv2d(x, w, border_mode=(1, 1), subsample=(1, 1)) #stride: 1,1
f4 = theano.function([x], filtering4)

print 
print "after applying '(1,1)' custom border_mode padding, we get output size:"
print "input: ", X.shape
print "output: ", f4(X).shape


after applying 'full' border_mode padding, we get output size:
input:  (5, 1, 15, 15)
output:  (5, 3, 17, 17)

after applying 'valid' border_mode padding, we get output size:
input:  (5, 1, 15, 15)
output:  (5, 3, 13, 13)

after applying 'half' border_mode padding, we get output size:
input:  (5, 1, 15, 15)
output:  (5, 3, 15, 15)

after applying '(1,1)' custom border_mode padding, we get output size:
input:  (5, 1, 15, 15)
output:  (5, 3, 15, 15)
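
The spatial sizes printed above follow from the usual convolution arithmetic: along each axis, `out = (in + 2 * pad - filter) // stride + 1`. The sketch below applies that formula in plain Python, without Theano. The helper `conv_output_size` is hypothetical and written only for illustration; the padding amounts per mode (`filter - 1` for `'full'`, `filter // 2` for `'half'`, 0 for `'valid'`) follow Theano's documented `border_mode` behaviour.

```python
def conv_output_size(input_size, filter_size, border_mode="valid", stride=1):
    """Output length along one spatial axis (hypothetical helper)."""
    if border_mode == "valid":
        pad = 0                    # no padding
    elif border_mode == "full":
        pad = filter_size - 1      # every partial overlap is included
    elif border_mode == "half":
        pad = filter_size // 2     # same size when filter_size is odd
    else:
        pad = int(border_mode)     # explicit symmetric padding
    return (input_size + 2 * pad - filter_size) // stride + 1

# Same setting as the cell above: 15x15 input, 3x3 filter, stride 1.
sizes = {mode: conv_output_size(15, 3, mode) for mode in ("full", "valid", "half", 1)}

# An even-sized filter shows why 'half' only preserves size for odd filters.
even_half = conv_output_size(15, 4, "half")
```

With a 4-wide filter, `'half'` pads by 2 on each side and yields 16 rather than 15, which is why the same-size guarantee is limited to odd-sized filters.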


## Building Conv Net

Build a conv net to train on the sentiment analysis task.

In [6]:
from classifier.layers import *



{'W0': W0, 'b0': b0}