Convolutional Neural Networks for Sentence Classification
Code for the paper Convolutional Neural Networks for Sentence Classification (EMNLP 2014).
Runs the model on Pang and Lee's movie review dataset (MR in the paper). Please cite the original paper when using the data.
Code is written in Python (2.7) and requires Theano (0.7).
Using the pre-trained
word2vec vectors will also require downloading the binary file from
To process the raw data, run
python process_data.py path
where path points to the word2vec binary file (i.e.
This will create a pickle object called mr.p in the same folder, which contains the dataset in the right format.
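For readers curious about what the word2vec binary file contains, here is a minimal sketch of a reader. It is not the code in process_data.py. The function name `read_word2vec_binary` is hypothetical, and the sketch assumes the usual word2vec binary layout: a text header with the vocabulary size and vector dimension, then each word terminated by a space and followed by that many little-endian float32 values.

```python
import io
import struct


def read_word2vec_binary(stream, vocab=None):
    """Read word vectors from a binary stream (hypothetical helper).

    Assumed layout: a header line "<vocab_size> <dim>", then for each word
    its bytes terminated by a space, followed by <dim> little-endian float32
    values. If vocab is given, only words contained in it are kept.
    """
    vocab_size, dim = (int(x) for x in stream.readline().split())
    vector_bytes = 4 * dim  # each float32 takes 4 bytes
    vectors = {}
    for _ in range(vocab_size):
        chars = []
        while True:
            ch = stream.read(1)
            if ch == b"":
                raise EOFError("stream ended in the middle of a word")
            if ch == b" ":
                break
            if ch != b"\n":  # skip the newline left after the previous vector
                chars.append(ch)
        word = b"".join(chars).decode("utf-8", errors="ignore")
        raw = stream.read(vector_bytes)
        if len(raw) != vector_bytes:
            raise EOFError("stream ended in the middle of a vector")
        if vocab is None or word in vocab:
            vectors[word] = struct.unpack("<%df" % dim, raw)
    return vectors


# Tiny in-memory file with two 3-dimensional vectors, for illustration only.
demo = (
    b"2 3\n"
    + b"good " + struct.pack("<3f", 1.0, 2.0, 3.0) + b"\n"
    + b"bad " + struct.pack("<3f", 0.5, 0.25, 0.0) + b"\n"
)
vectors = read_word2vec_binary(io.BytesIO(demo))
```

Passing a `vocab` set keeps only the words that occur in the dataset, which avoids holding the full pre-trained vocabulary in memory.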
Note: This will create the dataset with different fold assignments than were used in the paper. You should still get a CV score of >81% with the CNN-nonstatic model, though.
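To illustrate why fold assignments can differ between runs, here is a minimal sketch of random fold assignment and of how a CV score is averaged over folds. It is an assumption that folds are drawn at random and that there are 10 of them. The helper names (`assign_folds`, `split_by_fold`, `cv_score`) are hypothetical and not taken from the repository.

```python
import random


def assign_folds(num_examples, num_folds=10, seed=None):
    """Give every example a random fold index in [0, num_folds)."""
    rng = random.Random(seed)
    return [rng.randrange(num_folds) for _ in range(num_examples)]


def split_by_fold(examples, folds, held_out):
    """Use one fold as the test set and the remaining folds for training."""
    train = [ex for ex, f in zip(examples, folds) if f != held_out]
    test = [ex for ex, f in zip(examples, folds) if f == held_out]
    return train, test


def cv_score(per_fold_accuracies):
    """The CV score is the mean accuracy over the held-out folds."""
    return sum(per_fold_accuracies) / len(per_fold_accuracies)


examples = list(range(100))              # stand-ins for sentences
folds = assign_folds(len(examples), seed=0)
train, test = split_by_fold(examples, folds, held_out=0)
```

A different seed, or no seed at all, produces a different partition, so per-fold numbers will not match another run exactly even when the averaged score is close.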
Running the models (CPU)
THEANO_FLAGS=mode=FAST_RUN,device=cpu,floatX=float32 python conv_net_sentence.py -nonstatic -rand
THEANO_FLAGS=mode=FAST_RUN,device=cpu,floatX=float32 python conv_net_sentence.py -static -word2vec
THEANO_FLAGS=mode=FAST_RUN,device=cpu,floatX=float32 python conv_net_sentence.py -nonstatic -word2vec
These commands run the CNN-rand, CNN-static, and CNN-nonstatic models from the paper, respectively.
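The relationship between the two flags and the three model names can be summarized in a small lookup. This is only an illustrative sketch. `describe_variant` is a hypothetical helper, and the reading of the flags is an assumption based on the paper's description of the variants: -word2vec versus -rand selects how the word embeddings are initialized, and -nonstatic versus -static selects whether they are updated during training.

```python
def describe_variant(flags):
    """Describe the embedding setup selected by a pair of command-line flags."""
    flags = set(flags)
    pretrained = "-word2vec" in flags   # otherwise "-rand": random initialization
    fine_tuned = "-nonstatic" in flags  # otherwise "-static": embeddings kept fixed
    names = {
        (False, True): "CNN-rand",
        (True, False): "CNN-static",
        (True, True): "CNN-nonstatic",
    }
    return {
        # None means the combination is not one of the three named above.
        "name": names.get((pretrained, fine_tuned)),
        "init": "word2vec" if pretrained else "random",
        "update_embeddings": fine_tuned,
    }


variant = describe_variant(["-nonstatic", "-word2vec"])
```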
Using the GPU
Using the GPU gives roughly a 10x to 20x speed-up, so it is highly recommended.
To use the GPU, simply change device=cpu to device=gpu (or whichever GPU you are using).
THEANO_FLAGS=mode=FAST_RUN,device=gpu,floatX=float32 python conv_net_sentence.py -nonstatic -word2vec
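If you prefer to launch runs from a Python script rather than the shell, the same THEANO_FLAGS value can be passed through the environment of a child process. The following is a sketch under assumptions: `theano_env` and `run_model` are hypothetical helpers, and the interpreter running this script is assumed to be the one that has Theano installed.

```python
import os
import subprocess
import sys


def theano_env(device="cpu", mode="FAST_RUN", floatX="float32", base_env=None):
    """Return a copy of the environment with THEANO_FLAGS set as in the commands above."""
    env = dict(os.environ if base_env is None else base_env)
    env["THEANO_FLAGS"] = "mode=%s,device=%s,floatX=%s" % (mode, device, floatX)
    return env


def run_model(flags, device="cpu"):
    """Run conv_net_sentence.py with the given flags; returns the exit code."""
    cmd = [sys.executable, "conv_net_sentence.py"] + list(flags)
    return subprocess.call(cmd, env=theano_env(device))


gpu_flags = theano_env(device="gpu", base_env={})["THEANO_FLAGS"]
```

For example, `run_model(["-nonstatic", "-word2vec"], device="gpu")` corresponds to the shell command above.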
CPU output:
epoch: 1, training time: 219.72 secs, train perf: 81.79 %, val perf: 79.26 %
epoch: 2, training time: 219.55 secs, train perf: 82.64 %, val perf: 76.84 %
epoch: 3, training time: 219.54 secs, train perf: 92.06 %, val perf: 80.95 %

GPU output:
epoch: 1, training time: 16.49 secs, train perf: 81.80 %, val perf: 78.32 %
epoch: 2, training time: 16.12 secs, train perf: 82.53 %, val perf: 76.74 %
epoch: 3, training time: 16.16 secs, train perf: 91.87 %, val perf: 81.37 %
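The speed-up implied by the two logs above can be checked with a few lines of Python. This is a small sketch: the regular expression assumes the exact log format shown here, and `parse_log` and `mean_epoch_time` are hypothetical helpers.

```python
import re

# Matches one epoch record in the log format shown above.
LOG_LINE = re.compile(
    r"epoch: (\d+), training time: ([\d.]+) secs, "
    r"train perf: ([\d.]+) %, val perf: ([\d.]+) %"
)


def parse_log(text):
    """Return one dict per epoch record found in the log text."""
    return [
        {"epoch": int(e), "secs": float(t), "train": float(tr), "val": float(v)}
        for e, t, tr, v in LOG_LINE.findall(text)
    ]


def mean_epoch_time(records):
    """Average training time per epoch, in seconds."""
    return sum(r["secs"] for r in records) / len(records)


slow_log = """
epoch: 1, training time: 219.72 secs, train perf: 81.79 %, val perf: 79.26 %
epoch: 2, training time: 219.55 secs, train perf: 82.64 %, val perf: 76.84 %
epoch: 3, training time: 219.54 secs, train perf: 92.06 %, val perf: 80.95 %
"""
fast_log = """
epoch: 1, training time: 16.49 secs, train perf: 81.80 %, val perf: 78.32 %
epoch: 2, training time: 16.12 secs, train perf: 82.53 %, val perf: 76.74 %
epoch: 3, training time: 16.16 secs, train perf: 91.87 %, val perf: 81.37 %
"""

speedup = mean_epoch_time(parse_log(slow_log)) / mean_epoch_time(parse_log(fast_log))
print("speed-up: %.1fx" % speedup)
```

For these three epochs the ratio comes out at about 13.5x, which falls within the 10x to 20x range quoted above.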
Denny Britz has an implementation of the model in TensorFlow:
The HarvardNLP group has an implementation in Torch.
At the time of my original experiments, I did not have access to a GPU, so I could not run a lot of different experiments. Hence the paper is missing a lot of things like ablation studies and variance in performance, and some of the conclusions were premature (e.g. regularization does not always seem to help).
Ye Zhang has written a very nice paper doing an extensive analysis of model variants (e.g. filter widths, k-max pooling, word2vec vs GloVe, etc.) and their effect on performance.