# LSTM-RNN For Sentiment Analysis
Authors: Trilby Hren and Costa Huang

**Our project has three main parts:**
* Reproduce and understand the official Keras demo code for LSTM-RNN sentiment analysis
 * [imdb_lstm.py](https://github.com/fchollet/keras/blob/master/examples/imdb_lstm.py) Main program that trains an LSTM on the IMDB sentiment classification task.
 * [imdb.py](https://github.com/fchollet/keras/blob/master/keras/datasets/imdb.py) Preprocessing script for the IMDB movie review dataset.
* Improve the program's accuracy by exploring different techniques:
 * Preprocessing techniques
 * Activation functions
 * Optimizer choices
* Further applications
 * Visualization of sentiment analysis
 * Applying our results to Amazon review data



## Baseline algorithm (Demo code from Keras)
First, we execute [imdb_lstm.py](https://github.com/fchollet/keras/blob/master/examples/imdb_lstm.py), which yields the following results:

'''Trains a LSTM on the IMDB sentiment classification task.
The dataset is actually too small for LSTM to be of any advantage
compared to simpler, much faster methods such as TF-IDF + LogReg.
Notes:
- RNNs are tricky. Choice of batch size is important,
choice of loss and optimizer is critical, etc.
Some configurations won't converge.
- LSTM loss decrease patterns during training can be quite different
from what you see with CNNs/MLPs/etc.
'''
from __future__ import print_function
import numpy as np
np.random.seed(1337)  # for reproducibility

from keras.preprocessing import sequence
from keras.utils import np_utils
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Embedding
from keras.layers import LSTM, SimpleRNN, GRU
from keras.datasets import imdb

max_features = 20000
maxlen = 80  # cut texts after this number of words (among top max_features most common words)
batch_size = 32

print('Loading data...')
(X_train, y_train), (X_test, y_test) = imdb.load_data(nb_words=max_features)
print(len(X_train), 'train sequences')
print(len(X_test), 'test sequences')

print('Pad sequences (samples x time)')
X_train = sequence.pad_sequences(X_train, maxlen=maxlen)
X_test = sequence.pad_sequences(X_test, maxlen=maxlen)
print('X_train shape:', X_train.shape)
print('X_test shape:', X_test.shape)

print('Build model...')
model = Sequential()
model.add(Embedding(max_features, 128, dropout=0.2))
model.add(LSTM(128, dropout_W=0.2, dropout_U=0.2))  # try using a GRU instead, for fun
model.add(Dense(1))
model.add(Activation('sigmoid'))

# try using different optimizers and different optimizer configs
model.compile(loss='binary_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

print('Train...')
model.fit(X_train, y_train, batch_size=batch_size, nb_epoch=15,
          validation_data=(X_test, y_test))
score, acc = model.evaluate(X_test, y_test,
                            batch_size=batch_size)
print('Test score:', score)
print('Test accuracy:', acc)

Loading data...
25000 train sequences
25000 test sequences
Pad sequences (samples x time)
X_train shape: (25000, 80)
X_test shape: (25000, 80)
Build model...
Train...
Train on 25000 samples, validate on 25000 samples
Epoch 1/15
Epoch 2/15
Epoch 3/15
Epoch 4/15
Epoch 5/15
Epoch 6/15
Epoch 7/15
Epoch 8/15
Epoch 9/15
Epoch 10/15
Epoch 11/15
Epoch 12/15
Epoch 13/15
Epoch 14/15
Epoch 15/15
Test score: 0.646322505167
Test accuracy: 0.81844
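
The two numbers at the end of the log are the values returned by `model.evaluate`: the loss (binary cross-entropy here) and the accuracy metric. The sketch below recomputes both with NumPy on a handful of made-up labels and predicted probabilities. The data, the helper names, and the 0.5 decision threshold are assumptions for illustration, not values taken from the run above.

```python
import numpy as np

def binary_crossentropy(y_true, p, eps=1e-7):
    """Mean binary cross-entropy between 0/1 labels and predicted probabilities."""
    p = np.clip(p, eps, 1 - eps)  # avoid log(0)
    return float(-np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p)))

def accuracy(y_true, p, threshold=0.5):
    """Fraction of predictions that match the label after thresholding (assumed 0.5)."""
    return float(np.mean((p > threshold).astype(int) == y_true))

# Hypothetical labels and sigmoid outputs, for illustration only.
y_true = np.array([1, 0, 1, 1])
p = np.array([0.9, 0.2, 0.4, 0.7])

loss = binary_crossentropy(y_true, p)
acc = accuracy(y_true, p)
print('Test score:', loss)
print('Test accuracy:', acc)
```

A low loss requires confident, correct probabilities, while accuracy only checks which side of the threshold each prediction falls on, so the two numbers can move somewhat independently.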


## Understanding the source code
Preprocessing
* The original dataset is provided by a [Stanford research group](http://ai.stanford.edu/~amaas//data/sentiment/)
* Keras has already tokenized the data: each review is encoded as a sequence of word indexes, and only the 20,000 most frequent words are kept (`max_features`)
* Each review is padded or truncated to 80 words (`maxlen`)
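
To make the padding step concrete, here is a minimal NumPy sketch of what `sequence.pad_sequences` does with its default settings, assuming padding and truncation both happen at the front of the sequence with a fill value of 0. The helper `pad_pre` and the sample word indexes are hypothetical and not part of Keras.

```python
import numpy as np

def pad_pre(sequences, maxlen, value=0):
    """Left-pad short sequences with `value`; keep only the last `maxlen` items of long ones."""
    out = np.full((len(sequences), maxlen), value, dtype="int32")
    for row, seq in enumerate(sequences):
        if len(seq) == 0:
            continue                  # an empty review stays all padding
        kept = seq[-maxlen:]          # truncate from the front
        out[row, -len(kept):] = kept  # right-align, so padding sits on the left
    return out

# Made-up word indexes; the demo uses maxlen=80.
reviews = [[5, 8, 13], [1, 2, 3, 4, 5, 6]]
padded = pad_pre(reviews, maxlen=4)
print(padded)
```

With these inputs the short review gains one leading zero and the long review keeps only its last four indexes, so every row ends up with the same length and can be stacked into a single `(samples, maxlen)` array.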

Model Building
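* Embedding layer: maps each word index to a 128-dimensional vector
* LSTM layer: 128 units, with a dropout of 0.2 on both the input and the recurrent connections
* Dense(1): fully connected layer that reduces the LSTM output to a one-dimensional output
* Sigmoid activation: squashes that output into the range (0, 1)
* Adam optimizer with binary cross-entropy loss for training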
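
The layer stack above can be sketched as a plain NumPy forward pass for a single review. This is a simplified illustration, not the Keras implementation: the weights are random rather than trained, dropout is omitted (as it would be at prediction time), the sizes are shrunk, and the gate ordering and all variable names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, embed_dim, units = 50, 8, 8  # toy sizes; the demo uses 20000, 128, 128

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Randomly initialised parameters stand in for trained weights.
E = rng.normal(scale=0.1, size=(vocab_size, embed_dim))  # embedding table
W = rng.normal(scale=0.1, size=(embed_dim, 4 * units))   # input -> gates
U = rng.normal(scale=0.1, size=(units, 4 * units))       # hidden state -> gates
b = np.zeros(4 * units)                                  # gate biases
w_out = rng.normal(scale=0.1, size=units)                # Dense(1) weights
b_out = 0.0                                              # Dense(1) bias

def predict(word_indexes):
    """Embedding -> LSTM over time -> Dense(1) -> sigmoid, for one padded review."""
    h = np.zeros(units)  # hidden state
    c = np.zeros(units)  # cell state
    for idx in word_indexes:
        x = E[idx]                   # embedding lookup
        z = x @ W + h @ U + b        # pre-activations of all four gates
        i, f, g, o = np.split(z, 4)  # input, forget, candidate, output
        i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
        g = np.tanh(g)
        c = f * c + i * g            # update the cell state
        h = o * np.tanh(c)           # expose the new hidden state
    return sigmoid(h @ w_out + b_out)  # probability of a positive review

p = predict([0, 0, 5, 8, 13])
print(p)
```

Only the hidden state after the last word reaches the dense layer, which is why the whole review collapses into a single probability.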



## Improving the algorithm
Preprocessing techniques


Activation functions, provided by [keras.layers.core.Activation](https://keras.io/activations/)
* Sigmoid
* ReLU
* Tanh
* Hard sigmoid
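
For reference, the four activations listed above can be written in a few lines of NumPy. The hard sigmoid shown here is the piecewise-linear form `clip(0.2 * x + 0.5, 0, 1)`, which is an assumption about the Keras version in use; other versions may define it with a different slope.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    return np.maximum(0.0, x)

def tanh(x):
    return np.tanh(x)

def hard_sigmoid(x):
    # Piecewise-linear approximation of the sigmoid (assumed slope 0.2, offset 0.5).
    return np.clip(0.2 * x + 0.5, 0.0, 1.0)

x = np.array([-3.0, 0.0, 3.0])
for fn in (sigmoid, relu, tanh, hard_sigmoid):
    print(fn.__name__, fn(x))
```

Sigmoid and hard sigmoid map to the range [0, 1], tanh maps to [-1, 1], and ReLU is unbounded above, which matters when the activation feeds a binary cross-entropy loss that expects a probability.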


Optimizer choices, provided by [keras.optimizers](https://keras.io/optimizers/)
* SGD (Stochastic Gradient Descent)
* Adam
* RMSprop
* Adagrad
* Nadam
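
To show how these optimizers differ, the sketch below applies textbook versions of the SGD, Adagrad, RMSprop, and Adam update rules to a toy one-dimensional loss `(w - 3)^2`. The learning rates, step counts, and function names are illustrative assumptions and do not match the Keras defaults. Nadam, which is Adam with Nesterov momentum, is left out to keep the example short.

```python
import numpy as np

def grad(w):
    return 2.0 * (w - 3.0)  # gradient of the toy loss (w - 3)^2

def run_sgd(w, lr=0.1, steps=100):
    for _ in range(steps):
        w -= lr * grad(w)  # fixed step along the negative gradient
    return w

def run_adagrad(w, lr=0.5, steps=100, eps=1e-8):
    acc = 0.0
    for _ in range(steps):
        g = grad(w)
        acc += g * g  # running sum of squared gradients
        w -= lr * g / (np.sqrt(acc) + eps)
    return w

def run_rmsprop(w, lr=0.05, rho=0.9, steps=100, eps=1e-8):
    avg = 0.0
    for _ in range(steps):
        g = grad(w)
        avg = rho * avg + (1 - rho) * g * g  # decaying average of squared gradients
        w -= lr * g / (np.sqrt(avg) + eps)
    return w

def run_adam(w, lr=0.1, b1=0.9, b2=0.999, steps=200, eps=1e-8):
    m = v = 0.0
    for t in range(1, steps + 1):
        g = grad(w)
        m = b1 * m + (1 - b1) * g      # first-moment (momentum) estimate
        v = b2 * v + (1 - b2) * g * g  # second-moment estimate
        m_hat = m / (1 - b1 ** t)      # bias correction
        v_hat = v / (1 - b2 ** t)
        w -= lr * m_hat / (np.sqrt(v_hat) + eps)
    return w

for name, fn in [('sgd', run_sgd), ('adagrad', run_adagrad),
                 ('rmsprop', run_rmsprop), ('adam', run_adam)]:
    print(name, fn(0.0))
```

All four start at `w = 0` and move toward the minimum at `w = 3`. The adaptive methods rescale each step by a history of squared gradients, which is one reason their behavior on the LSTM can differ noticeably from plain SGD.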

## Further Applications
Visualization of sentiment analysis, inspired by [Taylor Arnold's Jupyter notebook](http://euler.stat.yale.edu/~tba3/stat665/lectures/lec21/notebook21.html)