# 10.QRNN - Quasi-Recurrent Neural Networks
In this paper, the authors introduce an RNN-like architecture that is faster than conventional RNNs. The speedup comes from convolutional layers, which process all timesteps in parallel, something a standard recurrent layer cannot do. Let's see how this faster architecture performs!
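
To make the idea concrete, here is a minimal NumPy sketch of a single QRNN layer with the paper's *fo-pooling*. It is not the implementation in `models/QRNN.py`: the function names (`masked_conv1d`, `qrnn_layer`) and tensor layouts are assumptions chosen for illustration. The gates `z`, `f`, `o` come from causal convolutions that cover every timestep at once, and only the cheap element-wise pooling loop is sequential.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def masked_conv1d(x, w):
    """Causal 1-D convolution over time (illustrative helper).

    x: (T, d_in) input sequence, w: (k, d_in, d_out) filter bank.
    The output at step t only sees x[t-k+1 .. t], never future inputs.
    """
    k, T = w.shape[0], x.shape[0]
    # Left-pad with k-1 zero steps so no future input leaks in.
    x_pad = np.concatenate([np.zeros((k - 1, x.shape[1])), x], axis=0)
    # Each filter tap is one matmul applied to all timesteps at once.
    return sum(x_pad[i:i + T] @ w[i] for i in range(k))


def qrnn_layer(x, w_z, w_f, w_o):
    """One QRNN layer with fo-pooling; returns hidden states of shape (T, d_out)."""
    z = np.tanh(masked_conv1d(x, w_z))   # candidate vectors
    f = sigmoid(masked_conv1d(x, w_f))   # forget gates
    o = sigmoid(masked_conv1d(x, w_o))   # output gates

    c = np.zeros(z.shape[1])             # initial cell state
    h = np.empty_like(z)
    for t in range(z.shape[0]):
        # Element-wise recurrence: the only sequential part of the layer.
        c = f[t] * c + (1.0 - f[t]) * z[t]
        h[t] = o[t] * c
    return h


# Usage with random weights; filter width 2, as in `filter_windows = [2,2]`.
rng = np.random.default_rng(0)
x = rng.normal(size=(5, 3))
w_z, w_f, w_o = (rng.normal(size=(2, 3, 4)) for _ in range(3))
print(qrnn_layer(x, w_z, w_f, w_o).shape)
```

Because the loop body contains no matrix multiplication, the per-timestep cost is small compared with an LSTM step, which is where the speedup claimed in the paper comes from.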

**Comment**

I was not able to implement QRNN efficiently. The model works, but it becomes much slower as the number of timesteps in a layer grows, possibly because of GPU memory limits. As a result, it may not run properly on a large dataset such as IMDB. It does work on smaller datasets such as SST and PTB, so let's see how it behaves there.

### References
- [Quasi-Recurrent Neural Networks - Bradbury et al. 2016](https://arxiv.org/abs/1611.01576)

## Data Preprocessing
The preprocessing code and the processed data are borrowed from [harvardnlp/sent-conv-torch](https://github.com/harvardnlp/sent-conv-torch).

Preprocessing data inside the model class is getting harder and harder, so we do as much preprocessing as we can before calling the `fit_to_corpus()` method.

Select one of these datasets: `MR/SST1/SST2/Subj/TREC/CR/MPQA/IMDB`.

In [1]:
import data.sentiment_datasets.preprocess as preprocess
from models import QRNN

import random
import numpy as np

In [2]:
random.seed(1004)

In [3]:
w2v, train, train_label, test, test_label, dev, dev_label, word_to_idx = preprocess.build_dataset("SST2")

SST2.pkl exists! loading from pkl..


In [4]:
train, test, dev, train_label, test_label, dev_label = \
    preprocess.train_test_dev_split(train, test, dev, train_label, test_label, dev_label)

In [5]:
train_data = [train, train_label, dev, dev_label, w2v, word_to_idx]
test_data = [test, test_label]

## Training - SST2
With some more hyperparameter tuning, you may be able to get a better result than the one shown here.
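
Among the hyperparameters used in this notebook, `zoneout_keep_prob` is the least standard. The paper applies zoneout to the forget gate as `F = 1 - dropout(1 - F)`, using dropout without rescaling, so a zoned-out channel simply keeps its previous cell state. The sketch below illustrates that formula in NumPy. The function name and the reading of `zoneout_keep_prob` as "probability that a gate is left untouched" are assumptions; the actual code in `models/QRNN.py` may differ.

```python
import numpy as np


def zoneout_forget_gate(f, keep_prob, rng, training=True):
    """Apply zoneout to forget-gate activations f (values in (0, 1)).

    keep_prob is ASSUMED to be the probability that a gate keeps its
    computed value; with probability 1 - keep_prob it is forced to 1,
    which makes c_t = c_{t-1} for that channel.
    """
    if not training or keep_prob >= 1.0:
        return f  # zoneout disabled, e.g. keep_prob = 1.0 or at test time
    keep = rng.random(f.shape) < keep_prob
    # F = 1 - dropout(1 - F), with no 1/keep_prob rescaling.
    return 1.0 - keep * (1.0 - f)


rng = np.random.default_rng(0)
f = np.full((3, 4), 0.3)
print(zoneout_forget_gate(f, keep_prob=0.9, rng=rng))
```

Under this reading, a value of `1.0` turns zoneout off entirely, and `0.9` zones out roughly 10% of the gates at each step.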

In [6]:
model = QRNN.QRNN(batch_size = 24,
                  dropout_keep_prob = 0.7,
                  zoneout_keep_prob = 1.0,
                  learning_rate = 0.0005,
                  filter_windows = [2,2],
                  l2_reg_lambda = 4e-6,
                  hidden_size = 256)

DEBUG: 04231523


In [7]:
model.fit_to_corpus(train_data)

In [8]:
model.train(10, save_dir="save/10_qrnn/sst2", log_dir="log/10_qrnn/sst2", print_every=100)

--------------------------------------------------------------------------------
Created and Initialized fresh model. Size: 5711230
--------------------------------------------------------------------------------
000100: 1 [00100/00288], train_loss = 0.70546216, accuracy = 0.58333331, secs/batch = 0.0123
000200: 1 [00200/00288], train_loss = 0.71992451, accuracy = 0.54166669, secs/batch = 0.0129
Epoch training time: 5.223785638809204

Finished Epoch 1
train_loss = 0.66215535, train_accruacy = 0.57826968
valid_loss = 0.57006465, valid_accuracy = 0.70370370

000388: 2 [00100/00288], train_loss = 0.57274020, accuracy = 0.75000000, secs/batch = 0.0129
000488: 2 [00200/00288], train_loss = 0.42499664, accuracy = 0.79166669, secs/batch = 0.0131
Epoch training time: 3.7058911323547363

Finished Epoch 2
train_loss = 0.57065903, train_accruacy = 0.70095486
valid_loss = 0.53854320, valid_accuracy = 0.73379630

000676: 3 [00100/00288], train_loss = 0.45365071, accuracy = 0.83333331, secs/batch = 

In [9]:
model.test(test_data, load_dir="save/10_qrnn/sst2")

INFO:tensorflow:Restoring parameters from save/10_qrnn/sst2/epoch010_0.3861.model
--------------------------------------------------------------------------------
Restored model from checkpoint for testing. Size: 5711230
--------------------------------------------------------------------------------
test loss = 0.33483320, test accuracy = 0.85777778
test samples: 001800, time elapsed: 0.5791, time per one batch: 0.0077


## Training - PTB
Let's see how this model works as a language model. I got a test perplexity of 83.88, which is higher (worse) than the paper's result, but I'm sure you can reproduce the paper's result with some more hyperparameter tuning.

In [9]:
import data.rnnlm_datasets.preprocess as preprocess
from models import QRNN_LM

In [10]:
word_to_idx, char_to_idx, word_tensors, char_tensors, actual_max_word_length = \
    preprocess.build_dataset("ptb", 30, eos='+')


actual longest token length is: 21
size of word vocabulary: 10000
size of char vocabulary: 51
number of tokens in train: 929589
number of tokens in valid: 73760
number of tokens in test: 82430


In [11]:
train_word, valid_word, test_word, train_char, valid_char, test_char = \
    preprocess.train_test_dev_split(word_tensors, char_tensors)

train_data = [train_word, valid_word, word_to_idx]
test_data = [test_word]

In [12]:
model = QRNN_LM.QRNN_LM(batch_size = 20,
                        dropout_keep_prob = 0.5,
                        learning_rate = 1.0,
                        filter_windows = [2,2],
                        l2_reg_lambda = 2e-3,
                        hidden_size = 640,
                        num_unroll_steps = 105,
                        grad_clip = 10.0,
                        zoneout_keep_prob = 0.9,
                        word_embedding_size = 640)

DEBUG: 04231523


In [13]:
model.fit_to_corpus(train_data)

Instructions for updating:
Use the retry module or similar alternatives.


In [None]:
model.train(72, save_dir="save/10_qrnn/ptb", log_dir="log/10_qrnn/ptb", print_every=200)

In [14]:
model.test(test_data, load_dir="save/10_qrnn/ptb")

INFO:tensorflow:Restoring parameters from save/10_qrnn/ptb/epoch072_4.4826.model
--------------------------------------------------------------------------------
Restored model from checkpoint for testing. Size: 17726481
--------------------------------------------------------------------------------
test loss = 4.42936604, perplexity = 83.87822448
test samples: 000780, time elapsed: 1.4223, time per one batch: 0.0365
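
The perplexity in the test log is the exponential of the reported test loss, assuming that loss is the mean per-token cross-entropy in nats. A small check with the numbers from the log above (the helper name `perplexity` is only for illustration):

```python
import math


def perplexity(mean_cross_entropy):
    """Perplexity from the mean per-token cross-entropy (natural log)."""
    return math.exp(mean_cross_entropy)


# Test loss taken from the log above; the result is approximately 83.88.
print(perplexity(4.42936604))
```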
