# Model training with Jack

## Prerequisites

Note: the command below needs to be run in a terminal from the root of the Jack repository.

Download GloVe:
> `source data/GloVe/download_small.sh`

In [None]:
%load_ext autoreload
%autoreload 2
import os
os.chdir('..')    # change dir to Jack root

In [None]:
from jack import readers
from jack.core import SharedResources
from jack.io.embeddings.embeddings import load_embeddings
from jack.io.load import load_jack
from jack.util.hooks import LossHook, ExamplesPerSecHook
from jack.util.vocab import Vocab
from notebooks.prettyprint import QAPrettyPrint
import tensorflow as tf

Let's check all the currently available readers from `readers.py`:

These are the extractive question answering readers:

In [None]:
[r for r in readers.extractive_qa_readers.keys()]

...the classification readers:

In [None]:
[r for r in readers.classification_readers.keys()]

and the link prediction readers:

In [None]:
[r for r in readers.link_prediction_readers.keys()]
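
The three dictionaries above can be read as name-to-constructor registries: each key is a reader name, and each value is something we can call to build that reader. Below is a minimal sketch of that pattern in plain Python. The names `READERS`, `CLASSIFICATION_READERS`, `classification_reader` and `toy_nli_reader` are made up for illustration, and this is not a copy of how `readers.py` is actually written.

```python
# Hypothetical registries; the names are illustrative, not Jack's own code.
READERS = {}
CLASSIFICATION_READERS = {}


def classification_reader(factory):
    """Register a reader factory under its function name."""
    READERS[factory.__name__] = factory
    CLASSIFICATION_READERS[factory.__name__] = factory
    return factory


@classification_reader
def toy_nli_reader(shared_resources):
    # A real factory would assemble input, model and output modules here.
    return {"name": "toy_nli_reader", "resources": shared_resources}


# Listing the keys shows what is available; indexing by name gives a factory.
available = sorted(CLASSIFICATION_READERS.keys())
reader = READERS["toy_nli_reader"]({"repr_dim": 10})
```

With a layout like this, looking up a reader by its string name and calling the result is all that is needed to construct it.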

## Shared resources

To train the reader, we need to define a vocabulary. Our readers also need word embeddings, for which we'll use the downloaded GloVe [[1]](#ref1) embeddings. Both the vocabulary and the embeddings are shared between the two readers presented in this notebook.

In [None]:
glove_path = 'data/GloVe/glove.6B.50d.txt'
embeddings = load_embeddings(glove_path,
                             type='glove')
vocab = Vocab(emb=embeddings,
              init_from_embeddings=True)
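
To make the embedding file less of a black box: GloVe text files store one word per line, followed by its vector components separated by spaces. The sketch below parses that format with the standard library only. It is not Jack's `load_embeddings`; `parse_glove`, `toy_vectors` and `toy_vocab` are hypothetical names, and the two sample entries are invented. Building the toy vocabulary as an index over the embedding words is only our reading of what `init_from_embeddings=True` suggests.

```python
import io


def parse_glove(lines):
    """Parse GloVe text lines ("word v1 v2 ... vd") into a dict of vectors."""
    vectors = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


# Two made-up 3-dimensional entries standing in for a real GloVe file.
sample = io.StringIO("the 0.1 0.2 0.3\ncat -0.5 0.0 0.25\n")
toy_vectors = parse_glove(sample)

# A toy vocabulary: every word in the embeddings gets an integer id.
toy_vocab = {word: idx for idx, word in enumerate(toy_vectors)}
dim = len(next(iter(toy_vectors.values())))
```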

## FastQA (SQuAD)

We will train a FastQA [[2]](#ref2) model on a very small subset of the SQuAD dataset [[3]](#ref3), because training is slow. If you want to train your models on large datasets (like the full SQuAD dataset), we recommend training them on GPUs.

Additionally, please note a simpler way to train your models from the command line:

> `python3 bin/jack-train.py with config='./conf/qa/squad/fastqa.yaml'`

For more details on training from the command line, see [README.md](https://github.com/uclmr/jack#quickstart-examples---training-and-usage-of-a-question-answering-system).

### Data loading

Load the training data:

In [None]:
squad_path = 'data/SQuAD/snippet.jtr.json'
fastqa_train_data = load_jack(squad_path)

### Creating the reader

We need to define the hyperparameter values (representation dimensionality, input representation dimensionality, etc.) and general configuration parameters (maximum span size, etc.) for the FastQA reader:

In [None]:
fastqa_config = {"repr_dim": 10,
                 "repr_dim_input": embeddings.lookup.shape[1],
                 "max_span_size": 10}

Then we create an example reader, based on the (previously defined) vocabulary and the reader configuration:

In [None]:
fastqa_svac = SharedResources(vocab, fastqa_config)
fastqa_reader = readers.fastqa_reader(fastqa_svac)

Afterwards, we set up modules (input, model, output) given a training dataset. `is_training` set to `True` indicates we are in the training phase. After this call, all the parameters of the model will be initialised.

In [None]:
fastqa_reader.setup_from_data(fastqa_train_data, is_training=True)
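
The `max_span_size` entry in the configuration above limits how long a predicted answer span may be. As a rough illustration of what such a limit does, here is a sketch that picks the best span from per-token start and end scores. It assumes the limit counts tokens with an inclusive end index; `best_span` and the scores are invented for this example, and Jack's FastQA implementation may select spans differently.

```python
def best_span(start_scores, end_scores, max_span_size):
    """Return (start, end, score) of the highest-scoring span whose length
    is at most max_span_size tokens (end index inclusive)."""
    best = None
    for start, s_score in enumerate(start_scores):
        # Only consider end positions within max_span_size tokens of start.
        last = min(start + max_span_size, len(end_scores))
        for end in range(start, last):
            score = s_score + end_scores[end]
            if best is None or score > best[2]:
                best = (start, end, score)
    return best


# Made-up scores for a five-token passage.
start_scores = [0.1, 2.0, 0.3, 0.2, 0.1]
end_scores = [0.0, 0.1, 0.5, 0.2, 3.0]

unconstrained = best_span(start_scores, end_scores, max_span_size=5)
constrained = best_span(start_scores, end_scores, max_span_size=2)
```

With a limit of 5 the best span runs from token 1 to token 4. With a limit of 2 that span is too long, so the search settles on tokens 3 to 4 instead.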

### Applying the untrained reader

Our model is initialised, but has not been trained yet. We can see that from the predictions it makes:

In [None]:
questions = [q for q, a in fastqa_train_data]
for q, a in zip(questions[:5], fastqa_reader(questions)[:5]):
    print("Question: " + q.question)
    print("Answer:   %s \t %.3f" % (a[0].text, a[0].score))
    print()

Of course, the output is not correct, because the model has not been trained at all.

### Training

First, we set up everything necessary for training. In this case we set the `batch_size` to the size of the dataset, as we're working on a very small dataset. We define hooks which will print out useful information during training (loss and speed) and define the optimiser used (Adam).

In [None]:
# Use the whole (very small) dataset as a single batch.
batch_size = len(fastqa_train_data)
# Hooks print the loss and the training speed; iter_interval=1 means every iteration.
hooks = [LossHook(fastqa_reader, iter_interval=1),
         ExamplesPerSecHook(fastqa_reader, batch_size, iter_interval=1)]
# Adam optimiser with a learning rate of 0.11.
optimizer = tf.train.AdamOptimizer(0.11)

...and we start the training procedure:

In [None]:
fastqa_reader.train(optimizer,
                    batch_size=batch_size,
                    hooks=hooks,
                    max_epochs=20,
                    training_set=fastqa_train_data)
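
To see how the batch size, the number of epochs and the hooks fit together, here is a stand-in loop with no model at all, only the bookkeeping. `RecordLossHook`, `run_training`, the dataset size of 50 and the fake loss are all hypothetical; Jack's real hooks and `train` method do more than this.

```python
import math


class RecordLossHook:
    """Hypothetical hook: records the loss every `iter_interval` iterations."""

    def __init__(self, iter_interval):
        self.iter_interval = iter_interval
        self.logged = []

    def at_iteration_end(self, iteration, loss):
        if iteration % self.iter_interval == 0:
            self.logged.append((iteration, loss))


def run_training(num_examples, batch_size, max_epochs, hooks):
    """Stand-in loop: no model, just the batch/epoch/hook bookkeeping."""
    batches_per_epoch = math.ceil(num_examples / batch_size)
    iteration = 0
    for _ in range(max_epochs):
        for _ in range(batches_per_epoch):
            iteration += 1
            fake_loss = 1.0 / iteration  # placeholder for a real loss value
            for hook in hooks:
                hook.at_iteration_end(iteration, fake_loss)
    return iteration


# Batch size equal to the (made-up) dataset size: one iteration per epoch.
hook = RecordLossHook(iter_interval=1)
total_iterations = run_training(num_examples=50, batch_size=50,
                                max_epochs=20, hooks=[hook])
```

Because the batch covers the whole dataset, 20 epochs amount to only 20 parameter updates, and a hook with `iter_interval=1` fires once per epoch.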

### Predictions from the trained reader

Let's take a look at the predictions after 20 epochs of training:

In [None]:
predictions = fastqa_reader(questions)
for q, a in zip(questions[:5], predictions[:5]):
    print("Question: " + q.question)
    print("Answer:   %s \t (score: %.3f)\n" % (a[0].text, a[0].score))

And let's take a look at one of the answers in the context of the paragraph:

In [None]:
questions[20].question

In [None]:
QAPrettyPrint(questions[20].support[0], predictions[20][0].span)

The predicted answers look much better now. However, be aware that this is the prediction of a model trained on a very small subset of data, applied to that same data. Feel free to train your model on the full SQuAD dataset.

### Saving the model

We can now save the model after training it:

In [None]:
fastqa_reader.store("/tmp/fastqa_reader")

## Decomposable attention model (SNLI)

### Data loading

We load the data and prepare it for later printing:

In [None]:
snli_path = 'data/SNLI/snippet.jtr_v1.json'
snli_train_data = load_jack(snli_path)

hypotheses = []
premises = []
labels = []
for input_, output_ in snli_train_data:
    premises.append(input_.support[0])
    hypotheses.append(input_.question)
    labels.append(output_[0].text)

We reset the TensorFlow graph to clear out the previously built model:

In [None]:
tf.reset_default_graph()

### Creating the reader

As before, we set up the configuration for the model:

In [None]:
snli_config = {"repr_dim": 10,
               "repr_dim_input": embeddings.lookup.shape[1],
               "model": "dam_snli_reader"}

...create the shared resources:

In [None]:
snli_svac = SharedResources(vocab, snli_config)

...build the reader, and set it up with the dataset:

In [None]:
snli_reader = readers.readers["dam_snli_reader"](snli_svac)
snli_reader.setup_from_data(snli_train_data, is_training=True)

### Training

We set up the training procedure, similarly to the FastQA model:

In [None]:
batch_size = len(snli_train_data)
hooks = [LossHook(snli_reader, iter_interval=1), 
         ExamplesPerSecHook(snli_reader, batch_size, iter_interval=1)]
optimizer = tf.train.AdamOptimizer(0.05)

...and run the training:

In [None]:
snli_reader.train(optimizer,
                  batch_size=batch_size,
                  hooks=hooks,
                  max_epochs=20,
                  training_set=snli_train_data)

### Predictions from the trained reader

In [None]:
input_ = [qa_setting for qa_setting, answers in snli_train_data]
output_ = snli_reader(input_)

In [None]:
for p, h, l, o in zip(premises[:5], hypotheses[:5], labels[:5], output_[:5]):
    print('Premise: {}'.format(p))
    print('Hypothesis: {}'.format(h))
    print('Prediction: {} (score: {:.2f})  [Label: {}]\n'.format(o[0].text, o[0].score, l))
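
Eyeballing five examples only goes so far, so a single accuracy number can be a handy complement. The helper below is a small sketch rather than part of Jack; `accuracy` and the toy labels are invented for illustration and are not output from the reader above.

```python
def accuracy(predicted_labels, gold_labels):
    """Fraction of positions where the predicted label equals the gold label."""
    if len(predicted_labels) != len(gold_labels):
        raise ValueError("predictions and labels must have the same length")
    if not gold_labels:
        return 0.0
    correct = sum(p == g for p, g in zip(predicted_labels, gold_labels))
    return correct / len(gold_labels)


# Made-up labels for illustration only.
toy_predictions = ["entailment", "neutral", "contradiction", "neutral"]
toy_gold = ["entailment", "contradiction", "contradiction", "neutral"]
toy_accuracy = accuracy(toy_predictions, toy_gold)
```

In this notebook, the same helper could be applied to `[o[0].text for o in output_]` and `labels`. Keep in mind that this would be accuracy on the training snippet itself, not on held-out data.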

## References:

<a id='ref1'>[1]</a> Pennington, Jeffrey, Richard Socher, and Christopher Manning. <a href='http://www.aclweb.org/anthology/D14-1162'>"GloVe: Global Vectors for Word Representation."</a> Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). 2014.

<a id='ref2'>[2]</a> Weissenborn, Dirk, Georg Wiese, and Laura Seiffe. <a href='http://www.aclweb.org/anthology/K17-1028'>"Making Neural QA as Simple as Possible but not Simpler."</a> Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017). 2017.

<a id='ref3'>[3]</a> Rajpurkar, Pranav, et al. <a href='http://www.anthology.aclweb.org/D/D16/D16-1264.pdf'>"SQuAD: 100,000+ Questions for Machine Comprehension of Text."</a> Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing. 2016.