# Question Answering with BiDAF and SQuAD (using AllenNLP)

Author: Vladimir Araujo

Based on: https://demo.allennlp.org/reading-comprehension

## 1.0 Introduction

Question Answering (QA) is a challenging NLP task. The aim is to automatically answer queries expressed in natural language (Hovy, Gerber, Hermjakob, Junk, and Lin 2000). For instance, consider the following context:

> Santiago, also known as Santiago de Chile, is the capital and largest city of Chile as well as one of the largest cities in the Americas. It is the center of Chile's most densely populated region, the Santiago Metropolitan Region, whose total population is 7 million, of which more than 6 million live in the city's continuous urban area. The city is entirely located in the country's central valley. Most of the city lies between 500–650 m (1,640–2,133 ft) above mean sea level.

If we ask the question:

> How many people live in Santiago?

we expect the QA system to respond with something like this:

> 7 million

The BiDAF model was proposed by a team from the University of Washington in 2016. BiDAF handily beat the best QA models at that time and for several weeks topped the leaderboard of the Stanford Question Answering Dataset (SQuAD), arguably the best-known QA dataset. Although BiDAF’s performance has since been surpassed, the model remains influential in the QA domain. The technical innovation of BiDAF inspired the subsequent development of competing models such as ELMo and BERT, by which BiDAF was eventually dethroned.

## BiDAF Architecture 

<figure>
<center>
<img src='https://allenai.github.io/bi-att-flow/BiDAF.png' width="700" />
</center>
</figure>

This model has 3 principal parts.

1. **Embedding Layers:**
BiDAF has 3 embedding layers whose function is to change the representation of words in the Query and the Context from strings into vectors of numbers.
2. **Attention and Modeling Layers:**
These Query and Context representations then enter the attention and modeling layers. These layers use several matrix operations to fuse the information contained in the Query and the Context. The output of these steps is another representation of the Context that contains information from the Query. This output is referred to in the paper as the “Query-aware Context representation.”
3. **Output Layer:**
The Query-aware Context representation is then passed into the output layer, which transforms it into two sets of probability values over the positions of the Context. These probabilities are used to determine where the Answer starts and ends.
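To make the role of the output layer concrete, the following minimal sketch shows one common way to turn start and end probabilities into an answer span: pick the pair of positions `(i, j)` with `i <= j` that maximizes `start_probs[i] * end_probs[j]`. The tokens and probability values are made up for illustration, and `best_span` is a hypothetical helper, not AllenNLP's implementation.

```python
def best_span(start_probs, end_probs):
    """Return (start, end, score) maximizing start_probs[i] * end_probs[j] with i <= j."""
    best = (0, 0, 0.0)
    best_start_idx = 0  # index of the largest start probability seen so far
    for j, p_end in enumerate(end_probs):
        if start_probs[j] > start_probs[best_start_idx]:
            best_start_idx = j
        score = start_probs[best_start_idx] * p_end
        if score > best[2]:
            best = (best_start_idx, j, score)
    return best


# Illustrative values only (not produced by a real model)
tokens = ["total", "population", "is", "7", "million", ","]
start_probs = [0.05, 0.05, 0.10, 0.70, 0.05, 0.05]
end_probs = [0.05, 0.05, 0.05, 0.15, 0.65, 0.05]

start, end, score = best_span(start_probs, end_probs)
answer_text = " ".join(tokens[start:end + 1])
print(answer_text)
```

Scanning the end positions once while remembering the best start seen so far keeps the search linear in the Context length, while still guaranteeing that the span never ends before it starts.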

## 2.0 Setup

First, we install AllenNLP with pip.

Note: uncomment and run the first line only if the following error appears: `The NVIDIA driver on your system is too old (found version 10010)`

In [None]:
# !pip install torch==1.5.0+cu101 torchvision==0.6.0+cu101 -f https://download.pytorch.org/whl/torch_stable.html
!pip install allennlp allennlp-models

## 3.0 Train Model

This is where we can train our own model.

### 3.1 Get Training and Evaluation Data

[SQuAD](https://rajpurkar.github.io/SQuAD-explorer/) is a reading comprehension dataset consisting of questions posed by crowdworkers on a set of Wikipedia articles. The answer to every question is a segment of text, or span, from the corresponding reading passage; in SQuAD 2.0, a question may also be unanswerable. Read more about this dataset here: https://rajpurkar.github.io/SQuAD-explorer/

Now get the SQuAD v1.1 dataset. `squad-train-v1.1.json` is used for training and `squad-dev-v1.1.json` for evaluation, to see how well your model trained.

In [None]:
!wget https://allennlp.s3.amazonaws.com/datasets/squad/squad-train-v1.1.json \
&& wget https://allennlp.s3.amazonaws.com/datasets/squad/squad-dev-v1.1.json
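Before training, it can help to see how a SQuAD v1.1 file is organized: articles under `data`, each with `paragraphs`, each holding a `context` and a list of `qas` with their `answers`. The sketch below walks that layout on a tiny hand-written sample so it runs without the download; `iter_examples` and the sample contents are illustrative assumptions, not part of AllenNLP.

```python
import json


def iter_examples(squad):
    """Yield (question, first answer text, answer start offset) from a SQuAD v1.1-style dict."""
    for article in squad["data"]:
        for paragraph in article["paragraphs"]:
            for qa in paragraph["qas"]:
                first = qa["answers"][0]
                yield qa["question"], first["text"], first["answer_start"]


# Tiny hand-written sample that mimics the v1.1 layout
sample = {
    "version": "1.1",
    "data": [{
        "title": "Santiago",
        "paragraphs": [{
            "context": "Santiago is the capital of Chile.",
            "qas": [{
                "id": "example-0",
                "question": "What is the capital of Chile?",
                "answers": [{"text": "Santiago", "answer_start": 0}],
            }],
        }],
    }],
}

examples = list(iter_examples(sample))
print(examples)

# To inspect the downloaded file instead, load it first:
# with open("squad-dev-v1.1.json") as f:
#     examples = list(iter_examples(json.load(f)))
```

The `answer_start` value is a character offset into `context`, which is how the dataset marks where the answer span begins.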

### 3.2 Configuration file for training

AllenNLP uses configuration files to define experiments. The configuration is a Jsonnet file that defines all the parameters for our experiment and model. Don't worry if you're not familiar with Jsonnet: any JSON file is valid Jsonnet.

First, we download the config file of BiDAF created by AllenNLP.

In [None]:
!pip install jsonnet
!wget https://raw.githubusercontent.com/allenai/allennlp-models/v1.0.0/training_config/rc/bidaf_elmo.jsonnet

Now, we can load the Jsonnet file and explore it.

**Notes about hyper-parameters:**

The file contains the original BiDAF configuration and is ready to use. However, we can edit any value if we want. Among the most important parameters are:

`train_data_path` and `validation_data_path` specify the path or URL of the training and validation datasets

`num_epochs` specifies the number of epochs to train

`batch_size` specifies the number of examples per training batch

`cuda_device` set to `0` indicates that the first GPU will be used (`-1` selects the CPU)

In [None]:
import json
import _jsonnet

jsonnet_file = "bidaf_elmo.jsonnet"
config_file = json.loads(_jsonnet.evaluate_file(jsonnet_file))
config_file
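The loaded configuration is a nested dictionary, so the parameters listed above may sit several levels deep. As a rough sketch, a small recursive helper can report where a key appears. `find_key` is a hypothetical helper, and `toy_config` only imitates the general shape of a config; its layout and values are assumptions, not the real contents of `bidaf_elmo.jsonnet`.

```python
def find_key(node, key, path=""):
    """Return a list of (path, value) pairs for every occurrence of `key` in nested dicts/lists."""
    hits = []
    if isinstance(node, dict):
        for name, value in node.items():
            child_path = f"{path}.{name}" if path else name
            if name == key:
                hits.append((child_path, value))
            hits.extend(find_key(value, key, child_path))
    elif isinstance(node, list):
        for index, item in enumerate(node):
            hits.extend(find_key(item, key, f"{path}[{index}]"))
    return hits


# Hypothetical config used only to demonstrate the helper
toy_config = {
    "train_data_path": "train.json",
    "trainer": {"num_epochs": 20, "cuda_device": 0},
    "data_loader": {"batch_sampler": {"batch_size": 40}},
}

print(find_key(toy_config, "batch_size"))
print(find_key(toy_config, "num_epochs"))
```

In the notebook, the same call can be applied to the loaded dictionary, for example `find_key(config_file, "num_epochs")`.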

### 3.3 Run training (Optional)

We can now train the model with the training set.

NOTE: it takes about 30 minutes to train an epoch (we run 20 epochs)! If you don't want to wait this long, feel free to skip this step and use the pretrained model downloaded in Section 4.0.

### Mandatory parameters:
`-s` specifies the output folder where the trained model will be stored.

### Optional parameters:

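`-o` (`--overrides`) takes a JSON string and replaces the matching parameters of the config file.

For instance, `-o '{"validation_data_path": "alternative_dataset.json"}'`

In this case, we run the training with the default config file and override only the dataset paths so that they point to the files downloaded above.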

In [None]:
!allennlp train bidaf_elmo.jsonnet \
  -s /content/model_output \
  -o '{"train_data_path": "squad-train-v1.1.json" , "validation_data_path": "squad-dev-v1.1.json"}'
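Because the value passed to `-o` is plain JSON, one way to avoid quoting mistakes is to build it from a Python dictionary. The sketch below only assembles and prints the command string; it does not start training. The variable names are illustrative.

```python
import json
import shlex

# Parameters to override, written as an ordinary dictionary
overrides = {
    "train_data_path": "squad-train-v1.1.json",
    "validation_data_path": "squad-dev-v1.1.json",
}

# Serialize to JSON and quote it safely for the shell
overrides_json = json.dumps(overrides)
command = (
    "allennlp train bidaf_elmo.jsonnet -s /content/model_output -o "
    + shlex.quote(overrides_json)
)
print(command)
```

The printed command can then be pasted into a notebook cell after a leading `!`.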

## 4.0 Setup prediction code

Now we can use the AllenNLP library to make predictions using a model pre-trained on SQuAD v1.1.

NOTE: if you decided to train your own model, change `model_path` so that it points to your trained archive (for example, `/content/model_output/model.tar.gz`).


In [None]:
!wget https://storage.googleapis.com/allennlp-public-models/bidaf-elmo-model-2020.03.19.tar.gz

In [None]:
from allennlp.predictors.predictor import Predictor
import allennlp_models.rc

model_path = "bidaf-elmo-model-2020.03.19.tar.gz"
predictor = Predictor.from_path(model_path)

## 5.0 Run predictions

Now for the fun part: testing the model on different inputs. The example below is rudimentary, but the same function works for any passage and question.

In [None]:
context = "Santiago, also known as Santiago de Chile, is the capital and largest city of Chile as well as one of the largest cities in the Americas. It is the center of Chile's most densely populated region, the Santiago Metropolitan Region, whose total population is 7 million, of which more than 6 million live in the city's continuous urban area. The city is entirely located in the country's central valley. Most of the city lies between 500–650 m (1,640–2,133 ft) above mean sea level."

question = "What is the capital of Chile?"
# question = "How many people live in Santiago?"

# run prediction
answer = predictor.predict_json({
  "passage": context,
  "question": question
})

# Print results
print("Results:")
print(question,answer['best_span_str'])
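To ask several questions about the same passage, the prediction call above can be wrapped in a small loop. The following is a sketch under assumptions: `ask_all` is a hypothetical helper, and `EchoPredictor` is a stand-in that only mimics the `predict_json` interface so the example runs without downloading a model.

```python
def ask_all(predictor, passage, questions):
    """Return {question: answer string} by calling predictor.predict_json once per question."""
    results = {}
    for question in questions:
        output = predictor.predict_json({"passage": passage, "question": question})
        results[question] = output["best_span_str"]
    return results


class EchoPredictor:
    """Hypothetical stand-in for a real predictor; not a QA model."""

    def predict_json(self, inputs):
        # Pretend the answer is always the first word of the passage
        return {"best_span_str": inputs["passage"].split()[0]}


demo = ask_all(
    EchoPredictor(),
    "Santiago is the capital of Chile.",
    ["What is the capital of Chile?"],
)
print(demo)
```

In the notebook, passing the real `predictor` and `context` instead, as in `ask_all(predictor, context, ["What is the capital of Chile?", "How many people live in Santiago?"])`, would return the model's answers.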

## 6.0 Activity

Now it is your turn. Use the code in Section 5.0 (previous section) to generate your own predictions. To do that, change the `context` and `question` variables.


In [None]:
# Your code here

---

Based on this tutorial and the class, indicate whether the following statements are `True` or `False`.


In [None]:
#@title The SQuAD dataset is a reading comprehension task
answer = None #@param ["None","False", "True"] {type:"raw"}

In [None]:
#@title The BiDAF model is a fine-tuned ELMo model
answer = None #@param ["None","False", "True"] {type:"raw"}

In [None]:
#@title The BiDAF model consists of a single LSTM model
answer = None #@param ["None","False", "True"] {type:"raw"}