# Requirements

In [None]:
!pip install accelerate
!pip install datasets
!pip install torch
!pip install git+https://github.com/huggingface/transformers

# Dataset

One of the canonical datasets for question answering (QA) is the Stanford Question Answering Dataset, or SQuAD, which comes in two versions: SQuAD 1.1 and SQuAD 2.0.

In SQuAD 1.1, all questions have an answer in the corresponding passage. SQuAD 2.0 steps up the difficulty by including questions that cannot be answered by the provided passage.
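
To make the difference concrete, here is a minimal sketch of how an answerable and an unanswerable record can be told apart. The two records are hypothetical, hand-written examples. They only imitate the field layout of the `squad_v2` dataset (`context`, `question`, and `answers` with parallel `text` and `answer_start` lists), where an unanswerable question carries empty answer lists. The helper names are likewise made up for illustration.

```python
# Hypothetical records that imitate the squad_v2 field layout (not real dataset rows).
CONTEXT = "SQuAD was released by researchers at Stanford University."

answerable = {
    "context": CONTEXT,
    "question": "Who released SQuAD?",
    "answers": {"text": ["researchers at Stanford University"], "answer_start": [22]},
}
unanswerable = {
    "context": CONTEXT,
    "question": "In which year was SQuAD released?",
    "answers": {"text": [], "answer_start": []},  # SQuAD 2.0 style: no gold answer
}


def is_answerable(example):
    """Return True when the record carries at least one gold answer."""
    return len(example["answers"]["text"]) > 0


def gold_spans(example):
    """Slice every gold answer out of the context using its character offset."""
    context = example["context"]
    answers = example["answers"]
    return [
        context[start:start + len(text)]
        for text, start in zip(answers["text"], answers["answer_start"])
    ]


for example in (answerable, unanswerable):
    result = gold_spans(example) if is_answerable(example) else "<no answer>"
    print(example["question"], "->", result)
```

A model trained on SQuAD 2.0 has to learn to produce the "no answer" case as well, which is why the fine-tuning command in this notebook passes `--version_2_with_negative`.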

In [None]:
# Download will be executed in run_qa.py below.

# Fine-tuning

Hugging Face provides a script, run_qa.py, that fine-tunes a Transformer model on one of the SQuAD datasets. The script handles the heavy lifting of fine-tuning (data download, preprocessing, training, and evaluation) and is correspondingly complex. It accepts dozens of command-line arguments, which gives a lot of flexibility to those who train models often.
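
Two of the flags used in the command in this section, `--max_seq_length` and `--doc_stride`, control how passages that are too long for the model are split into overlapping windows. The sketch below illustrates the idea on a plain list of tokens. It is a simplification under stated assumptions, not the script's actual code: it ignores the question and the special tokens that share the window, and it treats the stride as the number of tokens shared by consecutive windows, which is my understanding of how the tokenizer interprets it. The function name `sliding_windows` is hypothetical.

```python
def sliding_windows(tokens, max_len, overlap):
    """Split tokens into windows of at most max_len that share `overlap` tokens.

    Assumption: `overlap` plays the role of --doc_stride and `max_len` the role
    of --max_seq_length, ignoring question and special tokens.
    """
    if not 0 <= overlap < max_len:
        raise ValueError("overlap must be non-negative and smaller than max_len")
    step = max_len - overlap  # how far each new window advances
    windows = []
    start = 0
    while True:
        windows.append(tokens[start:start + max_len])
        if start + max_len >= len(tokens):
            break  # this window already reaches the end of the passage
        start += step
    return windows


# Toy example: 10 "tokens", windows of 4 that overlap by 2.
for window in sliding_windows(list(range(10)), max_len=4, overlap=2):
    print(window)
```

The overlap matters because an answer that straddles a window boundary would otherwise be cut in half; with overlapping windows, at least one window is likely to contain it whole.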

In [None]:
!curl -L -O https://raw.githubusercontent.com/huggingface/transformers/master/examples/pytorch/question-answering/run_qa.py
!curl -L -O https://raw.githubusercontent.com/huggingface/transformers/master/examples/pytorch/question-answering/trainer_qa.py
!curl -L -O https://raw.githubusercontent.com/huggingface/transformers/master/examples/pytorch/question-answering/utils_qa.py

In [2]:
!python run_qa.py \
  --model_name_or_path bert-base-uncased \
  --dataset_name squad_v2 \
  --do_train \
  --do_eval \
  --version_2_with_negative \
  --per_device_train_batch_size 12 \
  --learning_rate 3e-5 \
  --num_train_epochs 2 \
  --max_seq_length 384 \
  --doc_stride 128 \
  --output_dir ./squad_model/

12/08/2021 13:35:04 - INFO - __main__ - Training/evaluation parameters TrainingArguments(
_n_gpu=4,
adafactor=False,
adam_beta1=0.9,
adam_beta2=0.999,
adam_epsilon=1e-08,
bf16=False,
bf16_full_eval=False,
dataloader_drop_last=False,
dataloader_num_workers=0,
dataloader_pin_memory=True,
ddp_find_unused_parameters=None,
debug=[],
deepspeed=None,
disable_tqdm=False,
do_eval=True,
do_predict=False,
do_train=True,
eval_accumulation_steps=None,
eval_steps=None,
evaluation_strategy=IntervalStrategy.NO,
fp16=False,
fp16_backend=auto,
fp16_full_eval=False,
fp16_opt_level=O1,
gradient_accumulation_steps=1,
gradient_checkpointing=False,
greater_is_better=None,
group_by_length=False,
half_precision_backend=auto,
hub_model_id=None,
hub_strategy=HubStrategy.EVERY_SAVE,
hub_token=<HUB_TOKEN>,
ignore_data_skip=False,
label_names=None,
label_smoothing_factor=0.0,
learning_rate=3e-05,
length_column_name=length,
load_best_model_at_end=False,
local_rank=-1,
log_level=-1,
log_level_replica=-1,
log_on_each_

# Load Fine-tuned Model

In [2]:
import torch
from transformers import AutoModelForQuestionAnswering, AutoTokenizer, pipeline

In [3]:
MODEL_PATH = "squad_model"
model = AutoModelForQuestionAnswering.from_pretrained(MODEL_PATH, local_files_only=True)
tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH, local_files_only=True)

# Inference

In [4]:
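def qa(context, question):
    # Encode the question/context pair; if the pair is too long, truncate only the context.
    inputs = tokenizer(question, context, truncation="only_second", max_length=384, return_tensors="pt")
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
    # Inference only, so gradient tracking is not needed.
    with torch.no_grad():
        predicts = model(**inputs)
    # Most likely start token and (exclusive) end token of the answer span.
    answer_start = int(torch.argmax(predicts.start_logits))
    answer_end = int(torch.argmax(predicts.end_logits)) + 1

    print("Question:", question)
    print("Answer:", tokenizer.convert_tokens_to_string(tokens[answer_start:answer_end]))
    print()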
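
Taking the argmax of the start and end logits independently, as `qa` does, can yield an end position that comes before the start, which prints an empty answer. A common remedy is to score start/end pairs jointly and keep only valid ones. The following is a minimal sketch of that idea on plain Python lists; `best_span` and `max_answer_len` are hypothetical names introduced here, and the logits in the example are made-up numbers.

```python
def best_span(start_logits, end_logits, max_answer_len=30):
    """Return (start, end_exclusive, score) for the best-scoring valid span.

    A span is valid when it ends at or after its start and covers at most
    max_answer_len tokens. The score is start logit plus end logit.
    """
    best = (0, 1, float("-inf"))
    for i, start_score in enumerate(start_logits):
        last = min(i + max_answer_len, len(end_logits))
        for j in range(i, last):
            score = start_score + end_logits[j]
            if score > best[2]:
                best = (i, j + 1, score)
    return best


# Made-up logits where the independent argmaxes disagree:
# argmax(start) is 2 but argmax(end) is 0, so tokens[2:1] would be empty.
start_logits = [0.0, 0.0, 5.0]
end_logits = [4.0, 0.0, 1.0]
print(best_span(start_logits, end_logits))
```

With the model above, the inputs could be obtained as `predicts.start_logits[0].tolist()` and `predicts.end_logits[0].tolist()`. For a model trained on SQuAD 2.0, a best span that sits on the first token (`[CLS]`) is conventionally read as "no answer".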

In [6]:
context = """Ousted civilian leader sentenced by Myanmar court
Aung San Suu Kyi, Myanmar’s former civilian leader who was detained by the country’s military in a Feb. 1 coup, was convicted on Monday for inciting public unrest and breaching Covid-19 protocols.

Her trials, which the U.N. and foreign governments have described as politically motivated, have been held in closed-door hearings in Myanmar’s capital. The initial four-year sentence was quickly reduced to two years, of which she has already served 10 months. But she still faces nine other charges that could keep her locked up for the rest of her life.

The guilty verdict is likely to galvanize a protest movement that has spurred thousands of people to take up arms against the powerful army. While Aung San Suu Kyi remains widely popular, a new democratic movement has emerged that is younger, more progressive, more confrontational and ready to look beyond past leaders.

This new group, known as the National Unity Government, operates underground schools, clinics and hospitals. And although Aung San Suu Kyi is considered one of the group’s top leaders, it has distanced itself from some of her politics, especially policies concerning ethnic minorities such as Rohingya Muslims.

Crackdown: Protests have not let up since the February coup, and neither has the military response. The junta has killed more than 1,300 people and arrested more than 10,600 others, according to a rights organization."""

qa(context, "What is happening?")
qa(context, "When was the coup?")
qa(context, "Who still remains popularity?")
qa(context, "What is the new group's name?")
qa(context, "How many people were killed?")
qa(context, "How many people were arrested?")

Question: What is happening?
Answer: protest movement

Question: When was the coup?
Answer: feb. 1

Question: Who still remains popularity?
Answer: aung san suu kyi

Question: What is the new group's name?
Answer: national unity government

Question: How many people were killed?
Answer: 1, 300

Question: How many people were arrested?
Answer: more than 10, 600
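
The answers are lowercased and contain stray spaces ("1, 300") because they are rebuilt from the tokens of an uncased model rather than copied from the passage. One way to recover the original text is to slice the context with character offsets. Fast tokenizers can return such offsets when called with `return_offsets_mapping=True`; the sketch below uses hand-written offsets as a stand-in so that it runs without a model, and `answer_from_offsets` is a hypothetical helper.

```python
def answer_from_offsets(context, offsets, start, end_exclusive):
    """Copy the answer text from the context using per-token character offsets.

    offsets[k] is the (char_start, char_end) pair of token k within the context.
    """
    if end_exclusive <= start:
        return ""  # empty or inverted span
    char_start = offsets[start][0]
    char_end = offsets[end_exclusive - 1][1]
    return context[char_start:char_end]


context = "The junta has killed more than 1,300 people."
begin = context.index("1,300")
# Hand-written offsets for the three tokens "1", ",", "300" (stand-in for tokenizer output).
offsets = [(begin, begin + 1), (begin + 1, begin + 2), (begin + 2, begin + 5)]

print(answer_from_offsets(context, offsets, 0, 3))
```

In a real run, the offsets of the question tokens refer to the question string, so only tokens that belong to the context should be used for slicing.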



# References

* https://qa.fastforwardlabs.com/pytorch/hugging%20face/wikipedia/bert/transformers/2020/05/19/Getting_Started_with_QA.html#Fine-tuning-a-Transformer-model-for-Question-Answering
* https://github.com/huggingface/transformers/tree/master/examples/pytorch/question-answering#trainer-based-scripts