0. Make sure that Google Colab is using a GPU runtime (Runtime > Change runtime type)
1. Preparation

    1. Create a folder named IR in the root of your Google Drive.
    2. Download [Claim_Generation](https://github.com/teacherpeterpan/Zero-shot-Fact-Verification) and place it inside the IR/ folder.
    3. Download [s2v_old.zip](https://github.com/teacherpeterpan/Zero-shot-Fact-Verification) (located under "c) Claim Generation") and place it inside the IR/ folder.
    4. Download [Pre-processed Wikipedia Pages (June 2017 dump)](https://fever.ai/dataset/fever.html) and place it inside the IR/ folder.

Note: all downloads are in .zip format and will be unzipped later on. The cells below expect the archives to be named Claim_Generation.zip, s2v_old.zip and wiki_pages.zip.
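
Step 0 can be verified from a cell before anything is installed. The sketch below is one possible check, not part of the original notebook: it assumes that a GPU runtime exposes the `nvidia-smi` tool, and the helper name `gpu_visible` is hypothetical.

```python
import shutil
import subprocess


def gpu_visible() -> bool:
    """Return True if nvidia-smi is on PATH and lists at least one GPU.

    Assumption: a Colab GPU runtime ships the nvidia-smi command-line tool.
    """
    if shutil.which("nvidia-smi") is None:
        return False
    # "nvidia-smi -L" prints one line per detected GPU.
    result = subprocess.run(["nvidia-smi", "-L"], capture_output=True, text=True)
    return result.returncode == 0 and "GPU" in result.stdout


if gpu_visible():
    print("GPU runtime detected")
else:
    print("No GPU detected: switch the runtime type before continuing")
```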

In [None]:
# 1 - Install necessary components

# Remove Colab's sample data and create the log folder (safe to re-run)
!rm -rf sample_data/
!mkdir -p logs/
# Each install writes stdout and stderr to its own log file
!pip install gsutil > logs/gsutil_log.txt 2>&1 && echo "gsutil module installed" || echo "installation of gsutil failed, see log for more info"
!pip install stanza > logs/stanza_log.txt 2>&1 && echo "stanza module installed" || echo "installation of stanza failed, see log for more info"
!pip install sentencepiece > logs/sentencepiece_log.txt 2>&1 && echo "sentencepiece module installed" || echo "installation of sentencepiece failed, see log for more info"
!pip install nltk > logs/nltk_log.txt 2>&1 && echo "nltk module installed" || echo "installation of nltk failed, see log for more info"
import nltk
nltk.download('punkt')
!pip install sense2vec > logs/sense2vec_log.txt 2>&1 && echo "sense2vec module installed" || echo "installation of sense2vec failed, see log for more info"
!pip install simpletransformers > logs/simpletransformers_log.txt 2>&1 && echo "simpletransformers module installed" || echo "installation of simpletransformers failed, see log for more info"

In [None]:
# 2 - Prepare the environment.

# Create the directories needed to store the data, the code and the output of the models
!mkdir -p ./data/
!mkdir -p ./output/intermediate/
!mkdir -p ./dependencies/
!mkdir -p ./dependencies/QA2D_model/
!gsutil -m cp gs://few-shot-fact-verification/data/* ./data/

# Connect to Google Drive - the files we want to import are too large for a local upload,
# which would be too slow, so we read them from Google Drive instead
from google.colab import drive
drive.mount('/content/drive/')

# Copy and prepare the code
!cp drive/MyDrive/IR/Claim_Generation.zip /content
!unzip Claim_Generation.zip -d code > logs/unzip_claim_Generation_log.txt && echo "unzip of Claim_Generation was ok" || echo "unzip of Claim_Generation failed"
!rm -r Claim_Generation.zip

# We have saved the pretrained s2v model in Google Drive and we import it from there
!cp -r drive/MyDrive/IR/s2v_old.zip /content
!unzip s2v_old.zip -d dependencies > logs/unzip_s2v_old.txt && echo "unzip of s2v was ok" || echo "unzip of s2v failed"
!rm -r s2v_old.zip

# We copy the pretrained QA2D
!gsutil cp gs://few-shot-fact-verification/QA2D_model/* /content/dependencies/QA2D_model > logs/QA2D_model.txt

# We have stored the Wikipedia pages in Google Drive and we import them from there (again, faster than a local upload)
!cp -r drive/MyDrive/IR/wiki_pages.zip /content
!unzip wiki_pages.zip -d data
!rm -r wiki_pages.zip
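
Before starting the long-running steps, it can help to confirm that the preparation cell produced everything the later commands read. The following is a minimal sketch: the relative paths are the ones passed to the scripts in the cells below, and the helper name `check_paths` is hypothetical.

```python
from pathlib import Path

# Relative paths that the later cells pass to the scripts.
EXPECTED_PATHS = [
    "code/Claim_Generation/Extract_NERs.py",
    "data/fever_train.processed.json",
    "data/fever_dev.processed.json",
    "dependencies/QA2D_model",
    "dependencies/s2v_old",
    "data/wiki-pages/wiki-pages",
]


def check_paths(root, paths=EXPECTED_PATHS):
    """Map each expected relative path to True if it exists under root."""
    base = Path(root)
    return {rel_path: (base / rel_path).exists() for rel_path in paths}


for rel_path, found in check_paths(".").items():
    status = "ok" if found else "MISSING"
    print(f"{status:8}{rel_path}")
```

A MISSING entry usually points at a failed copy or unzip; the corresponding file in logs/ is the first place to look.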

In this section we run the model on the sample data and save the results locally. The steps are as follows:

1. Run NER extraction
2. Generate QA for training set
3. Generate QA for development set
4. Claim generation of SUPPORTED claims
5. Claim generation of REFUTED claims
6. Claim generation of NEI claims
7. Save results

In [None]:
# 3 - Run the code (NERs, QAs, Claims)

# NER EXTRACTION
!python code/Claim_Generation/Extract_NERs.py \
    --train_path data/fever_train.processed.json \
    --dev_path data/fever_dev.processed.json \
    --save_path output/intermediate/

In [None]:
# QA GENERATION for the training set
!python code/Claim_Generation/Generate_QAs.py \
    --train_path data/fever_train.processed.json \
    --dev_path data/fever_dev.processed.json \
    --data_split train \
    --entity_dict output/intermediate/entity_dict_train.json \
    --save_path output/intermediate/precompute_QAs_train.json

In [None]:
# QA GENERATION for the development set
!python code/Claim_Generation/Generate_QAs.py \
    --train_path data/fever_train.processed.json \
    --dev_path data/fever_dev.processed.json \
    --data_split dev \
    --entity_dict output/intermediate/entity_dict_dev.json \
    --save_path output/intermediate/precompute_QAs_dev.json
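
After NER extraction and QA generation, four intermediate files should exist. The sketch below reports how many top-level entries each one holds. It assumes that every file is a single JSON document (a dict or a list), which the notebook does not state; `count_entries` is a hypothetical helper.

```python
import json
from pathlib import Path

# Intermediate files written by the NER and QA cells above.
INTERMEDIATE_FILES = [
    "output/intermediate/entity_dict_train.json",
    "output/intermediate/entity_dict_dev.json",
    "output/intermediate/precompute_QAs_train.json",
    "output/intermediate/precompute_QAs_dev.json",
]


def count_entries(path):
    """Return the number of top-level entries in a JSON file, or None if it is missing.

    Assumption: the file is one JSON document whose top level is a dict or a list.
    """
    file = Path(path)
    if not file.is_file():
        return None
    with file.open(encoding="utf-8") as handle:
        return len(json.load(handle))


for path in INTERMEDIATE_FILES:
    n_entries = count_entries(path)
    if n_entries is None:
        print(f"{path}: missing")
    else:
        print(f"{path}: {n_entries} top-level entries")
```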

In [None]:
# CLAIM GENERATION FOR SUPPORTED CLAIMS
!python code/Claim_Generation/Claim_Generation.py \
    --split train \
    --train_path data/fever_train.processed.json \
    --dev_path data/fever_dev.processed.json \
    --entity_dict output/intermediate/entity_dict_train.json \
    --QA_path output/intermediate/precompute_QAs_train.json \
    --QA2D_model_path dependencies/QA2D_model \
    --sense_to_vec_path dependencies/s2v_old \
    --save_path output/SUPPORTED_claims.json \
    --claim_type SUPPORTED 

In [None]:
# CLAIM GENERATION FOR REFUTED CLAIMS
!python code/Claim_Generation/Claim_Generation.py \
    --split train \
    --train_path data/fever_train.processed.json \
    --dev_path data/fever_dev.processed.json \
    --entity_dict output/intermediate/entity_dict_train.json \
    --QA_path output/intermediate/precompute_QAs_train.json \
    --QA2D_model_path dependencies/QA2D_model \
    --sense_to_vec_path dependencies/s2v_old \
    --save_path output/REFUTED_claims.json \
    --claim_type REFUTED

In [None]:
# CLAIM GENERATION FOR NEI CLAIMS
!python code/Claim_Generation/Claim_Generation.py \
    --split train \
    --train_path data/fever_train.processed.json \
    --dev_path data/fever_dev.processed.json \
    --entity_dict output/intermediate/entity_dict_train.json \
    --QA_path output/intermediate/precompute_QAs_train.json \
    --QA2D_model_path dependencies/QA2D_model \
    --sense_to_vec_path dependencies/s2v_old \
    --save_path output/NEI_claims.json \
    --claim_type NEI \
    --wiki_path data/wiki-pages/wiki-pages
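
The three claim-generation cells above differ only in `--claim_type`, `--save_path` and, for NEI, the extra `--wiki_path`. The sketch below builds the same argument list from Python so the shared flags are written once. It only uses flags that appear in the cells; the helper name `build_claim_command` and its parameters are hypothetical, and the data paths are parameters so they can match whatever the cells pass.

```python
import shlex


def build_claim_command(claim_type, train_path, dev_path,
                        split="train", out_dir="output", wiki_path=None):
    """Return the Claim_Generation.py command as a list of arguments."""
    cmd = [
        "python", "code/Claim_Generation/Claim_Generation.py",
        "--split", split,
        "--train_path", train_path,
        "--dev_path", dev_path,
        "--entity_dict", f"{out_dir}/intermediate/entity_dict_{split}.json",
        "--QA_path", f"{out_dir}/intermediate/precompute_QAs_{split}.json",
        "--QA2D_model_path", "dependencies/QA2D_model",
        "--sense_to_vec_path", "dependencies/s2v_old",
        "--save_path", f"{out_dir}/{claim_type}_claims.json",
        "--claim_type", claim_type,
    ]
    # The cells above pass --wiki_path only for NEI claims.
    if wiki_path is not None:
        cmd += ["--wiki_path", wiki_path]
    return cmd


TRAIN = "data/fever_train.processed.json"
DEV = "data/fever_dev.processed.json"

for claim_type in ("SUPPORTED", "REFUTED", "NEI"):
    wiki = "data/wiki-pages/wiki-pages" if claim_type == "NEI" else None
    print(shlex.join(build_claim_command(claim_type, TRAIN, DEV, wiki_path=wiki)))
```

The printed commands can be pasted after `!` in a cell, or the list can be handed to `subprocess.run(cmd, check=True)`.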

In [None]:
# 4 - Save the results locally

# zip output folder and download it locally
!zip -r output.zip output
from google.colab import files
files.download('output.zip')
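
If the browser download is interrupted, the results can also be archived straight into Google Drive. This is a sketch of that alternative, not part of the original workflow: it assumes Drive is still mounted at /content/drive, and `archive_folder` is a hypothetical helper built on the standard-library `shutil.make_archive`.

```python
import shutil
from pathlib import Path


def archive_folder(folder, dest_dir):
    """Zip `folder` (keeping its name as the top-level entry) into dest_dir.

    Returns the path of the created archive as a string.
    """
    folder = Path(folder).resolve()
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    return shutil.make_archive(
        base_name=str(dest_dir / folder.name),  # ".zip" is appended automatically
        format="zip",
        root_dir=str(folder.parent),
        base_dir=folder.name,
    )


# Only runs when both the output folder and the mounted Drive are present.
drive_root = Path("/content/drive/MyDrive")
if Path("output").is_dir() and drive_root.is_dir():
    print("Archive written to", archive_folder("output", drive_root / "IR"))
```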

This is the testing workbench. It runs the same pipeline on data we generated ourselves instead of the FEVER input data. The overview is the same, except that no NEI claims are generated:

0. Prepare the environment
    1. Upload both "fever_dev_test.processed.json" and "fever_train_test.processed.json" to the IR/ folder in your Google Drive.

       The files are located under '02 - DATA' in the GitHub repo where you found this notebook.
1. Run NER extraction
2. Generate QA for the training set
3. Generate QA for the development set
4. Claim generation of SUPPORTED claims for test
5. Claim generation of REFUTED claims for test

In [None]:
# 5 - Testing workbench
!mkdir -p test/
!cp drive/MyDrive/IR/fever_dev_test.processed.json /content/test
!cp drive/MyDrive/IR/fever_train_test.processed.json /content/test

In [None]:
# NER EXTRACTION for the test train and dev files
!python code/Claim_Generation/Extract_NERs.py \
    --train_path test/fever_train_test.processed.json \
    --dev_path test/fever_dev_test.processed.json \
    --save_path test/

In [None]:
# QA GENERATION for the training set
!python code/Claim_Generation/Generate_QAs.py \
    --train_path test/fever_train_test.processed.json \
    --dev_path test/fever_dev_test.processed.json \
    --data_split train \
    --entity_dict test/entity_dict_train.json \
    --save_path test/precompute_QAs_train.json

In [None]:
# QA GENERATION for the development set
!python code/Claim_Generation/Generate_QAs.py \
    --train_path test/fever_train_test.processed.json \
    --dev_path test/fever_dev_test.processed.json \
    --data_split dev \
    --entity_dict test/entity_dict_dev.json \
    --save_path test/precompute_QAs_dev.json

In [None]:
# CLAIM GENERATION FOR SUPPORTED CLAIMS
!python code/Claim_Generation/Claim_Generation.py \
    --split train \
    --train_path test/fever_train_test.processed.json \
    --dev_path test/fever_dev_test.processed.json \
    --entity_dict test/entity_dict_train.json \
    --QA_path test/precompute_QAs_train.json \
    --QA2D_model_path dependencies/QA2D_model \
    --sense_to_vec_path dependencies/s2v_old \
    --save_path test/SUPPORTED_claims.json \
    --claim_type SUPPORTED > logs/Claim_Generation_sup_log.txt

In [None]:
# CLAIM GENERATION FOR REFUTED CLAIMS
!python code/Claim_Generation/Claim_Generation.py \
    --split train \
    --train_path test/fever_train_test.processed.json \
    --dev_path test/fever_dev_test.processed.json \
    --entity_dict test/entity_dict_train.json \
    --QA_path test/precompute_QAs_train.json \
    --QA2D_model_path dependencies/QA2D_model \
    --sense_to_vec_path dependencies/s2v_old \
    --save_path test/REFUTED_claims.json \
    --claim_type REFUTED > logs/Claim_Generation_ref_log.txt