```
Spot_Difference
|-- bottom-up-attention-vqa
|-- checkpoints
|   |-- pretrained
|       |-- bert-base-uncased
|       |-- gpt2
|       |-- model_LXRT.pth
|       |-- ...
|-- data
|   |-- 0206
|   |-- spot_diff_train.json
|   |-- ...
|   |-- img_feat_3ee94.h5
|-- dataloader
|   |-- guesser_dataloader.py
|   |-- loader_utils.py
|   |-- qgen_dataloader.py
|-- lxmert
|   |-- ...
|-- model
|   |-- guesser.py
|   |-- qgen.py
|-- scripts
|-- stat_tools
|-- ...
```
Set up the environment by running `pip install -r requirements.txt`.
- Pre-trained models: put them in `checkpoints/pretrained`.
  - BERT
  - GPT-2
  - LXMERT: can be downloaded from https://github.com/airsplay/lxmert.
- SpotDiff dialogues: three JSON files, i.e., `spot_diff_train.json`, `spot_diff_val.json`, and `spot_diff_test.json`. You can download them from Baidu Netdisk.
- SpotDiff images
  - You can download the original images from my Baidu Netdisk.
  - Because the image collection is large, it is compressed into four files. Download all four files to your local device, then merge and decompress them.
  - Since the original image collection is very large, you can use just a subset of it.
- Image features: extracted with bottom-up top-down (BUTD) attention. The extracted features can be downloaded here; we extracted them by running the bottom-up-attention.pytorch code.
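After downloading, you can sanity-check the data files. Below is a minimal sketch, assuming the files sit at the default paths from the directory layout above; the internal schema of the JSON entries and the h5 datasets is not documented here, so it only inspects top-level structure.

```python
# Minimal sanity-check sketch. Paths follow the directory layout above;
# only top-level structure is inspected, since the internal schema of
# the files is not documented in this README.
import json
import h5py

# Dialogues: three JSON files downloaded from Baidu Netdisk.
with open("data/spot_diff_train.json") as f:
    dialogues = json.load(f)
print("train dialogues loaded:", len(dialogues))

# Image features: BUTD features stored in a single h5 file.
with h5py.File("data/img_feat_3ee94.h5", "r") as feats:
    feats.visit(print)  # print every group/dataset name in the file
```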
You need to modify `<work_dir>` and `<img_feat_file>` in the following scripts.
- `<work_dir>`: the project directory.
- `<img_feat_file>`: the h5 file that contains the image features, i.e., `data/img_feat_3ee94.h5`.
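For example, if the repository is cloned to `/home/user/Spot_Difference` (a hypothetical path), set `<work_dir>` to `/home/user/Spot_Difference` and `<img_feat_file>` to `data/img_feat_3ee94.h5`.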
- GPT- and LXMERT-based VQG models
  `sh scripts/train_<vqg_model_type>_vqg.sh`
  - `<vqg_model_type>`: gpt, lxrt
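For example, `sh scripts/train_gpt_vqg.sh` trains the GPT-2-based question generator, and `sh scripts/train_lxrt_vqg.sh` trains the LXMERT-based one.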
- BUTD- and LXMERT-based VQA models
  `sh scripts/train_<vqa_model_type>_vqa.sh`
  - `<vqa_model_type>`: butd, lxrt
- BERT-based Guesser
  `sh scripts/train_guesser.sh`
- Self-play
  `sh scripts/self_play_<vqg_model_type>_<vqa_model_type>.sh`
  - `<vqg_model_type>`: gpt, lxrt
  - `<vqa_model_type>`: butd, lxrt
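For example, `sh scripts/self_play_gpt_butd.sh` runs self-play with the GPT-2-based question generator and the BUTD-based VQA model.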