```
Spot_Difference
|-- bottom-up-attention-vqa
|-- checkpoints
|   |-- pretrained
|       |-- bert-base-uncased
|       |-- gpt2
|       |-- model_LXRT.pth
|       |-- ...
|-- data
|   |-- 0206
|   |-- spot_diff_train.json
|   |-- ...
|   |-- img_feat_3ee94.h5
|-- dataloader
|   |-- guesser_dataloader.py
|   |-- loader_utils.py
|   |-- qgen_dataloader.py
|-- lxmert
|   |-- ...
|-- model
|   |-- guesser.py
|   |-- qgen.py
|-- scripts
|-- stat_tools
|-- ...
```
Set up the environment by running `pip install -r requirements.txt`.
- Pre-trained models: put them in `checkpoints/pretrained`.
  - BERT
  - GPT-2
  - LXMERT: can be downloaded from https://github.com/airsplay/lxmert.
- SpotDiff dialogues: three JSON files, i.e., `spot_diff_train.json`, `spot_diff_val.json`, and `spot_diff_test.json`. You can download them from Baidu Netdisk.
- SpotDiff images
  - You can download the original images from my Baidu Netdisk.
  - Because the image collection is large, it is compressed into four files. Download all four files to your local device, then merge and decompress them.
  - Since the original image collection is very large, you can use just a subset of it.
- Image features: extracted with bottom-up top-down (BUTD) attention. The extracted features can be downloaded here; we extracted them by running the bottom-up-attention.pytorch code.
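After downloading, you can sanity-check the data files. Below is a minimal sketch, assuming the files sit at the default paths from the directory layout above; the internal schema of the JSON entries and the h5 datasets is not documented here, so it only inspects top-level structure.

```python
# Minimal sanity-check sketch. Paths follow the directory layout above;
# only top-level structure is inspected, since the internal schema of
# the files is not documented in this README.
import json
import h5py

# Dialogues: three JSON files downloaded from Baidu Netdisk.
with open("data/spot_diff_train.json") as f:
    dialogues = json.load(f)
print("train dialogues loaded:", len(dialogues))

# Image features: BUTD features stored in a single h5 file.
with h5py.File("data/img_feat_3ee94.h5", "r") as feats:
    feats.visit(print)  # print every group/dataset name in the file
```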
You need to modify `<work_dir>` and `<img_feat_file>` in the following scripts.
- `<work_dir>`: the project directory.
- `<img_feat_file>`: the h5 file that contains the image features, i.e., `data/img_feat_3ee94.h5`.
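For example, if the repository is cloned to `/home/user/Spot_Difference` (a hypothetical path), set `<work_dir>` to `/home/user/Spot_Difference` and `<img_feat_file>` to `data/img_feat_3ee94.h5`.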
- GPT- and LXMERT-based VQG models
  `sh scripts/train_<vqg_model_type>_vqg.sh`
  - `<vqg_model_type>`: gpt, lxrt
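For example, `sh scripts/train_gpt_vqg.sh` trains the GPT-2-based question generator, and `sh scripts/train_lxrt_vqg.sh` trains the LXMERT-based one.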
- BUTD- and LXMERT-based VQA models
  `sh scripts/train_<vqa_model_type>_vqa.sh`
  - `<vqa_model_type>`: butd, lxrt
- BERT-based Guesser
  `sh scripts/train_guesser.sh`
- Self-play
  `sh scripts/self_play_<vqg_model_type>_<vqa_model_type>.sh`
  - `<vqg_model_type>`: gpt, lxrt
  - `<vqa_model_type>`: butd, lxrt
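For example, `sh scripts/self_play_gpt_butd.sh` runs self-play with the GPT-2-based question generator and the BUTD-based VQA model.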