This repository contains the implementation of *Generalizing Visual Question Answering from Synthetic to Human-Written Questions via a Chain of QA with a Large Language Model*.
- Clone MDETR and follow its instructions to set up the environment.
- Copy the .ipynb files from the mdetr folder of this repo into the cloned mdetr folder.
- Download the MDETR checkpoint trained on CLEVR (clevr_checkpoint.pth) from here and place it in the mdetr folder. Information about the checkpoint is available here.
- Download the CLEVR-Human dataset (JSON files) and the corresponding images. Note that you need to configure the dataset paths in the code to match your setup.
- Clone OFA and follow its instructions to set up the environment.
- Fine-tune the OFA model on the MIMIC-DIFF-VQA dataset.
- Copy the .ipynb files from the ofa folder of this repo into the cloned OFA folder.
- Download the PLEURAL checkpoint trained on MIMIC-DIFF-VQA from here and place it in the mdetr folder.
- Download the VQA-RAD and SLAKE datasets. Note that you need to configure the dataset paths in the code to match your setup.
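Several of the steps above (copying the notebooks into the cloned mdetr and OFA folders, and pointing the code at your datasets and checkpoints) are easy to get wrong. The following is a minimal Python sketch of a setup check; it is not part of this repository. The folder layout and every entry in `DATA_PATHS` are hypothetical placeholders to edit for your machine, and the notebooks themselves may read their paths differently.

```python
from pathlib import Path
import shutil


def copy_notebooks(src_dir: Path, dst_dir: Path) -> list[Path]:
    """Copy every .ipynb file directly inside src_dir into dst_dir.

    dst_dir must already exist (e.g. the cloned mdetr or OFA folder);
    a missing destination raises instead of being created silently.
    """
    if not dst_dir.is_dir():
        raise FileNotFoundError(f"destination folder not found: {dst_dir}")
    copied = []
    for notebook in sorted(src_dir.glob("*.ipynb")):
        target = dst_dir / notebook.name
        shutil.copy2(notebook, target)  # copies contents and metadata
        copied.append(target)
    return copied


def missing_paths(paths: dict[str, Path]) -> list[str]:
    """Return the names of configured paths that do not exist on disk."""
    return [name for name, path in paths.items() if not path.exists()]


# Hypothetical locations (assumption): adjust to your own setup.
DATA_PATHS = {
    "clevr_human_json": Path("data/CLEVR-Humans"),
    "clevr_images": Path("data/CLEVR_v1.0/images"),
    "mdetr_checkpoint": Path("mdetr/clevr_checkpoint.pth"),
    "vqa_rad": Path("data/VQA-RAD"),
    "slake": Path("data/SLAKE"),
}

if __name__ == "__main__":
    # Only reports problems; it does not copy or create anything.
    for name in missing_paths(DATA_PATHS):
        print(f"missing {name}: {DATA_PATHS[name]}")
```

For example, `copy_notebooks(Path("ofa"), Path("../OFA"))` would copy this repo's OFA notebooks into a clone that sits next to it (again, a hypothetical layout). Raising on a missing destination rather than creating it helps catch a mistyped clone path early.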