This repository contains the code for our EACL 2023 Findings paper.
https://www.ukp.tu-darmstadt.de/
This repository contains experimental software and is published for the sole purpose of giving additional background details on the respective publication.
Don't hesitate to send us an e-mail or report an issue if you have further questions.
We include and adapt several frameworks here. The original code is from the following repositories:
- M3P [MIT License]
- UC2 [MIT License]
- MMT-Retrieval [UKP]
We used conda 4.10.3, Python 3.8.11, and CUDA 11.1 for our experiments.
pip install -r requirements.txt
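If you want to double-check your environment against the versions above, a quick sketch (assuming torch is installed via requirements.txt):

```python
import sys
import torch

# Sanity-check the environment against the versions we used.
print(sys.version)               # we used Python 3.8.11
print(torch.version.cuda)        # we used CUDA 11.1
print(torch.cuda.is_available())
```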
In our experiments, we found that M3P is very sensitive to random initialization and its results have much higher standard deviations. We suggest using UC2 directly.
Alternatively, take a look at the IGLUE benchmark, a comprehensive framework for evaluating multilingual multimodal problems (supported models include M3P and UC2; it also covers feature extraction, datasets including xGQA and more, as well as translate-test data).
You can download xGQA from here. The data should follow a structure like this:
/data/gqa/
├── features
├── butd_features
├── few_shot
│ ├── bn
│ ├── de
│ ├── en
│ ├── id
│ ├── ko
│ ├── pt
│ ├── ru
│ └── zh
├── train_balanced_questions.json
├── testdev_all_questions.json
├── testdev_balanced_questions_{lang}.json
├── train_all_questions
└── answer2label.pt
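To sanity-check the download, a minimal sketch along these lines can help. It assumes the standard GQA question format (a dict mapping question IDs to entries with "question", "imageId", "answer", etc.) and that answer2label.pt is a plain answer-to-index dict serialized with torch.save; adjust DATA_ROOT to your setup.

```python
import json
import torch

DATA_ROOT = "/data/gqa"  # adjust to your setup

# Standard GQA question files map question IDs to entries with
# "question", "imageId", "answer", and similar fields.
with open(f"{DATA_ROOT}/testdev_balanced_questions_de.json") as f:
    questions = json.load(f)
qid, entry = next(iter(questions.items()))
print(qid, entry["question"], entry.get("answer"))

# Assumed here: answer2label.pt is a dict from answer string to label index.
answer2label = torch.load(f"{DATA_ROOT}/answer2label.pt")
print(len(answer2label), "answer classes")
```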
See here for a guide on feature extraction.
UC2 uses bottom-up-attention features (different from M3P!); we used bottom-up attention with Faster R-CNN R101 (R101-k10-100) for feature extraction. Use this link to download them.
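The exact on-disk format of the extracted features depends on the extraction pipeline. As a rough sketch, assuming one .npz file per image with "features" and "boxes" arrays (the file name is hypothetical):

```python
import numpy as np

# Inspect extracted bottom-up-attention features for one image.
# Assumption: one .npz per image with "features" (num_boxes x 2048)
# and "boxes" (num_boxes x 4); adjust to your extraction output.
feats = np.load("/data/gqa/butd_features/2370799.npz")
print(feats["features"].shape)  # region features from Faster R-CNN R101
print(feats["boxes"].shape)     # bounding boxes for each region
```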
Example configs are in the configs folder.
In the config file:
- `prefix`: [True/False] Whether to prepend a question-type token.
- `freeze_train`: [True/False] Whether to freeze the text embeddings.
- `use_deep`: [True/False] Switches between a deeper classification head and a shallow head.
- `is_stage2`: [True/False] Whether the current training stage is stage 2 (self-bootstrapping in our paper). You can also specify this from the command line when running the experiments.
- `original_weights_path`: Points to the original pretrained UC2/M3P model weights. These will be used to replace the newly trained weights if you plan to do the 2-stage training from our paper (self-bootstrapping).
- `pretrained_model_path`: Points to the original pretrained UC2/M3P model weights.
After training a model, remove `pretrained_model_path` from the config file and use `model_path` instead to point to the right checkpoint (for compatibility with the 2-stage training).
All the paths for `data` and `model_config` in the config files require updates.
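For orientation, a hypothetical config fragment using the options above, written out as a Python dict; only the documented keys come from this README, and the file layout and all paths are illustrative placeholders:

```python
# Hypothetical config fragment; all paths are placeholders.
config = {
    "prefix": True,               # prepend a question-type token
    "freeze_train": False,        # do not freeze text embeddings
    "use_deep": True,             # deeper classification head
    "is_stage2": False,           # stage 1 training
    "original_weights_path": "/models/uc2/original_weights.pt",
    "pretrained_model_path": "/models/uc2/original_weights.pt",
    # After stage-1 training, drop pretrained_model_path and use
    # model_path to point at your stage-1 checkpoint instead.
    "data": "/data/gqa",
    "model_config": "/models/uc2/config.json",
}
```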
Training:
python run.py $config
Training Stage 2: [This is the self-bootstrapping part.] Remember to update the model path in the config. You can specify that the current run is stage-2 training either from the command line or in the config. This stage is not required if you do not want to do self-bootstrapping. To work around an implementation limitation of sentence-transformers, this stage copies models and deletes old checkpoints, so please make backups if you want to keep the models trained in stage 1.
python run.py $config --stage2
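Because this stage deletes old checkpoints, a small backup step before running it protects the stage-1 output; the paths below are placeholders for your actual checkpoint directory:

```python
import shutil

# Keep a copy of the stage-1 checkpoint directory before running stage 2,
# since stage 2 deletes old checkpoints. Paths are placeholders.
shutil.copytree("/models/xgqa_stage1", "/models/xgqa_stage1_backup")
```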
If you use this software, please cite the following paper:
@inproceedings{liu-etal-2023-delving,
title = "Delving Deeper into Cross-lingual Visual Question Answering",
author = "Chen Cecilia Liu and Jonas Pfeiffer and Anna Korhonen and Ivan Vuli\'{c} and Iryna Gurevych",
booktitle = "Findings of the Association for Computational Linguistics: EACL 2023",
month = may,
year = "2023",
address = "Dubrovnik, Croatia",
publisher = "Association for Computational Linguistics",
pages = "To appear",
}