Delving Deeper into Cross-lingual Visual Question Answering

This repository contains the code for our EACL 2023 Findings paper.

https://www.tu-darmstadt.de/

https://www.ukp.tu-darmstadt.de/

This repository contains experimental software and is published for the sole purpose of giving additional background details on the respective publication.

Don't hesitate to send us an e-mail or report an issue if you have further questions.

Implementations

We include and adapt several frameworks here. The original code comes from the following repositories:

  1. M3P [MIT License]
  2. UC2 [MIT License]
  3. MMT-Retrieval [UKP]

Installation

We used conda 4.10.3, Python 3.8.11, and CUDA 11.1 for our experiments.
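
For example, assuming conda is available, you can create a matching environment first (the environment name xlingvqa is just a placeholder):

conda create -n xlingvqa python=3.8.11
conda activate xlingvqa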

pip install -r requirements.txt

Running Experiments

In our experiments, we found that M3P is very sensitive to random initialization and its results have much higher standard deviations. We suggest using UC2 directly.

Alternatively, take a look at the IGLUE benchmark, a comprehensive framework for evaluating multilingual multimodal problems (it supports models such as M3P and UC2, provides feature extractions, and covers datasets including xGQA and more, as well as translate-test data).

Data

You can download xGQA from here. The data should follow a structure like this:

/data/gqa/
├── features
├── butd_features
├── few_shot
│   ├── bn
│   ├── de
│   ├── en
│   ├── id
│   ├── ko
│   ├── pt
│   ├── ru
│   └── zh
├── train_balanced_questions.json
├── testdev_all_questions.json
├── testdev_balanced_questions_{lang}.json
├── train_all_questions
└── answer2label.pt
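
As a quick sanity check after downloading, here is a small sketch that verifies the layout (the root path and file names are taken from the tree above; adjust them if your setup differs):

import os

root = "/data/gqa"
expected = [
    "features",
    "butd_features",
    "few_shot",
    "train_balanced_questions.json",
    "testdev_all_questions.json",
    "answer2label.pt",
]
for name in expected:
    path = os.path.join(root, name)
    print(f"{path}: {'found' if os.path.exists(path) else 'MISSING'}")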

Image Feature Extractions

M3P:

See here for a guide on feature extraction.

UC2:

UC2 uses bottom-up-attention features (different from M3P!). We used bottom-up attention for feature extraction with Faster R-CNN R101 (R101-k10-100). Use this link for the download.
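
As an illustration only, here is a sketch of inspecting one extracted feature file; it assumes per-image .npz files with features and boxes arrays, which is our assumption rather than part of this repository, so adapt the path and keys to the output of your extraction run:

import numpy as np

# Hypothetical path and keys; adapt to your feature-extraction output.
feat = np.load("/data/gqa/butd_features/example_image.npz")
print(feat["features"].shape)  # region features, e.g. (num_boxes, feature_dim)
print(feat["boxes"].shape)     # bounding boxes for the detected regions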

Configurations

Example configs are in the configs folder.

In the config file:

prefix: [True/False] Whether to prepend a question-type token.

freeze_train: [True/False] Whether to freeze the text embeddings.

use_deep: [True/False] Whether to use a deeper classification head or a shallow one.

is_stage2: [True/False] Whether the current training stage is stage 2 (self-bootstrapping in our paper). You can also specify this from the command line when running the experiments.

original_weights_path: Points to the pretrained original UC2/M3P model weights. This will be used to replace the newly trained weights if you plan to do the 2-stage training in our paper (self-bootstrapping).

pretrained_model_path: Points to the original pretrained UC2/M3P model weights. After training a model, you should remove pretrained_model_path from the config file and use model_path instead to point to the right checkpoint (to be compatible with 2-stage training).

All the data paths and the model_config path in the config files need to be updated.
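
For illustration, here is a hypothetical config sketch using the options above; it assumes a JSON-like format, all paths are placeholders, and it omits any keys not documented above, so please refer to the files in the configs folder for the real schema:

{
    "prefix": true,
    "freeze_train": false,
    "use_deep": true,
    "is_stage2": false,
    "pretrained_model_path": "/path/to/original_uc2_weights.pt",
    "original_weights_path": "/path/to/original_uc2_weights.pt",
    "model_config": "/path/to/uc2_model_config.json"
}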

Training

Training:

python run.py $config

Training Stage 2: [This is the self-bootstrapping part.] Please remember to update the model path in the config. You can specify whether the current run is stage-2 training either from the command line or in the config. This training stage is not required if you do not want to do self-bootstrapping. To work around an implementation limitation of sentence-transformers, this stage copies models and deletes old checkpoints, so please make backups if you want to keep the models trained in stage 1.
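
For example, one simple precaution before starting stage 2 is to copy the stage-1 output directory (the paths below are placeholders):

cp -r /path/to/stage1_output /path/to/stage1_output_backup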

python run.py $config --stage2

Citation

If you use this software, please cite the following paper:

@inproceedings{liu-etal-2023-delving,
    title = "Delving Deeper into Cross-lingual Visual Question Answering",
    author = "Chen Cecilia Liu and Jonas Pfeiffer and Anna Korhonen and Ivan Vuli\'{c} and Iryna Gurevych"
    booktitle = "Findings of the Association for Computational Linguistics: EACL 2023",
    month = may,
    year = "2023",
    address = "Dubrovnik, Croatia",
    publisher = "Association for Computational Linguistics",
    pages = "To appear",
}
