SwapMix

Implementation of the SwapMix approach for measuring visual bias in visual question answering (SwapMix: Diagnosing and Regularizing the Over-Reliance on Visual Context in Visual Question Answering, Gupta et al., CVPR 2022).


Introduction

We provide a new way to benchmark visual bias in VQA models by perturbing the visual context, i.e. the objects in the image that are irrelevant to the question.

The model looks at an image and a question. We then change the visual context (the objects irrelevant to the question) in the image, creating multiple copies of the image for each question. Ideally, the model's prediction should remain consistent across these context switches.
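
Conceptually, a single SwapMix perturbation can be sketched as follows. This is illustrative pseudocode only; the function and argument names here are our own and not the repository's actual API:

import random

def swapmix_copies(object_features, relevant_ids, swap_pool, n_copies=3):
    # object_features: dict of object id -> feature vector for one image
    # relevant_ids: ids of objects the question refers to (these stay fixed)
    # swap_pool: feature vectors of unrelated objects that can be swapped in
    copies = []
    for _ in range(n_copies):
        perturbed = dict(object_features)
        for obj_id in object_features:
            if obj_id not in relevant_ids:      # only irrelevant context is changed
                perturbed[obj_id] = random.choice(swap_pool)
        copies.append(perturbed)
    return copies

A model that truly ignores irrelevant context should return the same answer on every copy.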

This repository contains code for measuring bias using SwapMix and for training VQA models with SwapMix as data augmentation, as described in the paper. Specifically, we apply SwapMix to MCAN and LXMERT, and we use the GQA dataset for our analysis.

Implementation Details

The code is divided into MCAN and LXMERT folders. Inside each folder we provide implementations for:

  1. Measuring visual bias using SwapMix
  2. Finetuning models using SwapMix as a data augmentation technique
  3. Training models with perfect sight

Download Dataset

We slightly restructured the format of the question, answer, and scene graph files provided by GQA. You can download these files, along with the other files needed for the SwapMix implementation, from here and place them in the data/gqa folder.

We recommend using the object features provided by GQA. Download the features from GQA.

Download pretrained models

We provide (1) a finetuned model, (2) a model finetuned using SwapMix as data augmentation, (3) a model trained with perfect sight, and (4) a model trained with perfect sight using SwapMix as data augmentation. Please download the models from here: MCAN trained models, LXMERT trained models.

Evaluation

We measure the visual bias of a model for irrelevant object changes and attribute changes separately.

Before benchmarking visual bias for these models, we finetune them on the GQA train set for better performance. Models are evaluated on the GQA val set.

To measure visual bias for MCAN, download the dependencies and dataset from here, then run:

cd mcan
python3 run_files/run_evaluate.py --CKPT_PATH=<path to ckpt file>

To measure context reliance after calculating the object and attribute results, run:

cd scripts
python benchmark_frcnn.py --obj <SwapMix object json file> --attr <SwapMix attribute json file>
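
As a rough illustration of the kind of statistic this step produces (a sketch under our own assumptions, not the exact logic of benchmark_frcnn.py), context reliance can be summarized as the fraction of questions whose predicted answer flips under at least one SwapMix perturbation:

def context_reliance(results):
    # results: dict of question id -> (original answer,
    #          list of answers under SwapMix perturbations)
    flipped = sum(1 for orig, perturbed in results.values()
                  if any(ans != orig for ans in perturbed))
    return flipped / len(results)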

Evaluating new model for visual bias

SwapMix can be used to measure visual bias on any VQA model.

Changes are needed in the data loading and testing parts. The current code iterates over each question individually to get predictions for all SwapMix perturbations, as sketched below.
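
A minimal version of that loop might look like the following sketch, where model.predict and load_perturbations are placeholders for your model's interface and a loader for the SwapMix image copies, not functions from this repository:

def evaluate_visual_bias(model, dataset, load_perturbations):
    results = {}
    for q in dataset:
        original = model.predict(q["image_features"], q["question"])
        perturbed = [model.predict(feats, q["question"])
                     for feats in load_perturbations(q)]   # SwapMix copies
        results[q["qid"]] = (original, perturbed)
    return results

The resulting dictionary can then be fed into a statistic such as the context-reliance fraction shown above.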

Details for measuring visual bias on a new model can be found here.

Citation

If you find our work and this code useful, please consider citing:

@inproceedings{gupta2022swapmix,
    title={SwapMix: Diagnosing and Regularizing the Over-Reliance on Visual Context in Visual Question Answering},
    author={Gupta, Vipul and Li, Zhuowan and Kortylewski, Adam and Zhang, Chenyu and Li, Yingwei and Yuille, Alan},
    booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
    year={2022}
}
