
Mitigating Dataset Bias in Image Captioning through CLIP Confounder-free Captioning Network

This is the code implementation for the paper "Mitigating Dataset Bias in Image Captioning through CLIP Confounder-free Captioning Network," accepted to ICIP 2023.

Introduction

To address the dataset bias problem, we approach it from a causal inference perspective and design a causal graph. Based on this graph, we propose C2Cap, a CLIP confounder-free captioning network. The model uses a global visual confounder to control for confounding factors in the image and is trained to produce debiased captions.
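As a rough illustration of this idea (a minimal sketch, not the code in this repository; module names, shapes, and the fusion scheme are illustrative), a fixed dictionary of global confounder features can be attended over and fused back into the visual features:

import torch
import torch.nn as nn

class ConfounderFusion(nn.Module):
    """Illustrative sketch: attend from visual features over a fixed
    (pre-computed) confounder dictionary and fuse the result back in."""

    def __init__(self, feat_dim, dict_size):
        super().__init__()
        # frozen dictionary of global visual confounder features,
        # e.g. pre-computed CLIP features (shape: dict_size x feat_dim)
        self.register_buffer("confounders", torch.zeros(dict_size, feat_dim))
        self.query = nn.Linear(feat_dim, feat_dim)
        self.fuse = nn.Linear(2 * feat_dim, feat_dim)

    def forward(self, visual_feats):                      # (B, N, D) grid/region features
        q = self.query(visual_feats)                      # (B, N, D)
        attn = torch.softmax(q @ self.confounders.t() / q.size(-1) ** 0.5, dim=-1)  # (B, N, K)
        ctx = attn @ self.confounders                     # (B, N, D) confounder context
        return self.fuse(torch.cat([visual_feats, ctx], dim=-1))

In the actual pipeline, the dictionary entries would presumably come from the CLIP feature dictionary built in the "Constructing the confounder dictionary" step below.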

Installation

Requirements: hydra, spacy, and timm

  • Python >= 3.9, CUDA >= 11.3
  • PyTorch >= 1.12.0, torchvision >= 0.6.1
  • Other packages: pycocotools, tensorboard, tqdm, h5py, nltk, einops
  • First, clone the repository locally:
git clone https://github.com/yeonju7kim/C2Cap.git
cd C2Cap
  • Then, create an environment and install PyTorch and torchvision:
conda create -n grit python=3.9
conda activate grit
pip install torch torchvision --extra-index-url https://download.pytorch.org/whl/cu113
# ^ adjust the CUDA suffix (cu113) if it does not match your system; see pytorch.org for a compatible build.
  • Install other requirements:
pip install -r requirements.txt
python -m spacy download en
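
After these steps, a quick sanity check like the following (optional, not part of the repository) confirms that the key packages import and that PyTorch can see the GPU:

# optional environment check
import torch, torchvision, timm, hydra, spacy
print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("torchvision:", torchvision.__version__)
print("spacy:", spacy.__version__, "| timm:", timm.__version__, "| hydra:", hydra.__version__)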

Usage

Currently, the README and source code are in their initial version. A cleaned and more detailed version may be released soon.

Data preparation

Download and extract the COCO 2014 dataset for image captioning, including the train, val, and test images with annotations, from http://cocodataset.org. We expect the following directory structure:

path/to/coco_caption/
├── annotations/  # annotation json files and Karpathy split files
├── train2014/    # train images
├── val2014/      # val images
└── test2014/     # test images
  • Copy the files in data/ to the annotations folder above. These include vocab.json and files containing the Karpathy split ids.
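
To double-check that the layout matches, a short script along these lines can be used (illustrative only, not part of the repository; adjust the root path to your setup):

from pathlib import Path

root = Path("path/to/coco_caption")  # same directory later exported as DATA_ROOT
for name in ["annotations", "train2014", "val2014", "test2014"]:
    print(f"{name:12s}", "ok" if (root / name).is_dir() else "MISSING")
# vocab.json and the Karpathy id files copied from data/ should end up here:
print("vocab.json:", (root / "annotations" / "vocab.json").exists())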

Constructing the confounder dictionary

python create_clip_feature_dictionary.py --train_folder=path_to_coco_train_data --valid_folder=path_to_coco_valid_data
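
Conceptually, this step extracts CLIP image features over the COCO images and aggregates them into a fixed confounder dictionary. The sketch below illustrates the general idea only; it is not the repository's script, and the choice of CLIP backbone, the k-means aggregation (scikit-learn is not in the listed requirements), and the output file name are assumptions:

# Illustrative sketch only; assumes the openai `clip` package
# (pip install git+https://github.com/openai/CLIP.git) and scikit-learn.
import glob
import torch
import clip
from PIL import Image
from sklearn.cluster import KMeans

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

feats = []
with torch.no_grad():
    for path in glob.glob("path/to/coco_caption/train2014/*.jpg"):
        image = preprocess(Image.open(path).convert("RGB")).unsqueeze(0).to(device)
        feats.append(model.encode_image(image).float().cpu())
feats = torch.cat(feats, dim=0)              # (num_images, 512) for ViT-B/32

# Aggregate into a compact dictionary; 1000 entries is an assumption.
centers = KMeans(n_clusters=1000).fit(feats.numpy()).cluster_centers_
torch.save(torch.from_numpy(centers), "clip_confounder_dictionary.pt")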

Training

  • Training on Karpathy splits: first point DATA_ROOT at the COCO caption directory prepared above.
export DATA_ROOT=path/to/coco_caption

Evaluation

  • Evaluation on Karpathy splits:
export DATA_ROOT=path/to/coco_caption
# evaluate on the validation split
python eval_caption.py +split='valid' exp.checkpoint=path_to_caption_checkpoint

# evaluate on the test split
python eval_caption.py +split='test' exp.checkpoint=path_to_caption_checkpoint

Acknowledgement

This code is built on Ruotian Luo's image captioning implementation at https://github.com/ruotianluo/self-critical.pytorch.
