
TWA

From Token to Word: OCR Token Evolution via Contrastive Learning and Semantic Matching for Text-VQA

ACM International Conference on Multimedia (ACM MM), 2022

This repository is based on and inspired by @microsoft's work. We sincerely thank them for sharing their code.

Introduction

We propose a novel Text-VQA method with multi-modal OCR Token-Word Contrastive (TWC) learning. For more details, please refer to our paper.
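
For intuition, below is a minimal sketch of a token-word contrastive objective (an InfoNCE-style loss between OCR-token features and word features). The function name, tensor shapes, and temperature are illustrative assumptions only and do not reflect the actual TWC implementation in this repository; please refer to the paper and code for the real formulation.

    import torch
    import torch.nn.functional as F

    # Illustrative InfoNCE-style token-word contrastive loss (not the repository's actual TWC code).
    # token_feats: (N, D) OCR-token features; word_feats: (N, D) matched word features,
    # where row i of token_feats is the positive pair of row i of word_feats.
    def token_word_contrastive_loss(token_feats, word_feats, temperature=0.07):
        # Normalize so dot products become cosine similarities
        token_feats = F.normalize(token_feats, dim=-1)
        word_feats = F.normalize(word_feats, dim=-1)

        # Similarity matrix: entry (i, j) compares OCR token i with word j
        logits = token_feats @ word_feats.t() / temperature
        targets = torch.arange(token_feats.size(0), device=token_feats.device)

        # Symmetric cross-entropy: match tokens to words and words to tokens
        loss_t2w = F.cross_entropy(logits, targets)
        loss_w2t = F.cross_entropy(logits.t(), targets)
        return (loss_t2w + loss_w2t) / 2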

Citation

@inproceedings{jin2022token,
  title={From Token to Word: OCR Token Evolution via Contrastive Learning and Semantic Matching for Text-VQA},
  author={Jin, Zan-Xia and Shou, Mike Zheng and Zhou, Fang and Tsutsui, Satoshi and Qin, Jingyan and Yin, Xu-Cheng},
  booktitle={Proceedings of the 30th ACM International Conference on Multimedia},
  pages={4564--4572},
  year={2022}
}

Prerequisites

  • Python 3.8

  • PyTorch 1.7.0

  • For the full dependency list, please refer to requirements.txt, or install everything with

    python setup.py develop
    
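A quick, illustrative way to verify that the environment matches these versions (this snippet is not part of the repository):

    import sys
    import torch

    # Expect Python 3.8 and PyTorch 1.7.0, as listed above
    print("python:", sys.version.split()[0])
    print("torch :", torch.__version__)
    print("cuda  :", torch.cuda.is_available())  # GPUs are required for distributed training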

Installation

  1. Clone the repository

    git clone https://github.com/xiaojino/TWA.git
    cd TWA
    python setup.py develop --user
    
  2. Data

  • Please refer to the README in the data folder.

Training

  1. To train the model, run the code from the main folder. Use the --pretrain flag to enable pre-training mode; otherwise, the main QA losses are used to optimize the model. Example .yml files are in the configs folder, and the detailed configs are included with the released models.

    Pre-training:

    python -m torch.distributed.launch --nproc_per_node $num_gpu tools/run.py --pretrain --tasks vqa --datasets $dataset --model $model --seed $seed --config configs/vqa/$dataset/"$pretrain_yml".yml --save_dir save/$pretrain_savedir training_parameters.distributed True
    
    # for example
    python -m torch.distributed.launch --nproc_per_node 4 tools/run.py --pretrain --tasks vqa --datasets m4c_textvqa --model twa --seed 13 --config configs/vqa/m4c_textvqa/twa_pretrain.yml --save_dir save/twa_pretrain_test training_parameters.distributed True
    

    Fine-tuning:

    python -m torch.distributed.launch --nproc_per_node $num_gpu tools/run.py --tasks vqa --datasets $dataset --model $model --seed $seed --config configs/vqa/$dataset/"$refine_yml".yml --save_dir save/$refine_savedir --resume_file save/$pretrain_savedir/$savename/pretrain_best.ckpt training_parameters.distributed True
    
    # for example
    python -m torch.distributed.launch --nproc_per_node 4 tools/run.py --tasks vqa --datasets m4c_textvqa --model twa --seed 13 --config configs/vqa/m4c_textvqa/twa_refine.yml --save_dir save/twa_refine_test --resume_file save/pretrained/textvqa/twa_pretrain_best.ckpt training_parameters.distributed True
    
  2. To evaluate the model, run the code from the main folder. Select the val or test set with --run_type.

    python -m torch.distributed.launch --nproc_per_node $num_gpu tools/run.py --tasks vqa --datasets $dataset --model $model --config configs/vqa/$dataset/"$refine_yml".yml --save_dir save/$refine_savedir --run_type val --resume_file save/$refine_savedir/$savename/best.ckpt training_parameters.distributed True
    
    # for val evaluation
    python -m torch.distributed.launch --nproc_per_node 4 tools/run.py --tasks vqa --datasets m4c_textvqa --model twa --config configs/vqa/m4c_textvqa/twa_refine.yml --save_dir save/twa_refine_test --run_type val --resume_file save/finetuned/textvqa/twa_best.ckpt training_parameters.distributed True
    
    # for test inference 
    python tools/run.py --tasks vqa --datasets m4c_textvqa --model twa --config configs/vqa/m4c_textvqa/twa_refine.yml --save_dir save/twa_refine_test --run_type inference --evalai_inference 1 --resume_file save/finetuned/textvqa/twa_best.ckpt
    

Pre-trained Models

Please check the detailed experiment settings in our paper.

Model checkpoints.

Credits

The project is built on top of @microsoft's repository acknowledged above.
