
# On the General Value of Evidence, and Bilingual Scene-Text Visual Question Answering

Since the host server of the EST-VQA dataset is no longer available, we provide download links for the dataset in this repository.

We also release the test annotations here, so you no longer need to use EvalAI for evaluation.

## Download

- Google Drive: [Images Train] [Images Test] [Annotations Train] [Annotations Test]
- Baidu Netdisk: [Images] (code: dcmn) [Annotations] (code: e4qe)
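
If you prefer to script the download, a minimal sketch using the third-party `gdown` package is below. The file IDs are placeholders, not the real ones; copy the actual IDs from the Google Drive links above.

```python
# Sketch: fetch the dataset files from Google Drive with gdown (pip install gdown).
# The IDs below are placeholders -- replace each with the ID taken from the
# corresponding Google Drive link above.
import gdown

FILES = {
    "images_train.zip": "GDRIVE_ID_IMAGES_TRAIN",
    "images_test.zip": "GDRIVE_ID_IMAGES_TEST",
    "annotations_train.zip": "GDRIVE_ID_ANNOTATIONS_TRAIN",
    "annotations_test.zip": "GDRIVE_ID_ANNOTATIONS_TEST",
}

for output, file_id in FILES.items():
    gdown.download(id=file_id, output=output, quiet=False)
```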

## Evaluation

You can use eval.py to evaluate your model on the EST-VQA dataset. Simply convert your prediction file to the same format as pred_sample.json and run the following command:

```bash
python eval.py --pred_file PATH_TO_PRED --gt_file PATH_TO_GT
```
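
For reference, a hedged sketch of writing a prediction file is below. The field names here are an assumption, not the confirmed schema; copy the exact structure from pred_sample.json in this repository.

```python
# Sketch: dump model predictions to JSON for eval.py.
# Assumed schema -- a list of {"question_id": ..., "answer": ...} records;
# check pred_sample.json for the actual field names before using this.
import json

predictions = [
    {"question_id": "0001", "answer": "coffee shop"},  # hypothetical entries
    {"question_id": "0002", "answer": "星巴克"},
]

with open("pred.json", "w", encoding="utf-8") as f:
    # ensure_ascii=False keeps Chinese answers readable in the output file
    json.dump(predictions, f, ensure_ascii=False, indent=2)
```

Then pass the resulting file to the command above, e.g. `python eval.py --pred_file pred.json --gt_file PATH_TO_GT`.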

## Leaderboard

Some of the results are borrowed from this paper.

| Year | Venue   | Model          | LLM-based | EST-VQA (En) | EST-VQA (CN) | Overall |
|------|---------|----------------|-----------|--------------|--------------|---------|
| 2023 | ICML    | BLIP2-OPT-6.7B | Y         | 40.7         | 0            | –       |
| 2023 | NeurIPS | InstructBLIP   | Y         | 48.6         | 0.1          | –       |
| 2023 | arXiv   | mPLUG-Owl      | Y         | 52.7         | 0            | –       |
| 2023 | arXiv   | LLaVAR         | Y         | 58.2         | 0            | –       |
| 2023 | NeurIPS | LLaVA-1.5-7B   | Y         | 52.3         | 0            | –       |
| 2024 | AAAI    | BLIVA          | Y         | 51.2         | 0.2          | –       |
| 2024 | CVPR    | mPLUG-Owl2     | Y         | 68.6         | 4.9          | –       |
| 2024 | CVPR    | Monkey         | Y         | 71.0         | 42.6         | –       |

## Citation

If you find EST-VQA useful in your research, please cite it using the following BibTeX:

```bibtex
@inproceedings{wang2020general,
  title={On the general value of evidence, and bilingual scene-text visual question answering},
  author={Wang, Xinyu and Liu, Yuliang and Shen, Chunhua and Ng, Chun Chet and Luo, Canjie and Jin, Lianwen and Chan, Chee Seng and Hengel, Anton van den and Wang, Liangwei},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={10126--10135},
  year={2020}
}
```
