
COCO-MMR

The COCO Multi-Modal Reasoning dataset (COCO-MMR) is introduced in our paper "Enhancing Human-like Multi-Modal Reasoning: A New Challenging Dataset and Comprehensive Framework". It aims to facilitate cutting-edge research in multi-modal reasoning, with a particular focus on open-ended questions.


Figure 1: Overview of how rationales are derived from both the COCO caption dataset and the COCO VQA dataset, as elaborated in Fig. 2 of our paper. This procedure lays the foundation for COCO-MMR by extracting and processing rationales from diverse data subsets to support detailed multi-modal reasoning.


Figure 2: Building on the rationale derivation shown in Figure 1, this figure illustrates the refinement process used to further polish the rationales. Using the Qwen-VL model, the previously derived rationales were scrutinized and amended to remove inconsistencies. The refinement concluded with a thorough review by a ten-member team to ensure the rationales met high standards of applicability and accuracy.


Figure 3: This figure presents our proposed model architecture as described in Fig. 6 of our paper.

Abstract

Multimodal reasoning is a critical component in the pursuit of artificial intelligence systems that exhibit human-like intelligence, especially when tackling complex tasks. While the chain-of-thought (CoT) technique has gained considerable attention, the existing ScienceQA dataset lacks a comprehensive evaluation of diverse approaches. To address this gap, we present the COCO-MMR, a dataset encompassing a wide array of open-ended questions, rationales, and answers derived from the large object dataset COCO. Our dataset pioneers the use of open-ended questions in the context of multimodal CoT, providing a more challenging problem that effectively assesses the reasoning capabilities of CoT models. Through evaluations and analyses, we propose innovative techniques to enhance image and text encoders, offering novel perspectives for advancing multimodal reasoning.

Table of Contents

  1. Requirements
  2. Downloading the Dataset
  3. Preparing Data & Evaluation
  4. Huggingface
  5. Version History
  6. Upcoming
  7. Citing COCO-MMR
  8. License
  9. Acknowledgement

1. Requirements

Install all necessary Python dependencies using the following command:

pip install -r requirements.txt

2. Downloading the Dataset

To download the image data, visit the COCO dataset repository. The processed dataset, generated automatically and then validated by humans, is available under the /data directory with training, validation, and test splits.

Place the downloaded data in the /data/image directory.

/data
├── /image
└── ...
Split     train    valid    test
Amount    56115    3117     3119
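
As a quick sanity check after downloading, the split sizes above can be verified in a few lines of Python. This is a minimal sketch and assumes the processed splits are stored as JSON files named train.json, valid.json, and test.json under /data; check the repository for the actual file names.

import json
from pathlib import Path

DATA_DIR = Path("data")            # processed question/rationale/answer splits
IMAGE_DIR = DATA_DIR / "image"     # downloaded COCO images go here

# Hypothetical file names -- check the /data directory for the real ones.
for split in ("train", "valid", "test"):
    with open(DATA_DIR / f"{split}.json", encoding="utf-8") as f:
        examples = json.load(f)
    print(f"{split}: {len(examples)} examples")  # expected: 56115 / 3117 / 3119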

3. Preparing Data & Evaluation

Code Description

image_progress_coco.py: Preprocesses image features, either in advance or during training.

evaluations.py: Implements the evaluation metrics.

utils_data.py: Contains the COCODatasetImg dataset class (a rough sketch follows this list).

utils_evaluate.py: Integrates the evaluation metrics.

utils_prompt.py: Contains the prompts for the dataset.

dld.py: Downloads the required NLTK dependencies.
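
To illustrate how these pieces fit together, the sketch below outlines a PyTorch-style dataset in the spirit of COCODatasetImg. The field names (image_id, question, rationale, answer) and the assumption that image_progress_coco.py writes one .npy feature file per image are illustrative only; the real class in utils_data.py may differ.

import json
from pathlib import Path

import numpy as np
from torch.utils.data import Dataset


class SimpleCOCOMMRDataset(Dataset):
    """Illustrative stand-in for COCODatasetImg (field names are assumed)."""

    def __init__(self, split_file, feature_dir):
        with open(split_file, encoding="utf-8") as f:
            self.examples = json.load(f)       # list of question/rationale/answer records
        self.feature_dir = Path(feature_dir)   # features written by image_progress_coco.py

    def __len__(self):
        return len(self.examples)

    def __getitem__(self, idx):
        ex = self.examples[idx]
        # Load precomputed image features (assumed one .npy file per image id).
        feats = np.load(self.feature_dir / f"{ex['image_id']}.npy")
        # Each example pairs an open-ended question with a rationale and a free-form answer.
        return {
            "image_features": feats,
            "question": ex["question"],
            "rationale": ex["rationale"],
            "answer": ex["answer"],
        }

Such a dataset can be wrapped in a torch.utils.data.DataLoader for training, with utils_prompt.py supplying the text templates that turn each record into a model input.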

4. Huggingface

For the pre-trained models and the results on the ScienceQA dataset, visit our Huggingface repository: Enigma-COT on Huggingface. Results for COCO-MMR 1.1 will be updated shortly.
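
If the released checkpoints follow the standard Hugging Face format, they can be loaded with the transformers library. The snippet below is a minimal sketch with a placeholder repository id and assumes a sequence-to-sequence checkpoint; substitute the actual model id listed on the Enigma-COT page.

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Placeholder id -- replace with the actual checkpoint name from the Hub page.
model_id = "your-org/enigma-cot-checkpoint"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

inputs = tokenizer("Question: What is the person in the image doing?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))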

5. Version History

2023-07-10: COCO-MMR version 1.0 completed.

2023-07-24: Work published on arXiv.

2023-09-10: COCO-MMR version 1.1 created and validated by a team over two weeks.

2023-09-16: COCO-MMR version 1.1 officially released.

6. Upcoming

Stay tuned for a series of papers on this dataset.

7. Citing COCO-MMR

If our paper, code, or dataset inspires your work, please cite us using the following BibTeX:

@article{wei2023enhancing,
  title={Enhancing Human-like Multi-Modal Reasoning: A New Challenging Dataset and Comprehensive Framework},
  author={Wei, Jingxuan and Tan, Cheng and Gao, Zhangyang and Sun, Linzhuang and Li, Siyuan and Yu, Bihui and Guo, Ruifeng and Li, Stan Z},
  journal={arXiv preprint arXiv:2307.12626},
  year={2023}
}

8. License

This project is licensed under the Apache-2.0 License.

9. Acknowledgement

We would like to express our gratitude to a variety of individuals and teams that contributed to this project. Special thanks go to Deyao Zhu and the team behind MiniGPT-4, and Jinze Bai and the team behind Qwen-VL. We appreciate the hard work of the entire manual review team that worked tirelessly to review and validate the dataset.

Further, we acknowledge the initial contributions from other projects including ScienceQA, Transformers, pytorch-image-models, and mm-cot. A warm thank you to Pan Lu and his team for providing parameter sizes for the ScienceQA baseline, and to Zhuosheng Zhang and his team for sharing multimodal baselines for ScienceQA.
