[CVPR'24 Highlight] The official code and data for the paper "EgoThink: Evaluating First-Person Perspective Thinking Capability of Vision-Language Models"
[CVPR'25] 🌟🌟 EgoTextVQA: Towards Egocentric Scene-Text Aware Video Question Answering
[ACL 2025] FinMME: Benchmark Dataset for Financial Multi-Modal Reasoning Evaluation
Why We Feel: Breaking Boundaries in Emotional Reasoning with Multimodal Large Language Models
The code repository for "OmniEvalKit: A Modular, Lightweight Toolbox for Evaluating Large Language Model and its Omni-Extensions"
On Path to Multimodal Generalist: General-Level and General-Bench
The official code and data for the paper "VidEgoThink: Assessing Egocentric Video Understanding Capabilities for Embodied AI"
A Multimodal Benchmark for Evaluating Scientific Reasoning Capabilities of VLMs
YesBut Benchmark; project page for the paper "Cracking the Code of Juxtaposition: Can AI Models Understand the Humorous Contradictions", accepted at NeurIPS 2024 (Oral).
We introduce the YesBut-v2, a benchmark for assessing AI's ability to interpret juxtaposed comic panels with contradictory narratives. Unlike existing benchmarks, it emphasizes visual understanding, comparative reasoning, and social knowledge.
Official implementation of EgoPrivacy (ICML 2025)
Multimodal Multi-agent Organization and Benchmarking
Official codebase for the ICML 2025 paper "Core Knowledge Deficits in Multi-Modal Language Models"