[CVPR'24 Highlight] The official code and data for the paper "EgoThink: Evaluating First-Person Perspective Thinking Capability of Vision-Language Models"
[CVPR'25] 🌟🌟 EgoTextVQA: Towards Egocentric Scene-Text Aware Video Question Answering
[ACL 2025] FinMME: Benchmark Dataset for Financial Multi-Modal Reasoning Evaluation
Why We Feel: Breaking Boundaries in Emotional Reasoning with Multimodal Large Language Models
The code repository for "OmniEvalKit: A Modular, Lightweight Toolbox for Evaluating Large Language Model and its Omni-Extensions"
On Path to Multimodal Generalist: General-Level and General-Bench
The official code and data for the paper "VidEgoThink: Assessing Egocentric Video Understanding Capabilities for Embodied AI"
A Multimodal Benchmark for Evaluating Scientific Reasoning Capabilities of VLMs
YesBut Benchmark; project page for the paper "Cracking the Code of Juxtaposition: Can AI Models Understand the Humorous Contradictions", accepted at NeurIPS 2024 (Oral).
We introduce the YesBut-v2, a benchmark for assessing AI's ability to interpret juxtaposed comic panels with contradictory narratives. Unlike existing benchmarks, it emphasizes visual understanding, comparative reasoning, and social knowledge.
Official implementation of EgoPrivacy (ICML 2025)
Multimodal Multi-agent Organization and Benchmarking
Official codebase for the ICML 2025 paper "Core Knowledge Deficits in Multi-Modal Language Models"