📊 ChartAlignBench

📖 Paper | 📚 HuggingFace Dataset

This repo contains the official evaluation code and dataset for the paper "ChartAB: A Benchmark for Chart Grounding & Dense Alignment".

Highlights

  • 🔥 9,000+ instances for VLM evaluation on Dense Chart Grounding and Multi-Chart Alignment.
  • 🔥 Evaluation via a novel two-stage pipeline that decomposes each task into intermediate grounding followed by reasoning, yielding significant gains in accuracy.
  • 🔥 Evaluates both data and attribute understanding across diverse chart types and complexities.

Findings

  • 🔎 Performance degradation on complex charts: VLMs demonstrate strong data understanding on simple charts (e.g., bar, line, or numbered bar/line), but their performance drops substantially on complex types (e.g., 3D, box, radar, rose, or multi-axis charts) due to intricate layouts and component interactions.
  • 🔎 Weak attribute understanding: VLMs exhibit poor recognition of text styles (<20% accuracy for size/font), limited color perception (median RGB error >50), and strong spatial biases in legend positioning.
  • 🔎 Two-stage pipeline proves superior: The ground-then-reason approach consistently outperforms direct inference, reducing hallucinations through intermediate grounding steps (see the sketch after this list).
  • 🔎 Poor grounding/alignment degrade downstream QA: Precise data grounding and alignment correlate positively with downstream QA accuracy, establishing dense chart understanding as essential for reliable reasoning performance.
  • 🔎 Scaling law holds for most alignment tasks: Larger models consistently outperform smaller ones on all but text-style alignment, where the complexity of generating structured JSON leads to a high number of irregular failures.
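
A minimal sketch of the ground-then-reason idea, assuming a generic VLM chat interface. The `query_vlm` function, prompts, and JSON schema below are hypothetical illustrations, not the repository's actual API:

```python
import json

def query_vlm(images, prompt):
    """Hypothetical stand-in for any vision-language model call."""
    raise NotImplementedError

def ground_then_reason(chart_a, chart_b, question):
    # Stage 1 (grounding): extract a structured data table from each chart.
    table_a = query_vlm([chart_a], "Extract this chart's data as JSON.")
    table_b = query_vlm([chart_b], "Extract this chart's data as JSON.")
    # Stage 2 (reasoning): answer over the grounded tables instead of raw
    # pixels, which is what reduces hallucination versus direct inference.
    context = json.dumps({"chart_a": table_a, "chart_b": table_b})
    return query_vlm([], f"Given these extracted tables:\n{context}\n{question}")
```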

Dataset

ChartAB is the first benchmark designed to comprehensively evaluate the dense-level understanding of VLMs on charts, focusing on two core kinds of content: data (the underlying values visualized by the chart) and attributes (visual properties that shape chart design, such as color, legend position, and text style). The benchmark consists of 9,000+ instances spanning 9 diverse chart types (bar, numbered bar, line, numbered line, 3D bar, box, radar, rose, and multi-axes charts), organized into three evaluation subsets. The Data Grounding & Alignment subset contains chart pairs that differ in data. The Attribute Grounding & Alignment subset contains chart pairs that differ in attributes. The Robustness subset contains a collection of 5 chart pairs per instance, where every pair maintains the same data difference but varies one attribute value (color, legend, or text style) across the pairs.
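
The dataset is hosted on the HuggingFace Hub. As a minimal sketch of loading it (the dataset ID below is assumed from the repository name; check the dataset card for the exact ID, subsets, and splits):

```python
from datasets import load_dataset

# Assumed dataset ID and default config; consult the HuggingFace dataset
# card for the exact identifiers of the three evaluation subsets.
ds = load_dataset("tianyi-lab/ChartAlignBench")
print(ds)  # inspect available splits and per-instance fields
```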


Figure: Examples of chart pairs in ChartAlignBench.

Evaluation Demo

1. Environment

```bash
conda create -n chart_ab python=3.10
conda activate chart_ab
pip install -r requirements.txt
```

2. Running the Notebooks

The provided Jupyter notebooks for the corresponding tasks can be run directly by executing the cells in order.

| Task Suite | Demo Notebook Link |
| --- | --- |
| Data Grounding & Alignment | demo_notebooks/data_grounding_alignment |
| Attribute Grounding & Alignment — Color | demo_notebooks/color_grounding_alignment |
| Attribute Grounding & Alignment — Legend | demo_notebooks/legend_grounding_alignment |
| Attribute Grounding & Alignment — Text Style | demo_notebooks/text_style_grounding_alignment |
| Robustness (of Data Alignment to Attribute Variations) | demo_notebooks/robustness |
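
As context for the color task above, the findings report a median RGB error; one plausible way to score a predicted color against the ground truth is the Euclidean distance in 8-bit RGB space. This is an illustrative sketch, not necessarily the benchmark's exact metric:

```python
import math

def rgb_error(pred, gold):
    """Euclidean distance between two (r, g, b) tuples with values in 0..255.
    Hypothetical scoring function, not necessarily the benchmark's metric."""
    return math.sqrt(sum((p - g) ** 2 for p, g in zip(pred, gold)))

print(rgb_error((200, 30, 40), (180, 60, 20)))  # ~41.2
```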
