washuvis/VisLit-VLM-Eval

VLM Visualization Literacy Assessment

This repository contains the implementation and evaluation framework for assessing the visualization literacy of Visual Language Models (VLMs) using two standardized tests, VLAT and CALVI. The study compares how well four state-of-the-art VLMs interpret, reason about, and critically analyze data visualizations.

🎯 Project Overview

The project evaluates VLMs through:

  • Visualization Literacy Assessment Test (VLAT) - 53 multiple-choice items across 12 visualization types
  • Critical Thinking Assessment for Literacy in Visualizations (CALVI) - 45 items focused on misleading visualization elements
  • 10 randomized evaluation runs per model to ensure robust results
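The README does not say what is randomized in each run. As a minimal sketch, assuming that "randomized" means shuffling the item order per run, the 10 runs and a mean accuracy across them could look like the following. The names `make_runs`, `mean_accuracy`, and `base_seed` are hypothetical and do not come from the repository.

```python
import random


def make_runs(item_ids, n_runs=10, base_seed=0):
    """Return n_runs independently shuffled copies of item_ids.

    Assumption: randomization means shuffling item order per run.
    Each run gets its own seeded generator so the orders are reproducible.
    """
    runs = []
    for run_index in range(n_runs):
        rng = random.Random(base_seed + run_index)
        order = list(item_ids)  # copy so the input list is not mutated
        rng.shuffle(order)
        runs.append(order)
    return runs


def mean_accuracy(per_run_correct, n_items):
    """Average the per-run accuracy (correct / n_items) over all runs."""
    return sum(c / n_items for c in per_run_correct) / len(per_run_correct)


vlat_items = list(range(1, 54))  # 53 VLAT items, numbered 1..53
vlat_runs = make_runs(vlat_items)  # 10 shuffled orderings of the same items
```

Seeding each run separately keeps every ordering reproducible without making the runs identical to one another.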

🤖 Models Evaluated

| Model | Version | Provider |
|---|---|---|
| GPT-4 Vision | GPT-4o | OpenAI |
| Claude | 3.5 Sonnet | Anthropic |
| Gemini | 1.5 Pro | Google |
| Llama | 3.2-vision | Meta |

All models are configured with:

  • Temperature: 0
  • Max tokens: 300
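One way to keep these two settings identical across all four notebooks is to define them once and reuse them. The sketch below is illustrative only: `GenerationConfig`, `SHARED_CONFIG`, and `request_kwargs` are hypothetical names, and each provider's client may name these parameters differently, so the returned dictionary would need to be adapted per provider.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class GenerationConfig:
    """Shared decoding settings stated in this README."""
    temperature: float = 0.0  # deterministic-leaning decoding
    max_tokens: int = 300     # cap on response length


SHARED_CONFIG = GenerationConfig()


def request_kwargs(model_name, config=SHARED_CONFIG):
    """Build a generic keyword dictionary for one model request.

    Assumption: key names here are generic placeholders, not any
    specific provider's parameter names.
    """
    return {"model": model_name, **asdict(config)}


print(request_kwargs("gpt-4o"))
```

Freezing the dataclass guards against a notebook accidentally changing the settings for one model only.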

📁 Repository Structure

├── README.md
├── data/
│   ├── VLAT/                 # VLAT test images and questions
│   └── CALVI/                # CALVI test images and questions
├── scripts/
│   ├── gpt4_evaluation.ipynb        # GPT-4 Vision evaluation notebook
│   ├── claude_evaluation.ipynb      # Claude evaluation notebook
│   ├── gemini_evaluation.ipynb      # Gemini evaluation notebook
│   └── llama_evaluation.ipynb       # Llama evaluation notebook
├── prompts/
│   ├── VLAT_prompt.txt      # Standardized VLAT assessment prompt
│   └── CALVI_prompt.txt     # Standardized CALVI assessment prompt
└── Output/
    ├── CALVI/               # Model responses to CALVI questions
    └── VLAT/                # Model responses to VLAT questions

🚀 Getting Started

  1. Clone the repository:
git clone https://github.com/washuvis/VisLit-VLM-Eval.git
  2. Install required dependencies:
pip install -r requirements.txt
  3. Configure API keys:

    • Add your API keys for each VLM provider
  4. Run evaluations:

    • Navigate to the scripts directory
    • Execute evaluation notebooks for each model
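The README does not specify how the notebooks expect API keys to be supplied. A minimal sketch, assuming the keys are read from environment variables, is a pre-flight check that lists which providers still lack a key. The variable names below are assumptions based on common conventions and may differ from what the notebooks use. Llama is omitted because the README does not say how that model is served.

```python
import os

# Assumed environment variable names; not confirmed by the repository.
ASSUMED_KEY_VARS = {
    "OpenAI": "OPENAI_API_KEY",
    "Anthropic": "ANTHROPIC_API_KEY",
    "Google": "GOOGLE_API_KEY",
}


def missing_keys(env=os.environ):
    """Return the providers whose assumed key variable is unset or empty."""
    return [
        provider
        for provider, var_name in ASSUMED_KEY_VARS.items()
        if not env.get(var_name)
    ]


if __name__ == "__main__":
    missing = missing_keys()
    if missing:
        print("Missing API keys for:", ", ".join(missing))
    else:
        print("All assumed API keys are set.")
```

Running a check like this before opening the notebooks surfaces a missing key early rather than partway through an evaluation run.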

About

This repository contains Jupyter Notebooks, prompts, and evaluation setups for assessing visualization literacy in Visual Language Models (VLMs). Benchmarks include VLAT and CALVI, comparing GPT-4o, Claude 3.5 Sonnet, Gemini 1.5 Pro, and Llama 3.2-vision at temperature 0 and max_tokens of 300.
