Llama-3.2-90B-Vision-Instruct

In this section, we report results for the Llama 3.2-Vision models on standard automatic benchmarks. We ran all of these evaluations with our internal evaluation library.

Base Pretrained Models

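Category | Benchmark | # Shots | Metric | Llama 3.2 11B | Llama 3.2 90B
--- | --- | --- | --- | --- | ---
Image Understanding | VQAv2 (val) | 0 | Accuracy | 66.8 | 73.6
Image Understanding | Text VQA (val) | 0 | Relaxed accuracy | 73.1 | 73.5
Image Understanding | DocVQA (val, unseen) | 0 | ANLS | 62.3 | 70.7
Visual Reasoning | MMMU (val, 0-shot) | 0 | Micro average accuracy | 41.7 | 49.3
Visual Reasoning | ChartQA (test) | 0 | Accuracy | 39.4 | 54.2
Visual Reasoning | InfographicsQA (val, unseen) | 0 | ANLS | 43.2 | 56.8
Visual Reasoning | AI2 Diagram (test) | 0 | Accuracy | 62.4 | 75.3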
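DocVQA and InfographicsQA are scored with ANLS (Average Normalized Levenshtein Similarity). The Python sketch below follows the definition commonly used for those benchmarks. For each question it takes the best score over all accepted answers. A single answer scores 1 - NL when the normalized Levenshtein distance NL (edit distance divided by the longer string's length) is below a threshold tau, conventionally 0.5, and 0 otherwise. The final metric is the mean over questions. The lowercase-and-collapse-whitespace normalization is an assumption, and Meta's internal evaluation library may handle such details differently.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertions, deletions, and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the DP row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def normalize(text: str) -> str:
    # Assumption: case-insensitive comparison with whitespace collapsed.
    return " ".join(text.lower().split())


def anls(predictions, ground_truths, tau=0.5):
    """Mean over questions of the best thresholded similarity.

    ground_truths[i] is the list of accepted answers for predictions[i].
    """
    if len(predictions) != len(ground_truths):
        raise ValueError("need one list of accepted answers per prediction")
    if not predictions:
        return 0.0
    total = 0.0
    for pred, answers in zip(predictions, ground_truths):
        p = normalize(pred)
        best = 0.0
        for gt in answers:
            g = normalize(gt)
            denom = max(len(p), len(g))
            nl = levenshtein(p, g) / denom if denom else 0.0
            score = 1.0 - nl if nl < tau else 0.0  # distant answers score 0
            best = max(best, score)
        total += best
    return total / len(predictions)


if __name__ == "__main__":
    # One exact match (after case folding) and one unrelated answer.
    print(anls(["Total: 42", "foo"], [["total: 42"], ["bar"]]))  # 0.5
```

Because of the threshold, a near-miss such as an OCR slip still earns partial credit, while an answer that shares little with the reference scores zero instead of a small positive value.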

Instruction Tuned Models

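Modality | Capability | Benchmark | # Shots | Metric | Llama 3.2 11B | Llama 3.2 90B
--- | --- | --- | --- | --- | --- | ---
Image | College-level Problems and Mathematical Reasoning | MMMU (val, CoT) | 0 | Micro average accuracy | 50.7 | 60.3
Image | College-level Problems and Mathematical Reasoning | MMMU-Pro, Standard (10 options, test) | 0 | Accuracy | 33.0 | 45.2
Image | College-level Problems and Mathematical Reasoning | MMMU-Pro, Vision (test) | 0 | Accuracy | 23.7 | 33.8
Image | College-level Problems and Mathematical Reasoning | MathVista (testmini) | 0 | Accuracy | 51.5 | 57.3
Image | Charts and Diagram Understanding | ChartQA (test, CoT) | 0 | Relaxed accuracy | 83.4 | 85.5
Image | Charts and Diagram Understanding | AI2 Diagram (test) | 0 | Accuracy | 91.1 | 92.3
Image | Charts and Diagram Understanding | DocVQA (test) | 0 | ANLS | 88.4 | 90.1
Image | General Visual Question Answering | VQAv2 (test) | 0 | Accuracy | 75.2 | 78.1
Text | General | MMLU (CoT) | 0 | Macro average accuracy | 73.0 | 86.0
Text | Math | MATH (CoT) | 0 | Final exact match | 51.9 | 68.0
Text | Reasoning | GPQA | 0 | Accuracy | 32.8 | 46.7
Text | Multilingual | MGSM (CoT) | 0 | Exact match | 68.9 | 86.9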
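Two of the metrics in these tables are easy to misread. Relaxed accuracy, as defined for ChartQA, accepts a numeric answer that lies within 5% relative error of the reference and requires an exact match otherwise. Micro averaging (reported for MMMU) pools every question before computing accuracy, while macro averaging (reported for MMLU) averages per-subject accuracies, so each subject counts equally regardless of its size. The Python sketch below illustrates both under stated assumptions. The number parsing, the case folding, and the toy subject data are hypothetical choices made for illustration and are not taken from Meta's internal evaluation library.

```python
def _to_float(text):
    # Assumption: drop a trailing "%" and thousands separators before parsing.
    cleaned = text.strip().rstrip("%").replace(",", "")
    try:
        return float(cleaned)
    except ValueError:
        return None


def relaxed_match(prediction, target, tolerance=0.05):
    """Numbers may deviate by up to `tolerance` relative error;
    anything else must match exactly after case folding."""
    pred_num, target_num = _to_float(prediction), _to_float(target)
    if pred_num is not None and target_num is not None:
        if target_num == 0:
            return pred_num == 0
        return abs(pred_num - target_num) / abs(target_num) <= tolerance
    return prediction.strip().lower() == target.strip().lower()


def relaxed_accuracy(predictions, targets, tolerance=0.05):
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if not targets:
        return 0.0
    hits = sum(relaxed_match(p, t, tolerance) for p, t in zip(predictions, targets))
    return hits / len(targets)


def micro_average(results):
    """Pool every question across subjects, so large subjects weigh more."""
    flags = [flag for per_subject in results.values() for flag in per_subject]
    return sum(flags) / len(flags) if flags else 0.0


def macro_average(results):
    """Average per-subject accuracies, so every subject weighs the same."""
    per_subject = [sum(f) / len(f) for f in results.values() if f]
    return sum(per_subject) / len(per_subject) if per_subject else 0.0


if __name__ == "__main__":
    print(relaxed_accuracy(["10.4", "11", "Yes"], ["10", "10", "yes"]))  # 2 of 3 correct
    toy = {"art": [1, 1, 1, 1], "physics": [0, 1]}  # hypothetical correctness flags
    print(micro_average(toy), macro_average(toy))  # 5/6 versus 0.75
```

In the toy data, the small "physics" subject pulls the macro average down more than the micro average, which is why the two kinds of score are not directly comparable across benchmarks.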

About

Advanced image reasoning capabilities for agentic applications that rely on visual understanding.
Context: 128k input tokens · 4k output tokens
Training date: Undisclosed

Languages

English