Llama-3.2-90B-Vision-Instruct

In this section, we report the results for Llama 3.2-Vision models on standard automatic benchmarks. For all these evaluations, we used our internal evaluations library.

Base Pretrained Models

Category	Benchmark	Metric	Llama 3.2 11B	Llama 3.2 90B
Image Understanding	VQAv2 (val)	Accuracy	66.8	73.6
	Text VQA (val)	Relaxed accuracy	73.1	73.5
	DocVQA (val, unseen)	ANLS	62.3	70.7
Visual Reasoning	MMMU (val, 0-shot)	Micro average accuracy	41.7	49.3
	ChartQA (test)	Accuracy	39.4	54.2
	InfographicsQA (val, unseen)	ANLS	43.2	56.8
	AI2 Diagram (test)	Accuracy	62.4	75.3
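
Several rows above report ANLS (Average Normalized Levenshtein Similarity). The internal evaluations library is not described here, so the following is only a minimal sketch of the metric as it is commonly defined for DocVQA-style tasks. The 0.5 threshold, the lowercasing and whitespace stripping, and the function names `levenshtein` and `anls` are assumptions, not details taken from this model card. The sketch returns a value in [0, 1], whereas the table appears to report scores on a 0 to 100 scale.

```python
from typing import List, Sequence


def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def anls(predictions: Sequence[str],
         references: Sequence[List[str]],
         threshold: float = 0.5) -> float:
    """Mean over questions of the best similarity against any reference answer.

    A similarity is 1 - normalized edit distance, and is set to 0 when the
    normalized distance reaches the threshold (assumed to be 0.5 here).
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have equal length")
    if not predictions:
        return 0.0
    total = 0.0
    for pred, refs in zip(predictions, references):
        p = pred.strip().lower()  # assumed normalization
        best = 0.0
        for ref in refs:
            r = ref.strip().lower()
            longest = max(len(p), len(r))
            distance = levenshtein(p, r) / longest if longest else 0.0
            score = 1.0 - distance if distance < threshold else 0.0
            best = max(best, score)
        total += best
    return total / len(predictions)
```

Under these assumptions, a near miss such as a single-character OCR error still earns partial credit, while an answer that differs in half or more of its characters scores zero.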

Instruction Tuned Models

Modality	Capability	Benchmark	# Shots	Metric	Llama 3.2 11B	Llama 3.2 90B
Image	College-level Problems and Mathematical Reasoning	MMMU (val, CoT)	0	Micro average accuracy	50.7	60.3
		MMMU-Pro, Standard (10 opts, test)	0	Accuracy	33.0	45.2
		MMMU-Pro, Vision (test)	0	Accuracy	23.7	33.8
		MathVista (testmini)	0	Accuracy	51.5	57.3
	Charts and Diagram Understanding	ChartQA (test, CoT)	0	Relaxed accuracy	83.4	85.5
		AI2 Diagram (test)	0	Accuracy	91.1	92.3
		DocVQA (test)	0	ANLS	88.4	90.1
	General Visual Question Answering	VQAv2 (test)	0	Accuracy	75.2	78.1

Text	General	MMLU (CoT)	0	Macro average accuracy	73.0	86.0
	Math	MATH (CoT)	0	Exact match (final answer)	51.9	68.0
	Reasoning	GPQA	0	Accuracy	32.8	46.7
	Multilingual	MGSM (CoT)	0	Exact match	68.9	86.9
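
The tables use both micro average accuracy (MMMU) and macro average accuracy (MMLU). The difference can be illustrated with a small sketch. It assumes each evaluated question carries a group label such as a subject name; how the internal evaluations library actually groups questions is not stated here, and the function names and the `(group, is_correct)` record shape are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, Tuple

Record = Tuple[str, bool]  # (group label, whether the answer was correct)


def micro_accuracy(records: Iterable[Record]) -> float:
    """Pool every question together, so larger groups weigh more."""
    records = list(records)
    if not records:
        raise ValueError("no records")
    return sum(1 for _, ok in records if ok) / len(records)


def macro_accuracy(records: Iterable[Record]) -> float:
    """Average the per-group accuracies, so every group weighs the same."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for group, ok in records:
        totals[group] += 1
        hits[group] += int(ok)
    if not totals:
        raise ValueError("no records")
    return sum(hits[g] / totals[g] for g in totals) / len(totals)


# Hypothetical data: group "a" has 8 of 10 correct, group "b" has 1 of 2.
example = ([("a", True)] * 8 + [("a", False)] * 2
           + [("b", True)] + [("b", False)])
micro = micro_accuracy(example)  # 9 correct out of 12 questions
macro = macro_accuracy(example)  # mean of 0.8 and 0.5
```

With unevenly sized groups the two numbers diverge, which is why the metric column names the averaging scheme for each benchmark.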
