
Llama-3.3-70B-Instruct


In this section, we report the results for Llama 3.3 relative to our previous models.

Instruction-tuned models

Category      Benchmark             Shots  Metric                               Llama 3.1 8B   Llama 3.1 70B  Llama 3.3 70B  Llama 3.1 405B
              MMLU (CoT)            0      macro_avg/acc                        73.0           86.0           86.0           88.6
              MMLU Pro (CoT)        5      macro_avg/acc                        48.3           66.4           68.9           73.3
Steerability  IFEval                                                            80.4           87.5           92.1           88.6
Reasoning     GPQA Diamond (CoT)    0      acc                                  31.8           48.0           50.5           49.0
Code          HumanEval             0      pass@1                               72.6           80.5           88.4           89.0
              MBPP EvalPlus (base)  0      pass@1                               72.8           86.0           87.6           88.6
Math          MATH (CoT)            0      sympy_intersection_score             51.9           68.0           77.0           73.8
Tool Use      BFCL v2               0      overall_ast_summary/macro_avg/valid  65.4           77.5           77.3           81.1
Multilingual  MGSM                  0      em                                   68.9           86.9           91.1           91.6
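To compare Llama 3.3 70B with the other models benchmark by benchmark, the short Python sketch below copies the scores from the table into a dictionary and prints how far Llama 3.3 70B sits above or below Llama 3.1 70B and Llama 3.1 405B on each row. The names `SCORES`, `deltas`, and the index constants are illustrative only. Because each row uses its own metric, the differences are best read per benchmark (and by sign), not summed or averaged across rows.

```python
# Benchmark scores copied from the instruction-tuned table.
# Tuple order: (Llama 3.1 8B, Llama 3.1 70B, Llama 3.3 70B, Llama 3.1 405B)
SCORES: dict[str, tuple[float, float, float, float]] = {
    "MMLU (CoT)": (73.0, 86.0, 86.0, 88.6),
    "MMLU Pro (CoT)": (48.3, 66.4, 68.9, 73.3),
    "IFEval": (80.4, 87.5, 92.1, 88.6),
    "GPQA Diamond (CoT)": (31.8, 48.0, 50.5, 49.0),
    "HumanEval": (72.6, 80.5, 88.4, 89.0),
    "MBPP EvalPlus (base)": (72.8, 86.0, 87.6, 88.6),
    "MATH (CoT)": (51.9, 68.0, 77.0, 73.8),
    "BFCL v2": (65.4, 77.5, 77.3, 81.1),
    "MGSM": (68.9, 86.9, 91.1, 91.6),
}

# Hypothetical column indices into each score tuple
LLAMA_31_70B, LLAMA_33_70B, LLAMA_31_405B = 1, 2, 3


def deltas(baseline: int, target: int = LLAMA_33_70B) -> dict[str, float]:
    """Per-benchmark score difference (target minus baseline), rounded to 0.1."""
    return {name: round(row[target] - row[baseline], 1) for name, row in SCORES.items()}


if __name__ == "__main__":
    vs_70b = deltas(LLAMA_31_70B)
    vs_405b = deltas(LLAMA_31_405B)
    for name in SCORES:
        print(f"{name:22} vs 3.1 70B: {vs_70b[name]:+5.1f}   vs 3.1 405B: {vs_405b[name]:+5.1f}")
    # Count benchmarks where Llama 3.3 70B is strictly ahead of Llama 3.1 405B
    wins = sum(d > 0 for d in vs_405b.values())
    print(f"Llama 3.3 70B scores higher than Llama 3.1 405B on {wins} of {len(SCORES)} benchmarks")
```

On these numbers, Llama 3.3 70B is ahead of Llama 3.1 405B on IFEval, GPQA Diamond, and MATH and behind on the other six benchmarks, with the largest gap on MMLU Pro (4.4 points).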

About

Llama 3.3 70B Instruct offers enhanced reasoning, math, and instruction following with performance comparable to Llama 3.1 405B.
Context: 128k input tokens, 4k output tokens
Training data cutoff: December 2023

Languages (8): English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai