Llama-3.3-70B-Instruct
In this section, we report the results for Llama 3.3 relative to our previous models.
| Category | Benchmark | # Shots | Metric | Llama 3.1 8B Instruct | Llama 3.1 70B Instruct | Llama 3.3 70B Instruct | Llama 3.1 405B Instruct |
|---|---|---|---|---|---|---|---|
| General | MMLU (CoT) | 0 | macro_avg/acc | 73.0 | 86.0 | 86.0 | 88.6 |
| General | MMLU Pro (CoT) | 5 | macro_avg/acc | 48.3 | 66.4 | 68.9 | 73.3 |
| Steerability | IFEval | | | 80.4 | 87.5 | 92.1 | 88.6 |
| Reasoning | GPQA Diamond (CoT) | 0 | acc | 31.8 | 48.0 | 50.5 | 49.0 |
| Code | HumanEval | 0 | pass@1 | 72.6 | 80.5 | 88.4 | 89.0 |
| Code | MBPP EvalPlus (base) | 0 | pass@1 | 72.8 | 86.0 | 87.6 | 88.6 |
| Math | MATH (CoT) | 0 | sympy_intersection_score | 51.9 | 68.0 | 77.0 | 73.8 |
| Tool Use | BFCL v2 | 0 | overall_ast_summary/macro_avg/valid | 65.4 | 77.5 | 77.3 | 81.1 |
| Multilingual | MGSM | 0 | em | 68.9 | 86.9 | 91.1 | 91.6 |
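Most metrics in the table are plain accuracies, but the code benchmarks (HumanEval and MBPP EvalPlus) report pass@1: the fraction of problems solved by a generated sample. As a reference for reading those numbers, here is a minimal sketch of the standard unbiased pass@k estimator; the `pass_at_k` helper name is ours, not part of this model card.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate given n samples per problem, c of which pass.

    Computes 1 - C(n - c, k) / C(n, k): the probability that at least one
    of k samples drawn without replacement from the n generated samples
    is correct.
    """
    if n - c < k:
        # Fewer than k failing samples exist, so any draw of k must
        # include a correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# With one sample per problem (n=1, k=1), pass@1 reduces to the plain
# fraction of problems solved on the first try.
print(pass_at_k(1, 1, 1))  # 1.0
```

In practice the estimator is averaged over all benchmark problems; with a single greedy sample per problem, the reported pass@1 is simply the solve rate.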
About
Llama 3.3 70B Instruct offers enhanced reasoning, math, and instruction following with performance comparable to Llama 3.1 405B.
Context
128k input · 4k output
Training data cutoff
Dec 2023
Languages
(8) English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai