A few days ago, I created a request for a Sudoku image assessment.
The evaluation is here:
https://visioncheckup.com/assessments/sudoku-puzzle-extraction/
In addition to the image not being displayed on the website, I think the evaluation is incorrect. Here is my evaluation:
Model | Exact board? | Correct cells | Accuracy | Extracted grid (rows separated by spaces) |
---|---|---|---|---|
ChatGPT-4o | ❌ | 46 / 81 | 56.8 % | ...4.8... .6.7..1.. 72.9.5.4. .97.4..3. ..7.5.8.. .896.5... 941.7.8.. 7..6..4.. ..2.7.... |
Claude 3.7 Sonnet | ❌ | 63 / 81 | 77.8 % | ....48... .6...7..1 7.2.9.5.4 .9.74.3.. ..7.5.8.. .8.96.5.. 9.4.1.7.8 .7...6.4. ....27... |
Claude 4 Opus | ✅ | 81 / 81 | 100 % | ...4.8... .6..7..1. 7.2.9.5.4 .9.7.4.3. ..7.5.8.. .8.9.6.5. 9.4.1.7.8 .7..6..4. ...2.7... |
Claude 4 Sonnet | ❌ | 58 / 81 | 71.6 % | ...4.8... .6..7...1 7.2.9.5.4 .9.7.4..3 ..7.5.8.. ..8.9.6.5 9..4.1.7.8 .7...6..4 ...2.7... |
GPT-4.1 | ❌ | 38 / 81 | 46.9 % | ..4.8.... .6.7..1.. 72.9.5.4. 9.7.4..3. ..7.5.8.. 8.9.6..5. 941.7.8.. 7..6..4.. ..2.7.... |
GPT-4.1 Mini | ❌ | 53 / 81 | 65.4 % | ..4.8.... .6..7..1. 7.29.5.4. .9.7.4.3. .7.5.8... 8.9.6.5.. 9.4.1.7.8 .7..6..4. ..2.7.... |
Gemini 2.0 Flash | ❌ | 80 / 81 | 98.8 % | ...4.8.. .6..7..1. 7.2.9.5.4 .9.7.4.3. ..7.5.8.. .8.9.6.5. 9.4.1.7.8 .7..6..4. ...2.7... |
Gemini 2.0 Flash Lite | ❌ | 43 / 81 | 53.1 % | ....4.8.. ..6..7.1. 7.2.9.5.4 ..9.7.4.3 ...7.5.8. ..8.9.6.5 9.4.1.7.8 ..7..6.4. ....2.7.. |
Gemini 2.5 Pro | ✅ | 81 / 81 | 100 % | ...4.8... .6..7..1. 7.2.9.5.4 .9.7.4.3. ..7.5.8.. .8.9.6.5. 9.4.1.7.8 .7..6..4. ...2.7... |
OpenAI O1 | ❌ | 40 / 81 | 49.4 % | ...4..8.. 6..7....1 72.9.5..4 97..4...3 7...5.8.. 89...6.5. 94178.... 764...... .......27 |
OpenAI O4 Mini | ❌ | 57 / 81 | 70.4 % | ...4.8... .6..7..1. 72..9.5.4 .9.7.4.3. .7..5..8. ..8.9.6.5. 941.7..8. ..7.6..4. ....27... |
Qwen 2.5 VL 7B | ❌ | 12 / 81 | 14.8 % | . . 4 8 . . . 6 7 . . 1 . 7 2 9 5 4 . 9 7 4 3 . . 7 5 8 . 8 9 6 5 . 9 4 1 7 8 . 7 . 6 4 . 2 . 7 . |
Correct answer: ...4.8... .6..7..1. 7.2.9.5.4 .9.7.4.3. ..7.5.8.. .8.9.6.5. 9.4.1.7.8 .7..6..4. ...2.7...
Only Claude 4 Opus and Gemini 2.5 Pro completed the task correctly.
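
For reference, here is a minimal Python sketch of how the "Correct cells" column can be checked. It assumes a position-by-position comparison against the correct answer, where a missing cell counts as wrong and extra characters in a row are ignored. The function name `count_correct_cells` is only illustrative, and this is not necessarily how the website scores the models. It expects rows separated by spaces, so the space-separated Qwen output would need to be regrouped into rows first.

```python
# Correct answer from the table above: 9 rows, "." marks an empty cell.
CORRECT = (
    "...4.8... .6..7..1. 7.2.9.5.4 .9.7.4.3. ..7.5.8.. "
    ".8.9.6.5. 9.4.1.7.8 .7..6..4. ...2.7..."
)


def count_correct_cells(extracted: str, correct: str = CORRECT) -> int:
    """Count cells that match the correct grid position by position.

    Assumption: a missing row or missing cell counts as wrong, and any
    characters beyond the ninth in a row are ignored.
    """
    matches = 0
    extracted_rows = extracted.split()
    for i, correct_row in enumerate(correct.split()):
        row = extracted_rows[i] if i < len(extracted_rows) else ""
        for j, expected in enumerate(correct_row):
            if j < len(row) and row[j] == expected:
                matches += 1
    return matches


# Extracted grid copied from the Claude 3.7 Sonnet row of the table.
claude_37_sonnet = (
    "....48... .6...7..1 7.2.9.5.4 .9.74.3.. ..7.5.8.. "
    ".8.96.5.. 9.4.1.7.8 .7...6.4. ....27..."
)

cells = count_correct_cells(claude_37_sonnet)
print(f"{cells} / 81 = {cells / 81:.1%}")  # prints "63 / 81 = 77.8%"
```

With this comparison, a grid counts as an exact board only when the function returns 81.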