claude-4-sonnet

Here are 6 public repositories matching this topic...

Comprehensive evaluation of Claude 4 Sonnet's ability to assess mathematical answers: 500 original problems that reveal JSON-induced errors and systematic patterns in LLM evaluation tasks. The research reports 100% accuracy when judging incorrect answers but only 84.3% when judging correct ones, which it attributes to premature decision-making in the JSON structure.

  • Updated Jul 7, 2025
  • HTML
