MagicBench: A Deception-Sensitive Cognitive Benchmark for LLMs
Updated Apr 17, 2026 - Python
SOEA-Plus (PDEMC): a 3-task biomedical metacognition benchmark that evaluates metacognitive control in 4 frontier LLMs on 300 real PubMed examples. Reveals the Control Collapse Gap.