cognitive-benchmarks

Here are 2 public repositories matching this topic...

twweeb / MagicBench

MagicBench: A Deception-Sensitive Cognitive Benchmark for LLMs

Topics: benchmark, evaluation, reasoning, metacognition, theory-of-mind, counterfactual-reasoning, cognitive-benchmarks

Updated Apr 17, 2026
Python

Haifawaeedd / SOEA-Benchmark

SOEA-Plus (PDEMC): a 3-task biomedical metacognition benchmark that evaluates LLM metacognitive control across 4 frontier models on 300 real PubMed examples. Reveals the Control Collapse Gap.

Topics: benchmark, pubmed, kaggle, uncertainty-quantification, metacognition, large-language-models, llm, biomedical-nlp, safe-ai, cognitive-benchmarks

Updated Apr 15, 2026
Python
