Cross-domain limits of hand-crafted chain-of-thought (CoT) surface features: AUROC 0.982 in math, 0.434 in coding. Five methods, one conclusion: code correctness is not in the text.
Updated May 13, 2026 · Python