An open AI-for-security validation benchmark with a non-LLM scorer and a SOTA-validation loop. The labeled positive corpus is withheld pending coordinated disclosure.
evaluation vulnerability-detection ai-security ml-security llm-security agent-security security-benchmark cwe-862
Updated Jun 4, 2026 - Python