Reproducible AVeriTeC + FEVER 1.0 evaluation harness for AgentOracle's /evaluate endpoint. Code is released first; full numbers follow May 17.
reproducible-research ai-agents fact-verification ai-evaluation averitec verifiable-ai x402 agentic-payments coinbase-bazaar fever-benchmark
Updated May 29, 2026 - Python