Holdout scenario evaluation harness for AI agents. Doer/Judge/Adversary/Observer roles, probabilistic satisfaction scoring, append-only JSONL audit trails with integrity hashes. Created Dec 2025.
python compliance software-factory audit-trail ai-evaluation agentic-ai agent-testing holdout-scenarios deterministic-agents
Updated Feb 23, 2026 - Python