Behavior-regression testing for LLM agents. 4-class attribution, 6-field FAIL schema, $-cost gating, flaky detection. Bash + jq. Works with opencode today, runner-pluggable.
bash opencode ci-cd developer-tools regression-testing eval ai-agents claude prompt-engineering llmops anthropic llm-evaluation llm-testing agent-testing
-
Updated
Jun 1, 2026 - Shell