Once #131 ships the A/B framework, use it to retroactively measure already-shipped doctrine variants. Stop guessing about the LLM compliance ceiling; get data.
Metric: the rate at which identity_get + issue_resume appear in the trajectory on @bro hi
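A minimal sketch of how this metric could be computed, assuming each trajectory is available as a flat list of tool-call names and that a run counts as compliant only when both calls appear. The names `REQUIRED_CALLS`, `is_compliant`, and `compliance_rate` are hypothetical and are not part of the #131 framework.

```python
from typing import Iterable, Sequence

# Assumption: a run is compliant only if BOTH calls show up somewhere
# in its trajectory, in any order.
REQUIRED_CALLS = frozenset({"identity_get", "issue_resume"})


def is_compliant(trajectory: Iterable[str]) -> bool:
    """Return True if every required call appears in the trajectory."""
    return REQUIRED_CALLS.issubset(set(trajectory))


def compliance_rate(trajectories: Sequence[Sequence[str]]) -> float:
    """Fraction of trajectories that contain all required calls."""
    if not trajectories:
        raise ValueError("need at least one trajectory")
    return sum(is_compliant(t) for t in trajectories) / len(trajectories)
```

If the intended metric instead requires a particular call order, or counts each call separately, `is_compliant` is the only piece that would need to change.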
Why this matters
Several recent doctrine PRs were guesses. Without comparison runs we don't know which helped. If hypothesis 4 confirms the ceiling is real, we stop prompt-only fixes for compliance and consider programmatic enforcement.
Acceptance
Each hypothesis run with ≥10 paired runs per arm
Results published as ADR under docs/trustmybot/architecture/manual/decisions/
If a variant lost or was within noise, file a follow-up to revert or redesign it
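One way the "within noise" check could be made concrete for paired runs is an exact two-sided sign test on the discordant pairs. This is a sketch under assumptions: the issue does not say which test to use, `paired_sign_test` is a hypothetical helper, and each arm is assumed to yield one boolean (compliant or not) per paired run.

```python
from math import comb
from typing import Sequence, Tuple


def paired_sign_test(
    control: Sequence[bool], variant: Sequence[bool]
) -> Tuple[int, int, float]:
    """Exact two-sided sign test on paired binary outcomes.

    Returns (wins, losses, p_value), where a win is a pair in which the
    variant complied and the control did not, and a loss is the reverse.
    Pairs where both arms agree carry no information and are ignored.
    """
    if len(control) != len(variant):
        raise ValueError("arms must contain the same number of paired runs")
    wins = sum(v and not c for c, v in zip(control, variant))
    losses = sum(c and not v for c, v in zip(control, variant))
    n = wins + losses
    if n == 0:
        return wins, losses, 1.0  # no discordant pairs, no evidence either way
    # Probability of a split at least this lopsided under a fair coin.
    tail = sum(comb(n, k) for k in range(min(wins, losses) + 1)) / 2**n
    return wins, losses, min(1.0, 2 * tail)
```

With 10 pairs that all favor one arm, the p-value is 2/1024, the smallest this test can report at that sample size. Any cutoff for calling a result "within noise" (for example 0.05) is a choice for the ADR to state, not something this sketch decides.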
Hypotheses to test (each ≥10 paired runs per arm)
Sequencing
Blocked by #131.