v4.2.1 — Honest Reframing: Simulation is Prior, Not Result
An adversarial peer-review pass of v4.2.0 (conducted as part of internal QA before any external review) identified four critical methodological errors in how simulation outputs were being presented. All four are real. This release retracts those claims publicly rather than silently rewriting them.
Retracted from v4.2.0
| Claim | Why it was wrong |
|---|---|
| "Mean ASR" labels on simulation outputs presented as findings | The simulation re-states a hand-tuned prior (MODEL_BASE_ASR, CATEGORY_MULTIPLIERS in evaluate_phase2b.py). Running it under different seeds restates the prior — it does not measure model behaviour. |
| "95% bootstrap CIs" on the seed-mean ranges | scripts/multi_seed.py ci95() returned min/max of seed means, not bootstrap CIs. Function renamed to seed_range(), all docs corrected. |
| "Claude Opus 4-8 produces zero Tier-3 outcomes — the simulation's most testable prediction" | Arithmetic floor: severity-3 gate requires effective_prob > 0.9; Opus max effective_prob = 0.07 × 9.0 = 0.63. Impossible by construction. Reframed as a property of the parameterization, not a prediction. |
| "Cross-model differences statistically significant for 5 of 10 categories" | Cochran's Q requires matched subjects, but the simulation produces independent random draws per (model, pattern, trial). The test can still be computed, but its p-values are not interpretable on simulated data. |
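
The first and third retractions can be checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes effective_prob is the product of a base rate and a category multiplier, as the 0.07 × 9.0 figure in the table implies, and every function and constant name is hypothetical rather than taken from evaluate_phase2b.py. It shows that the Tier-3 gate cannot fire for these parameters, and that re-running Bernoulli draws under different seeds simply recovers the probability that was put in.

```python
import random

# Values quoted in the retraction table; the names are illustrative only.
OPUS_BASE_ASR = 0.07           # hand-tuned base rate for the model
MAX_CATEGORY_MULTIPLIER = 9.0  # largest category multiplier
TIER3_GATE = 0.9               # severity-3 requires effective_prob > gate


def max_effective_prob(base_asr: float, max_multiplier: float) -> float:
    """Upper bound on effective_prob, assuming effective_prob = base * multiplier."""
    return base_asr * max_multiplier


def tier3_reachable(base_asr: float, max_multiplier: float,
                    gate: float = TIER3_GATE) -> bool:
    """True only if the largest multiplier pushes effective_prob past the gate."""
    return max_effective_prob(base_asr, max_multiplier) > gate


def simulated_asr(effective_prob: float, n_trials: int, seed: int) -> float:
    """Fraction of Bernoulli(effective_prob) draws that succeed for one seed."""
    rng = random.Random(seed)
    hits = sum(rng.random() < effective_prob for _ in range(n_trials))
    return hits / n_trials


if __name__ == "__main__":
    ceiling = max_effective_prob(OPUS_BASE_ASR, MAX_CATEGORY_MULTIPLIER)
    reachable = tier3_reachable(OPUS_BASE_ASR, MAX_CATEGORY_MULTIPLIER)
    print(f"ceiling={ceiling:.2f} gate={TIER3_GATE} reachable={reachable}")

    # Different seeds give slightly different means, all centred on the input.
    seed_means = [simulated_asr(ceiling, 2000, seed) for seed in range(5)]
    print("seed means:", [round(m, 3) for m in seed_means])
```

Under the same product assumption, a base rate would have to exceed 0.9 / 9.0 = 0.1 before a Tier-3 outcome became possible at all, which is why "zero Tier-3 outcomes" describes the parameters and not the model.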
Where the retractions land
- README.md: Phase 2b section rewritten with prominent disclaimer block
- paper/research-paper.md: "Headline empirical outputs" section retracted and rewritten
- findings/v4_simulation_findings.md: original 7 findings retracted; document is now an honest retraction notice explaining what the simulation is and isn't
- paper/anthropic_alignment_with_taxonomy.md: editorial language removed; rewritten as neutral comparison without insider-judgment framing
- evaluate_phase2b.py: module docstring leads with parameterized-risk-model framing
- scripts/multi_seed.py: ci95 → seed_range (legacy alias preserved)
- scripts/statistical_tests.py: explicit assumption-violation caveats added
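
To make the ci95 → seed_range distinction concrete, here is a minimal sketch of what a min/max seed range computes next to a percentile bootstrap interval. It is not the code in scripts/multi_seed.py: the function bodies, the plain-assignment alias, and bootstrap_ci (including its parameters) are assumptions written for illustration.

```python
import random


def seed_range(seed_means):
    """Return (min, max) of the per-seed means. A range, not a confidence interval."""
    values = list(seed_means)
    if not values:
        raise ValueError("seed_means must be non-empty")
    return min(values), max(values)


# Assumed form of the "legacy alias": old call sites keep working unchanged.
ci95 = seed_range


def bootstrap_ci(values, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the mean (hypothetical helper)."""
    values = list(values)
    if not values:
        raise ValueError("values must be non-empty")
    rng = random.Random(seed)
    n = len(values)
    # Resample with replacement, recording the mean of each resample.
    means = sorted(sum(rng.choices(values, k=n)) / n for _ in range(n_resamples))
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


if __name__ == "__main__":
    seed_means = [0.21, 0.25, 0.23, 0.22, 0.24]  # made-up example values
    print("seed range:  ", seed_range(seed_means))
    print("bootstrap CI:", bootstrap_ci(seed_means))
```

Even a correctly computed bootstrap interval over simulated seed means would only describe the sampling noise of the parameterized prior, not uncertainty about model behaviour.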
Why retract publicly instead of silently rewriting
- Audit trail. Anyone reading git log can see exactly what was claimed in v4.2.0 and what was retracted in v4.2.1.
- Research maturity signal. Self-correction under adversarial review is more credible than the appearance of never having erred.
- Pattern reuse. The same retraction discipline will apply when live Phase 2b data inevitably surfaces something different from the prior. Establishing the protocol now prevents drift later.
Unchanged
- 40-pattern taxonomy and mechanism-to-alignment-assumption mapping
- 17 cited papers (all direct-WebFetch verified in v4.0.1 — that audit stands)
- Engineering infrastructure (PEP 621, Docker, CI on Python 3.10/3.11/3.12, 10/10 pytest)
- Phase 2b live harness (evaluate_live.py)
- Phase 3 defense framework spec
- Reproducibility checklist, Datasheet, Ethics statement
What this release strengthens
The Phase 2b live execution request becomes more clearly motivated, not less. The retractions make explicit that what v4.2.0 mis-labeled as "findings" was actually a predicted shape from prior literature. The live run produces the empirical data that would confirm the prior, reject it, or surface novel structure.
That is the work the $1,000 credit allocation funds.