v1.9.4
MINOR release. Closes 21 of 22 prioritized fixes from a 5-scenario
fresh-agent usability validation (Claude Opus 4.7, 1M ctx, doc-surface
only across biology RNA-seq, humanities close-reading, qualitative
interviews, engineering microbenchmark, theory/math proof). Average
usability rating moved 6.6 / 10 → 7.8 / 10; HIGH-severity friction
12 → 1; first-5-turns HIGH friction 2 → 0. One new protocol
(methodology/qualitative_pii_redaction), one new schema field
(next_protocol_kind on all 148 protocols), citation_style enum
widened, two new Typst venue templates. No public-API tool removed.
No tool's existing input schema changed.
Full detail: docs/USABILITY_v1.9.4.md
(synthesis), docs/usability_v1.9.4/scenario_{1..5}*.md (per-scenario
trace + re-run reports), and the "Validated in v1.9.4" appendix in
docs/AUDIT_v1.9.2.md.
Release gates: preflight 23/23 (new check: next_protocol_kind declared on every protocol); pytest 899 passed (was 896
baseline; +3 from new Typst venue parametrisations for
humanities_essay + chicago_thesis); ruff clean.
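
A minimal sketch of what the new preflight check might look like, assuming each protocol YAML has already been parsed into a dict keyed by protocol id. The function name, the input shape, and the warning strings are hypothetical; only the field name and its three allowed values come from these notes. It returns warnings instead of raising, on the assumption that a soft check reports without failing the gate.

```python
from typing import Any

# Allowed values as listed in these release notes.
ALLOWED_KINDS = {"forward_default", "iterate_back", "terminal"}


def check_next_protocol_kind(protocols: dict[str, dict[str, Any]]) -> list[str]:
    """Return one warning per protocol whose next_protocol_kind is missing or invalid.

    `protocols` maps a protocol id to its parsed YAML body (hypothetical shape).
    """
    warnings: list[str] = []
    for protocol_id, body in sorted(protocols.items()):
        kind = body.get("next_protocol_kind")
        if kind is None:
            warnings.append(f"{protocol_id}: next_protocol_kind not declared")
        elif kind not in ALLOWED_KINDS:
            warnings.append(f"{protocol_id}: unknown next_protocol_kind {kind!r}")
    return warnings
```

An empty return list corresponds to the check passing for every protocol.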
Added
- docs/USABILITY_v1.9.4.md: 5-scenario fresh-agent validation
synthesis (165 turns logged, friction matrix, cross-scenario
themes, 22-fix priority list, deferred items, full re-validation
results table).
- 5 per-scenario reports under docs/usability_v1.9.4/
(initial + re-run for biology / humanities / qualitative / theory;
initial only for engineering).
- New protocol: methodology/qualitative_pii_redaction.yaml, a
HIPAA Safe Harbor 18-class + GDPR Art. 9 + IRB-compliant
pre-coding gate. Hard prerequisite of methodology/qualitative_research.
Routes via _router_index.yaml when raw transcripts are present without
a redacted counterpart (F-017).
- New schema field: next_protocol_kind on every protocol YAML
(forward_default | iterate_back | terminal). Backfilled across
all 148 protocols (base + 5 packs) via inference. Documented in
PROTOCOL_DOCTRINE.md (F-007). Soft preflight check added.
- step_intent field in templates/step_summary.yaml.template
(plan / ground / analyse / visualise / synth / proof / apparatus).
Per-step audit waivers documented per intent class (F-001).
- 2 new Typst venue templates: humanities_essay.typ
(single-column, footnotes, block-quote macro, generous margins) +
chicago_thesis.typ for humanities + Chicago-citation outputs
(F-018). Registered in VENUE_TEMPLATES + VENUE_CITATION_STYLE.
- PII redaction policy template: templates/qualitative/pii_policy.md
(F-017 supporting material).
- End-to-end recipes table in docs/USE_CASES.md (qualitative
pipeline, ML benchmark, theory/math proof, humanities essay,
viz-only). Plus "Common first prompts (start here)" table covering
data+hypothesis, text corpus, interview transcripts, benchmark
vocabulary, conjecture-to-prove, mid-pipeline, unclear-intent
(F-008, F-021).
- Appendix A: Common figure recipes in
docs/RESEARCHER_GUIDE.md (volcano / UMAP / heatmap / forest /
survival KM / log-log benchmark) mapping each to its protocol
stack + enforced sidecar/audit conventions (F-016).
- Theory_math pack surfaced in 4 user-facing docs:
docs/USE_CASES.md (theorist row, 8 protocols + 3 tools),
docs/PROTOCOLS.md (8-protocol section), docs/START.md (theory +
qualitative + humanities first prompts), docs/AI_GUIDE.md (full
domain-packs section with theory_math workflow) (F-014).
- Return-shape JSON examples in docs/TOOLS.md for
tool_intake_autofill, tool_dashboard_create, tool_step_complete,
tool_audit_quality_full (F-013). Cited by the biology re-validation
as "single highest-leverage doc choice; lets a fresh agent simulate
calls without grepping src/".
Improved (AI guidance prose across protocols)
Scenarios improved: biology RNA-seq DE (S1), humanities close-reading
(S2), qualitative interviews (S3), engineering benchmark (S4),
theory/math proof (S5) — all 5.
- methodology/qualitative_research.yaml: next_protocol fixed
from guidance/analysis_plan → methodology/qualitative_quality_audit
(F-006). ingest_transcripts step now STOPs and routes when raw
transcripts lack a redacted counterpart (F-017). declare_step_contract
step added per F-002 (figure-gate auto-waiver).
- methodology/method_comparison.yaml: engineering / systems
benchmark addendum step added (warm-up runs vs folds, CPU governor
control, paired Wilcoxon on heavy-tailed timings, log-log scaling
plots, language-stdlib baselines, requirements-traceability binding)
(F-010).
- guidance/analysis_plan.yaml: classify_step_intent step added
at step-create time; visualise-step literature exemption via
literature.inherits_from documented (F-001, F-004).
- literature/literature_per_step.yaml: verdict enum extended
to AGREES | DISAGREES | EXTENDS | IMPORTED_AS_CITED |
SPECIALIZES | DEFERRED. Verdict-selection guide added.
Visualise-step inheritance contract documented (F-003, F-004).
- research_os_humanities/protocols/textual/close_reading.yaml:
declare_step_contract step added (apparatus contract waives
generic completeness gate); tool_humanities_apparatus_audit
cross-link (F-002, F-020).
- research_os_theory_math/protocols/proof/proof_verification_workflow.yaml:
step_intent: proof contract declaration (F-002).
Improved (error messages + tool surface)
- tool_dashboard_create: mode enum (explore / story /
executive / teaching) enumerated in TOOLS.md; composition with
audience= documented; story-mode dependency on dashboard_story.md
surfaced (F-011).
- tool_step_complete: first-class TOOLS.md entry with gate
sequence, return shape, and alias-superset relationship to
tool_path_finalize (F-012).
- tool_engineering_requirements_matrix: cross-referenced from
method_comparison engineering / systems-benchmark addendum (F-009).
- tool_redteam_review: row added to TOOLS.md (Audit extensions)
clarifying focus= values and distinction from quick_paper_review
and peer_review_response (F-015).
Improved (onboarding flow)
- docs/START.md: extra inputs/ subfolders table after the
file-drop section; pointer to validated first-prompts table in
USE_CASES.md; theory + qualitative + humanities first prompts
(F-021, F-022).
- docs/AI_GUIDE.md: discover/ clarified as shortcut-tool-only
intent_class (no FS folder, stops fresh agents grepping src/);
inputs/ directory conventions table; chat_split_recommended
heuristic per model_profile; full domain-packs section (F-014,
F-022, C-extras).
- docs/RESEARCHER_GUIDE.md: extra-subfolders table after file-layout
diagram; inline citation_style + venue_template comments flagging
humanities/math gaps and workarounds; Appendix A common figure
recipes (F-016, F-022).
- docs/FAQ.md: text-corpus-vs-transcripts file placement;
theory-math support discoverability; humanities pack support +
monograph citation gotcha; qualitative end-to-end chain +
saturation-not-power-analysis (F-022, C-extras).
Fixed (edge cases)
- Per-step audits now intent-aware: F-001 + F-002 ship the
step_intent contract; figure-required hard-fail auto-waives for
plan / ground / proof / apparatus / synth steps. Drove 5/5 scenarios'
per-step-audit over-fire to zero.
- Literature gate verdict enum gap closed: F-003 extends enum
with IMPORTED_AS_CITED + SPECIALIZES, closing theory's 9-HIGH
literature-gate verdict-mismatch cluster. F-004 documents the
visualise-step exemption via literature.inherits_from.
- qualitative_research.next_protocol mis-route fixed: F-006
one-line YAML fix.
- next_protocol semantic ambiguity resolved: F-007 backfills
next_protocol_kind on all 148 protocols.
- citation_style enum widened: F-018 adds mla,
chicago_author_date, chicago_notes_bib, amsplain, siam (mirrored
in CONFIG_TEMPLATE + VENUE_TEMPLATES + VENUE_CITATION_STYLE
with researcher-facing → Typst hyphenated CSL translator).
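
The intent-aware figure gate from the first item in this list can be sketched as a lookup on step_intent. This is an illustration only: the intent names and the waived set are taken from these notes, while the function name, the constants, and the choice to reject unknown intents with ValueError are assumptions, not the shipped audit code.

```python
# Intent classes listed for templates/step_summary.yaml.template.
STEP_INTENTS = ("plan", "ground", "analyse", "visualise", "synth", "proof", "apparatus")

# Intents for which the figure-required hard-fail auto-waives (per F-001 + F-002).
FIGURE_GATE_WAIVED = frozenset({"plan", "ground", "proof", "apparatus", "synth"})


def figure_gate_applies(step_intent: str) -> bool:
    """Return True when a step of this intent must still produce a figure."""
    if step_intent not in STEP_INTENTS:
        # Hypothetical behaviour: refuse to guess for an undeclared intent.
        raise ValueError(f"unknown step_intent: {step_intent!r}")
    return step_intent not in FIGURE_GATE_WAIVED
```

Under this reading, only analyse and visualise steps remain subject to the figure requirement.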
Fixed (per-domain composition gaps)
- Qualitative: pre-coding PII redaction protocol now exists
upstream of coding (was: only quote-level audit AFTER coding, too
late for HIPAA/IRB/GDPR). Most material protective gap closed (F-017).
- Humanities: citation_style MLA / Chicago + humanities_essay.typ +
chicago_thesis.typ Typst templates shipped; humanities_apparatus_audit
cross-linked from close_reading (F-018, F-020). Two HIGH frictions
removed.
- Theory/math: pack surfaced in user-facing docs (F-014); paper
rating moved 5 → 8.
- Engineering: method_comparison gains
engineering/systems-benchmark addendum + cross-link to
tool_engineering_requirements_matrix (F-009, F-010).
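
The qualitative pre-coding gate amounts to checking that every raw transcript has a redacted counterpart before coding starts. A rough sketch, assuming a counterpart is identified by a shared file stem; that matching rule, the function name, and the example paths are all hypothetical, since these notes do not describe how the router pairs files.

```python
from pathlib import PurePosixPath


def unredacted_transcripts(raw_paths: list[str], redacted_paths: list[str]) -> list[str]:
    """Return raw transcripts with no redacted file of the same stem.

    Assumption: 'p01.txt' under a redacted folder is the counterpart of
    'p01.txt' under the raw folder. A non-empty result means coding should stop.
    """
    redacted_stems = {PurePosixPath(p).stem for p in redacted_paths}
    return sorted(p for p in raw_paths if PurePosixPath(p).stem not in redacted_stems)


# Hypothetical layout, for illustration only.
missing = unredacted_transcripts(
    ["inputs/transcripts/p01.txt", "inputs/transcripts/p02.txt"],
    ["inputs/redacted/p01.txt"],
)
```

Here `missing` holds the one transcript that would need to go through redaction first.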
Deferred to v1.11.0
- F-005: Per-step audit override path documentation
(override_completeness_gate, override_literature_gate) with
examples in TOOLS.md + AI_GUIDE.md.
- F-019: WorldCat / OpenLibrary / LOC ISBN-based verifiers in
tool_citations_verify (humanities monograph DOI gap). Doc-side
workaround language landed in v1.9.4.
- Humanities essay structure protocol parallel to
theory_math/output/theory_paper_structure: humanities_essay.typ
ships but no protocol drives it.
- tool_audit_step_literature descriptive/prep step waiver:
partial via F-003; full descriptive-step waiver still open
(continuation of AUDIT-v1.9.2-022).
- D-01 .. D-07: pack-aware tool_audit_prose, theory dashboard
schema, LLM-assisted qualitative coding tool, informal-markdown
proof parser for tool_theory_math_dep_graph,
chat_split_recommended heuristic exposure, router decomposition
algorithm exposure, single_coder branch in
coding_scheme_development.
Validation metrics
- Average usability rating: 7.8 / 10 (vs 6.6 / 10 initial baseline; +1.2)
- HIGH-severity friction events: 1 (vs 12 initial; −11)
- Onboarding HIGH friction (first 5 turns): 0 (vs 2 initial)
- Scenarios reaching paper.pdf step: 5 / 5
- Scenarios reaching dashboard.html step: 5 / 5 (was 5 / 5 with 1 partial)
- Top scenario movement: theory/math 5 → 8 (+3) on F-014 + F-001/F-002 + F-003

Targets met: HIGH ≤ 5 (1) ✓; onboarding HIGH = 0 ✓. Target missed:
average ≥ 8.5 (got 7.8). The shortfall is concentrated in S2 humanities,
where the missing humanities_essay_structure protocol and the still-empirical
descriptive/prep literature-verdict gap account for the 0.7-point gap.
Bumped — protocols
148 protocol YAMLs (base + 5 packs) gained next_protocol_kind field
via scripted backfill (inferred: null → terminal, self-id →
iterate_back, otherwise → forward_default). Embeddings rebuilt
(151 protocols + 212 tools, BAAI/bge-small-en-v1.5, dim=384).
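
The backfill rule above is small enough to write out. A sketch in Python, assuming each protocol exposes its own id and its next_protocol value as strings; the function name and signature are hypothetical and this is not the project's backfill script.

```python
from typing import Optional


def infer_next_protocol_kind(protocol_id: str, next_protocol: Optional[str]) -> str:
    """Infer next_protocol_kind from an existing next_protocol value.

    Mirrors the rule stated in the notes:
    null -> terminal, self-id -> iterate_back, otherwise -> forward_default.
    """
    if next_protocol is None:
        return "terminal"
    if next_protocol == protocol_id:
        return "iterate_back"
    return "forward_default"
```

For example, a protocol whose next_protocol names a different protocol, such as methodology/qualitative_research pointing at methodology/qualitative_quality_audit, would be classified as forward_default under this rule.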