Releases: vukkt/token-warden
v0.17.0
Quality hardening — no plugin behavior change; this release is about making the
codebase provably tested and tight, with CI guards that can't silently slip.
- 90% line coverage (78% branch), CI-gated. Added `@vitest/coverage-v8` with
  a ratchet-floor threshold; the new `coverage` pipeline stage fails the build on
  any regression. Coverage rose from ~66% to 90% by unit-testing the
  subprocess/stdin CLIs (`collect`, `gate`, `distill`, `evolve`, `modelbench`,
  `promptbench`) with mocked `child_process`/stdin boundaries: real orchestration
  tests (fail-open contracts, verdict decisions, anomaly alerts), not padding. The
  untestable `invokedDirectly` entry shims are honestly excluded via `v8 ignore`.
- Dead-code gate. `knip` (unused files/exports/deps) is wired into CI and the
  module API surface was tightened (8 internal-only exports un-exported). Zero
  unused SQL fields.
- Component-integration + performance tests. `test/integration.test.ts` wires
  the real modules end-to-end (collection → distill trigger → selector → receipts
  → status) through one DB; `test/perf.test.ts` holds hot-path budgets: transcript
  parser ~39 MB/s (2 MB in ~50 ms vs the 2 s Stop-hook budget), 50k tool events
  attributed in ~24 ms, a 2k-session rollup in ~1.3 ms.
- 361 tests, green on Node 22 and 24.
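For readers unfamiliar with the term, a ratchet floor can be sketched as two small checks. This is an illustration only: the `Coverage` shape and both function names are hypothetical, and the real gate is enforced through the coverage tooling rather than hand-written code.

```typescript
// Hypothetical shape: coverage percentages in the range 0-100.
interface Coverage {
  lines: number;
  branches: number;
}

// Returns the metrics that fell below the floor; an empty list means the gate passes.
function coverageRegressions(measured: Coverage, floor: Coverage): string[] {
  const failures: string[] = [];
  if (measured.lines < floor.lines) failures.push("lines");
  if (measured.branches < floor.branches) failures.push("branches");
  return failures;
}

// Ratchet: the floor only ever moves up, never down.
function ratchetFloor(measured: Coverage, floor: Coverage): Coverage {
  return {
    lines: Math.max(floor.lines, Math.floor(measured.lines)),
    branches: Math.max(floor.branches, Math.floor(measured.branches)),
  };
}
```

With a floor of 90/78, a run measuring 89.5% lines fails on `lines`, while a run measuring 92.4% lines would raise the line floor to 92.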
v0.16.0
Rule receipts — the per-rule verdict card (community-suggested).
- New `/warden-receipt` command (`npx tsx src/receipt.ts [--agent <name>] [--json]`)
  renders the evidence behind each keep/evict decision as one card:
  token savings vs. context rent (with variance and ROI multiple), the model and
  golden-suite hash it was measured under, per-task pass/fail with vs. without
  the rule, and the tool-call / file-reread activity profile with vs. without
  (shown as a signed % so a reviewer can see whether a "cheap" rule did less
  work). Read-only; the natural payload for sharing a rule: "my delta is
  evidence, not authority for your repo."
- The selector now records a receipt snapshot (`rule_receipts` table, migration #9)
  at every decision, initial and each re-audit, so a rule has an audit
  trail. The keep/evict verdict logic is unchanged; receipts are additive
  capture. `RunResult` now carries tool-call / file-reread counts; `bench.ts`
  gains `goldenSuiteHash` for suite provenance.
- The safety axis is surfaced, not auto-judged: a big activity drop is usually
  the point of an efficiency rule, so the receipt shows the numbers and leaves
  the call to a human. The binding safety gate remains the per-task pass/fail
  regression, which evicts on its own.
- 292 tests, green on Node 22 and 24.
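The two derived figures on the card can be illustrated with a short sketch. The function names are hypothetical, and defining the ROI multiple as tokens saved divided by context rent is an assumption; the notes above do not spell out the formula.

```typescript
// Signed percentage change of `withRule` relative to `withoutRule`.
// Returns null when the baseline is zero, because the ratio is undefined.
function signedPercent(withRule: number, withoutRule: number): string | null {
  if (withoutRule === 0) return null;
  const pct = ((withRule - withoutRule) / withoutRule) * 100;
  const sign = pct > 0 ? "+" : ""; // negative values already carry their sign
  return `${sign}${pct.toFixed(1)}%`;
}

// Assumed definition: tokens saved per token of context rent.
function roiMultiple(tokensSaved: number, contextRent: number): number | null {
  return contextRent > 0 ? tokensSaved / contextRent : null;
}
```

For example, 30 tool calls with the rule against 40 without renders as `-25.0%`, which is the kind of drop a reviewer would want to inspect before trusting a "cheap" rule.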
v0.15.0
Tooling and docs — no plugin behavior change.
- Staged CI/CD pipeline. `.github/workflows/ci.yml` is now a dependent-stage
  pipeline: `quality` (lint, typecheck, manifest version consistency) →
  `test` (Node 22 + 24) and `fixture` in parallel → `validate` (plugin-manifest
  validation + a CLI smoke run) → `release`. The `release` stage runs only on a
  `vX.Y.Z` tag: it verifies the tag matches the manifests and publishes the
  GitHub release with notes from `CHANGELOG.md`. Tag-push is now the whole
  deploy step.
- Release helper scripts (`scripts/check-versions.mjs`,
  `scripts/changelog-section.mjs`): version-consistency guard and changelog
  extraction, reused by CI and runnable locally (`npm run check:versions`).
- Standard project docs: `CONTRIBUTING.md` (setup, the pipeline, the release
  flow, the design invariants) and `SECURITY.md` (reporting + the security
  model). README gains a Quickstart at the top of "Getting started".
- A professional sweep of every source file found it clean (no TODO/FIXME, no
  `any`, no stray debug, no non-text bytes). 275 tests, green on Node 22 and 24.
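The tag-versus-manifest check performed by the `release` stage can be pictured roughly as follows. This is a hypothetical sketch, not the contents of `scripts/check-versions.mjs`; the function name and the manifest file names in the usage note are assumptions.

```typescript
// Returns a list of problems; an empty list means the tag and every manifest agree.
// `manifests` maps a file name to the version string found in that file.
function checkVersions(tag: string, manifests: Record<string, string>): string[] {
  const match = /^v(\d+\.\d+\.\d+)$/.exec(tag);
  if (!match) return [`tag "${tag}" is not of the form vX.Y.Z`];
  const expected = match[1];
  return Object.entries(manifests)
    .filter(([, version]) => version !== expected)
    .map(([file, version]) => `${file} is ${version}, expected ${expected}`);
}
```

Calling `checkVersions("v0.15.0", { "package.json": "0.14.1" })` reports one mismatch, which a CI step could turn into a non-zero exit before anything is published.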
v0.14.1 — verdict-math boundary tests
Test-only hardening — no behavior or API change.
- Locked the `assessDelta` degenerate-input boundaries that protect a keep/evict verdict from a divide-by-zero `NaN`: a single comparable task yields a finite point estimate with null standard error (the `savings.length >= 2` guard), and no comparable task yields a null delta rather than `NaN`. An audit confirmed the verdict math is otherwise free of divide-by-zero / `NaN` paths.
- 275 tests, green on Node 22 and 24.
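The boundaries described above can be illustrated with a minimal stand-in. `assessDeltaSketch` and `DeltaEstimate` are hypothetical names, and the use of the sample standard error of the mean is an assumption; only the null-instead-of-`NaN` behaviour is taken from the notes.

```typescript
interface DeltaEstimate {
  mean: number;
  standardError: number | null; // null when fewer than two samples exist
}

// Hypothetical stand-in showing the two guarded boundaries.
function assessDeltaSketch(savings: number[]): DeltaEstimate | null {
  const n = savings.length;
  if (n === 0) return null; // no comparable task: null delta, never 0 / 0
  const mean = savings.reduce((sum, s) => sum + s, 0) / n;
  if (n < 2) return { mean, standardError: null }; // n - 1 would be zero
  const variance =
    savings.reduce((sum, s) => sum + (s - mean) * (s - mean), 0) / (n - 1);
  return { mean, standardError: Math.sqrt(variance / n) };
}
```

A single sample such as `[120]` gives a mean of 120 with a null standard error, while `[100, 200]` gives a mean of 150 and a standard error of 50.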
Full changelog: v0.14.0...v0.14.1
v0.14.0 — gate injection hardening + hygiene
Hardening and simplification release. No new commands; existing behavior is unchanged except the inter-agent approval prompt is now injection-proof. Bundles the work from a focused optimization pass over the codebase.
Security
- `gate.ts` approval prompt is sanitized. The PreToolUse prompt for an inter-agent `SendMessage` interpolated the sender, recipient, and message body. A hostile teammate message could embed ANSI/control sequences to forge or obscure the line the user approves. Every interpolated field now passes through the shared sanitizer (control/ANSI stripped, names capped); the forged-newline and escape vectors are closed. Verified end-to-end.
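A sanitizer of this kind might look roughly like the sketch below. It is not the project's `displayText`: the function name, the regular expressions, and the 64-character cap are all assumptions made for illustration.

```typescript
// CSI escape sequences such as colours and cursor movement.
const ANSI_ESCAPE = /\u001b\[[0-9;?]*[ -\/]*[@-~]/g;
// C0 and C1 control characters, which include newline and carriage return.
const CONTROL_CHARS = /[\u0000-\u001f\u007f-\u009f]/g;

// Strips escape sequences first, then any remaining control characters,
// then caps the length (the cap of 64 is an assumed value).
function sanitizeForPrompt(value: string, maxLength = 64): string {
  const stripped = value.replace(ANSI_ESCAPE, "").replace(CONTROL_CHARS, "");
  return stripped.length > maxLength ? stripped.slice(0, maxLength) + "…" : stripped;
}
```

Because newlines are removed, a sender name like `"alice\nApprove? yes"` can no longer start a forged second line in the prompt.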
Cleanups
- New `src/sanitize.ts`: `displayText` extracted into one presentation-security chokepoint used by `status`, `compare`, `attribute`, and `gate`; `attribute`/`compare` no longer pull it from the heavier `status` module.
- Fixed NUL bytes in `attribute.ts` (a NUL-delimited map key), which were invisible and tool-breaking; replaced with a collision-proof `JSON.stringify` key. New `source-hygiene` test fails the build on any NUL/control byte in source.
- Centralized the run-total token SQL (`RUN_TOTAL_TOKENS_SQL`, was hand-written 10×) and collapsed the duplicated candidate/re-audit verdict path in `select.ts` into one helper; both changes are behavior-preserving.
- Added `parseAgentDefinition` memory-scope-isolation tests (benchmarks never touch real agent-memory).
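The `JSON.stringify` key technique mentioned in the NUL-byte fix can be sketched as below. The `(kind, name)` pair and both function names are hypothetical; the notes do not say which fields the real key combines.

```typescript
// Builds a map key from a pair of strings. Serialising the tuple as JSON keeps
// the parts unambiguous without embedding an invisible delimiter in the source.
function compositeKey(kind: string, name: string): string {
  return JSON.stringify([kind, name]);
}

// Recovers the original pair from a key produced by compositeKey.
function splitKey(key: string): [string, string] {
  const [kind, name] = JSON.parse(key) as [string, string];
  return [kind, name];
}
```

Unlike a plain delimiter join, two different pairs can never produce the same key, because JSON escapes quotes and other special characters inside each part.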
Verification
273 tests, green on Node 22 and 24. E2e edge-case sweep confirmed fail-open on the collect/gate hooks (empty, garbage, binary, missing-file inputs) and correct exit codes across every CLI.
Full changelog: v0.13.0...v0.14.0
v0.13.0 — skill/MCP cost attribution (#5 complete)
Roadmap direction #5 — skill/MCP cost attribution is complete. Decomposition, not a verdict: it answers "where did the tokens go?" by attributing each real-work session's footprint to the tool, skill, or MCP server that produced it. Fully orthogonal to the selector/benchmark path — it never promotes, evicts, or measures a rule.
What's new
- `npx tsx src/attribute.ts` (new `/warden-attribute` command) renders a cross-session rollup of tool/skill/MCP cost, or a single transcript with `--transcript`. Filters: `--agent`, `--kind builtin|mcp|skill`, `--limit`, `--json`.
- `transcript.ts` now joins each `tool_use` to its `tool_result` by id in the existing single streaming pass, capturing the input chars the model generated and the result chars the tool injected back into context. The hot Stop-hook budget is unchanged (one pass, O(tool calls)).
- `db.ts` migration #8 adds a `tool_costs` table; `collect.ts` persists per-session costs inside the existing fail-open block (real-work only; golden runs are never attributed).
- `/warden-status` gains a top-costs section.
- Footprint is measured in characters (exact, deterministic); a rough ≈tokens figure (chars ÷ 4) is shown for intuition, not as a billed token count.
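The single-pass join of `tool_use` to `tool_result` can be sketched as follows. The event shapes are simplified and hypothetical (real transcript entries carry more fields), and `attributeCosts` is an illustrative name rather than the function in `transcript.ts`.

```typescript
// Simplified, assumed event shapes.
type TranscriptEvent =
  | { type: "tool_use"; id: string; name: string; input: string }
  | { type: "tool_result"; toolUseId: string; content: string };

interface ToolCost {
  name: string;
  inputChars: number;  // chars the model generated as tool input
  resultChars: number; // chars the tool injected back into context
}

// One pass over the events: remember each tool_use by id, then add the
// matching tool_result's size when it arrives.
function attributeCosts(events: TranscriptEvent[]): ToolCost[] {
  const byId = new Map<string, ToolCost>();
  for (const event of events) {
    if (event.type === "tool_use") {
      byId.set(event.id, { name: event.name, inputChars: event.input.length, resultChars: 0 });
    } else {
      const cost = byId.get(event.toolUseId);
      if (cost) cost.resultChars += event.content.length; // unmatched results are skipped
    }
  }
  return Array.from(byId.values());
}

// Rough intuition only (chars ÷ 4), not a billed token count.
const approxTokens = (chars: number): number => Math.round(chars / 4);
```

The map lookup keeps the work proportional to the number of tool calls, which matches the one-pass, O(tool calls) budget stated above.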
Hardening
From an adversarial review: a tool_result content array with an odd sibling (a bare string, an image block) no longer zeroes the whole result's footprint — each element is read defensively. A cross-feature regression audit confirmed fail-open is preserved, the verdict path is untouched, and the migration is additive.
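Reading each element defensively might look like the sketch below. The block shape (`{ text: string }` for text blocks) and the decision to count bare strings by their length are assumptions; `contentChars` is a hypothetical name.

```typescript
// Counts the text characters in a tool_result `content` value, element by element.
// Strings and text blocks count; anything else (for example an image block)
// contributes zero instead of invalidating the whole result.
function contentChars(content: unknown): number {
  if (typeof content === "string") return content.length;
  if (!Array.isArray(content)) return 0;
  let total = 0;
  for (const item of content) {
    if (typeof item === "string") {
      total += item.length;
    } else if (typeof item === "object" && item !== null) {
      const text = (item as { text?: unknown }).text;
      if (typeof text === "string") total += text.length;
    }
  }
  return total;
}
```

A mixed array of a text block, a bare string, and an image block therefore still yields the combined length of the two text-bearing elements.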
219 tests, green on Node 22 and 24.
Roadmap
Of the six directions, #1, #2, #3, #4, #5 (plus automated prompt evolution) are shipped. Only #6 (rule marketplaces) remains.
Full changelog: v0.12.0...v0.13.0
v0.12.0 — team-shared rule ledgers complete (CI gate)
Roadmap #3, increment 3: the CI gate — #3 is now complete.
What's new
- `npx tsx src/verify-ledger.ts [file...]` validates committed `.warden/*.rules.md` ledgers and exits non-zero if any is corrupt or hand-edited, so a CI job can gate the PR. Deterministic and offline: no model tokens, no secrets; reuses increment 2's `parseLedgerFile`. Verified: a valid ledger passes (exit 0), a corrupted one fails (exit 1).
- A deeper gate that re-benchmarks each rule's claimed delta in CI is possible but needs a model-token budget and credentials, so it is a documented deployment choice rather than a default.
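The structural gate reduces to "parse every file, fail if any parse fails". The sketch below shows that shape with an injected parser; `verifyLedgers`, `LedgerParser`, and `VerifyReport` are hypothetical names, and the real `parseLedgerFile` signature is not shown in these notes.

```typescript
// Assumed contract: the parser throws when a ledger is corrupt.
type LedgerParser = (text: string) => unknown;

interface VerifyReport {
  exitCode: 0 | 1;
  failures: string[];
}

// Checks every file rather than stopping at the first failure, so one CI run
// reports all broken ledgers at once.
function verifyLedgers(files: Record<string, string>, parse: LedgerParser): VerifyReport {
  const failures: string[] = [];
  for (const [path, text] of Object.entries(files)) {
    try {
      parse(text);
    } catch (error) {
      const reason = error instanceof Error ? error.message : String(error);
      failures.push(`${path}: ${reason}`);
    }
  }
  return { exitCode: failures.length === 0 ? 0 : 1, failures };
}
```

A thin CLI wrapper would read the files, call this function, print the failures, and pass `exitCode` to the process exit.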
Team-shared rule ledgers — the full arc
/warden-share (export) → /warden-adopt (import as candidates, re-measured locally, foreign delta never trusted) → verify-ledger (CI structural gate). Memory review becomes code review.
180 tests, green on Node 22 and 24.
Roadmap
Of the six directions, #1 (model-bench), #2 (prompt A/B), #3 (team-shared ledgers), #4 (cost-anomaly alerting) are shipped, plus automated prompt evolution. Two remain: #5 skill/MCP cost attribution and #6 rule marketplaces.
Full changelog: v0.11.0...v0.12.0
v0.11.0 — team-shared rule ledgers (import + re-verify)
Roadmap #3, increment 2: import a shared ledger and re-verify each rule locally — never trusting a foreign delta.
What's new
- `/warden-adopt --from <path>` (and `src/adopt.ts`) reads a shared ledger (from `/warden-share`) and queues its rules as candidates locally. The foreign measured delta is discarded and the context rent is recomputed locally, so by invariant #1 an adopted rule is never injected into memory until the local selector re-measures it on this machine's golden suite. Near-duplicates of any existing rule (active/candidate/evicted) are skipped; re-adopting is idempotent.
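The adoption rules above can be sketched in a few lines. Everything here is illustrative: the type names are hypothetical, near-duplicate detection is reduced to a case- and whitespace-insensitive match, and the rent estimate (chars ÷ 4) is an assumed placeholder for whatever the local recomputation does.

```typescript
type Status = "active" | "candidate" | "evicted";
interface LocalRule { body: string; status: Status; contextRent: number; measuredDelta: number | null }
interface SharedRule { body: string; measuredDelta: number; contextRent: number }

// Assumed duplicate test: same text ignoring case and whitespace runs.
const normalize = (body: string): string => body.toLowerCase().replace(/\s+/g, " ").trim();
// Assumed rent estimate, recomputed locally instead of trusting the ledger.
const rentOf = (body: string): number => Math.ceil(body.length / 4);

function adopt(existing: LocalRule[], shared: SharedRule[]): LocalRule[] {
  const seen = new Set(existing.map((rule) => normalize(rule.body)));
  const adopted: LocalRule[] = [];
  for (const rule of shared) {
    const key = normalize(rule.body);
    if (seen.has(key)) continue; // duplicate of an active, candidate, or evicted rule
    seen.add(key);
    // The foreign delta and rent are dropped; the rule starts as an unmeasured candidate.
    adopted.push({ body: rule.body, status: "candidate", contextRent: rentOf(rule.body), measuredDelta: null });
  }
  return adopted;
}
```

Running `adopt` a second time against the updated rule list returns nothing new, which is the idempotence property described above.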
Why this is safe
This is the increment that writes to the rule ledger, so it was the one to handle carefully. The safety comes from a single decision: an adopted rule is just a candidate, so it flows through the existing selector untouched — the whole variance-conservative verdict path re-measures it on your own suite. There is no new trust path. The ledger JSON is zod-validated; control-char bodies and malformed blocks are rejected.
Verified end-to-end: real rules exported → adopted into a fresh DB as candidates (rent recomputed, foreign delta discarded) → re-adopt correctly skipped as duplicates. 174 tests, green on Node 22 and 24.
Full changelog: v0.10.0...v0.11.0
v0.10.0 — team-shared rule ledgers (export)
Roadmap #3, increment 1: export measured rules to a committed, reviewable artifact so a team can version and review agent memory like code.
What's new
- `/warden-share <agent>` (and `src/share.ts`) writes an agent's active rules (body, measured token delta, context rent, and provenance) to `.warden/<agent>.rules.md`: a human-readable bullet list plus a machine-readable JSON block that round-trips. A PR adding a rule arrives with its proof, and a later import can re-verify it.
- Read-only and zero-coupling by design. It only reads the rule ledger and writes a file, so it cannot affect the collect/distill/select loop. Verified: nothing else imports it, and it carries the invoked-directly guard from the start.
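A ledger that pairs a readable bullet list with a round-tripping JSON block could be produced and read back as sketched below. The file layout, the field names, and both function names are assumptions for illustration; the actual artifact format is defined by `src/share.ts`.

```typescript
interface SharedRule { body: string; measuredDelta: number; contextRent: number }

// Built at runtime so the fence marker never appears literally in this sketch.
const FENCE = "`".repeat(3);

// Human-readable bullets first, then the machine-readable JSON block.
function renderLedger(agent: string, rules: SharedRule[]): string {
  const bullets = rules.map((r) => `- ${r.body} (delta ${r.measuredDelta}, rent ${r.contextRent})`);
  const json = JSON.stringify({ agent, rules }, null, 2);
  return [`# ${agent} rules`, "", ...bullets, "", `${FENCE}json`, json, FENCE, ""].join("\n");
}

// Reads back only the JSON block; the bullets are for reviewers.
function parseLedger(text: string): { agent: string; rules: SharedRule[] } {
  const start = text.indexOf(`${FENCE}json\n`);
  const end = text.lastIndexOf(`\n${FENCE}`);
  if (start === -1 || end <= start) throw new Error("ledger has no JSON block");
  return JSON.parse(text.slice(start + FENCE.length + 5, end));
}
```

Keeping the JSON block as the single source of truth means a reviewer reads the bullets in the diff while an importer parses only the block.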
Why export first
The risky part of team-shared ledgers is import, not export — a foreign rule claims a measured delta, and the project's one inviolable rule is "measured, not claimed." So import must re-pass the importer's own golden suite (reusing the existing selector; no new trust path). Shipping export first locks the artifact format before any import logic touches the selector.
166 tests, all green on Node 22 and 24.
Full changelog: v0.9.1...v0.10.0
v0.9.1 — documentation fixes
Documentation-only release (no code changes).
- Roadmap de-drifted. Model-migration benchmarking, prompt A/B testing, and cost-anomaly alerting were still listed as future "bigger directions" while already shipped (v0.5/v0.6/v0.9). Removed them, and collapsed the ever-growing "shipped since v0.1.0" list into a one-line pointer to the CHANGELOG — the canonical record of what shipped — so the two stop drifting.
- Testing section wording corrected: the CI badge shows pass/fail, not a test count; the prose now gives an approximate count and says so.
Full changelog: v0.9.0...v0.9.1