Releases · vukkt/token-warden

16 Jun 13:18

v0.17.0

eb964a4

v0.17.0 Latest

Quality hardening — no plugin behavior change; this release is about making the
codebase provably tested and tight, with CI guards that can't silently slip.

90% line coverage (78% branch), CI-gated. Added @vitest/coverage-v8 with a ratchet-floor threshold; the new coverage pipeline stage fails the build on any regression. Coverage rose from ~66% to 90% by unit-testing the subprocess/stdin CLIs (collect, gate, distill, evolve, modelbench, promptbench) with mocked child_process/stdin boundaries: real orchestration tests (fail-open contracts, verdict decisions, anomaly alerts), not padding. The untestable invokedDirectly entry shims are explicitly excluded via v8 ignore.
Dead-code gate. knip (unused files/exports/deps) is wired into CI and the module API surface was tightened (8 internal-only exports un-exported). Zero unused SQL fields.
Component-integration + performance tests. test/integration.test.ts wires the real modules end-to-end (collection → distill trigger → selector → receipts → status) through one DB; test/perf.test.ts holds hot-path budgets: transcript parser ~39 MB/s (2 MB in ~50 ms vs the 2 s Stop-hook budget), 50k tool events attributed in ~24 ms, a 2k-session rollup in ~1.3 ms.
361 tests, green on Node 22 and 24.
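
The ratchet-floor idea above can be illustrated in plain TypeScript. This is only a sketch of the logic: the project itself relies on @vitest/coverage-v8 thresholds, and the `Coverage` shape and `checkRatchet` name are hypothetical.

```typescript
// Hypothetical shapes, not the project's actual coverage configuration.
interface Coverage {
  lines: number; // percent, 0-100
  branches: number; // percent, 0-100
}

interface RatchetResult {
  ok: boolean;
  failures: string[];
  nextFloor: Coverage; // the floor never moves down
}

// Fail when any metric drops below its floor; on success, raise the floor
// to the current numbers so a later regression is caught.
function checkRatchet(current: Coverage, floor: Coverage): RatchetResult {
  const failures: string[] = [];
  if (current.lines < floor.lines) {
    failures.push(`lines ${current.lines}% < floor ${floor.lines}%`);
  }
  if (current.branches < floor.branches) {
    failures.push(`branches ${current.branches}% < floor ${floor.branches}%`);
  }
  const ok = failures.length === 0;
  const nextFloor: Coverage = ok
    ? {
        lines: Math.max(floor.lines, current.lines),
        branches: Math.max(floor.branches, current.branches),
      }
    : floor;
  return { ok, failures, nextFloor };
}
```

A CI stage built on this would exit non-zero whenever `ok` is false.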

16 Jun 12:17

github-actions

v0.16.0

e47808e

v0.16.0

Rule receipts — the per-rule verdict card (community-suggested).

New /warden-receipt command (npx tsx src/receipt.ts [--agent <name>] [--json]) renders the evidence behind each keep/evict decision as one card: token savings vs. context rent (with variance and ROI multiple), the model and golden-suite hash it was measured under, per-task pass/fail with vs. without the rule, and the tool-call / file-reread activity profile with vs. without (shown as a signed % so a reviewer can see whether a "cheap" rule did less work). Read-only; the natural payload for sharing a rule: "my delta is evidence, not authority for your repo."
The selector now records a receipt snapshot (rule_receipts table, migration #9) at every decision, both the initial one and each re-audit, so a rule has an audit trail. The keep/evict verdict logic is unchanged; receipts are additive capture. RunResult now carries tool-call / file-reread counts; bench.ts gains goldenSuiteHash for suite provenance.
The safety axis is surfaced, not auto-judged: a big activity drop is usually the point of an efficiency rule, so the receipt shows the numbers and leaves the call to a human. The binding safety gate remains the per-task pass/fail regression, which evicts on its own.
292 tests, green on Node 22 and 24.
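
As a rough illustration of two numbers on the card, the sketch below formats a signed percentage for the activity profile and computes an ROI multiple. Both function names are hypothetical, and defining the ROI multiple as savings divided by rent is an assumption; the notes do not give the formula.

```typescript
// Signed % change of an activity count (tool calls, file rereads) with the
// rule versus without it. Returns null when there is no baseline.
function signedPercent(withRule: number, withoutRule: number): string | null {
  if (withoutRule === 0) return null;
  const pct = ((withRule - withoutRule) / withoutRule) * 100;
  const sign = pct > 0 ? "+" : "";
  return `${sign}${pct.toFixed(1)}%`;
}

// Assumed definition: tokens saved per token of context rent paid.
function roiMultiple(tokenSavings: number, contextRent: number): number | null {
  return contextRent > 0 ? tokenSavings / contextRent : null;
}
```

For example, 30 tool calls with the rule against 40 without it is a drop of a quarter, which the card would leave for a human to judge.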

16 Jun 10:09

github-actions

v0.15.0

eeb3963

v0.15.0

Tooling and docs — no plugin behavior change.

Staged CI/CD pipeline. .github/workflows/ci.yml is now a dependent-stage pipeline: quality (lint, typecheck, manifest version consistency) → test (Node 22 + 24) and fixture in parallel → validate (plugin-manifest validation + a CLI smoke run) → release. The release stage runs only on a vX.Y.Z tag: it verifies the tag matches the manifests and publishes the GitHub release with notes from CHANGELOG.md. Tag-push is now the whole deploy step.
Release helper scripts (scripts/check-versions.mjs, scripts/changelog-section.mjs): a version-consistency guard and changelog extraction, reused by CI and runnable locally (npm run check:versions).
Standard project docs: CONTRIBUTING.md (setup, the pipeline, the release flow, the design invariants) and SECURITY.md (reporting + the security model). README gains a Quickstart at the top of "Getting started".
A professional sweep of every source file found it clean (no TODO/FIXME, no any, no stray debug, no non-text bytes). 275 tests, green on Node 22 and 24.
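
The tag-versus-manifest check in the release stage can be sketched as a small pure function. This is not the contents of scripts/check-versions.mjs; the `ManifestVersion` shape, the function name, and the file names in the test data are assumptions for illustration.

```typescript
// Hypothetical input: one entry per manifest file that carries a version.
interface ManifestVersion {
  file: string;
  version: string;
}

// Returns a list of problems; an empty list means the tag and every
// manifest agree.
function checkVersions(tag: string, manifests: ManifestVersion[]): string[] {
  const match = /^v(\d+\.\d+\.\d+)$/.exec(tag);
  if (match === null) {
    return [`tag "${tag}" is not of the form vX.Y.Z`];
  }
  const expected = match[1];
  const problems: string[] = [];
  for (const manifest of manifests) {
    if (manifest.version !== expected) {
      problems.push(`${manifest.file} has ${manifest.version}, tag expects ${expected}`);
    }
  }
  return problems;
}
```

A release job could refuse to publish whenever the returned list is non-empty.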

16 Jun 09:41

vukkt

v0.14.1

1752b37

v0.14.1 — verdict-math boundary tests

Test-only hardening — no behavior or API change.

Locked the assessDelta degenerate-input boundaries that protect a keep/evict verdict from a divide-by-zero NaN: a single comparable task yields a finite point estimate with null standard error (the savings.length >= 2 guard), and no comparable task yields a null delta rather than NaN. An audit confirmed the verdict math is otherwise free of divide-by-zero / NaN paths.
275 tests, green on Node 22 and 24.
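
The boundaries described above can be shown with a minimal estimator. This is a sketch, not the project's assessDelta: the `estimateDelta` name and return shape are hypothetical, and using the sample mean with the standard error of the mean is an assumption about the underlying statistics.

```typescript
interface DeltaEstimate {
  delta: number | null; // mean per-task saving, null when nothing is comparable
  standardError: number | null; // null unless at least two tasks are comparable
}

function estimateDelta(savings: number[]): DeltaEstimate {
  const n = savings.length;
  // No comparable task: report null rather than 0 / 0 = NaN.
  if (n === 0) return { delta: null, standardError: null };

  const mean = savings.reduce((sum, s) => sum + s, 0) / n;
  // One comparable task: a finite point estimate, but the sample variance
  // would divide by n - 1 = 0, so the standard error stays null.
  if (n < 2) return { delta: mean, standardError: null };

  const sumSquares = savings.reduce((sum, s) => sum + (s - mean) * (s - mean), 0);
  const variance = sumSquares / (n - 1);
  return { delta: mean, standardError: Math.sqrt(variance / n) };
}
```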

Full changelog: v0.14.0...v0.14.1

16 Jun 09:32

vukkt

v0.14.0

8c79c5c

v0.14.0 — gate injection hardening + hygiene

Hardening and simplification release. No new commands; existing behavior is unchanged except the inter-agent approval prompt is now injection-proof. Bundles the work from a focused optimization pass over the codebase.

Security

gate.ts approval prompt is sanitized. The PreToolUse prompt for an inter-agent SendMessage previously interpolated the sender, recipient, and message body verbatim, so a hostile teammate message could embed ANSI/control sequences to forge or obscure the line the user approves. Every interpolated field now passes through the shared sanitizer (control/ANSI stripped, names capped); the forged-newline and escape vectors are closed. Verified end-to-end.
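
A minimal sketch of this kind of sanitizer follows. It is not the project's implementation: the function names, the 64-character cap, and the prompt layout are all assumptions, and only CSI-style ANSI sequences are recognised as a unit.

```typescript
const MAX_NAME_LENGTH = 64; // assumed cap, not taken from the project

// Remove ANSI CSI sequences, then turn every remaining control character
// (including newlines and a lone ESC) into a space.
function stripControl(text: string): string {
  return text
    .replace(/\x1b\[[0-?]*[ -\/]*[@-~]/g, "")
    .replace(/[\x00-\x1f\x7f-\x9f]/g, " ");
}

// Names are stripped and also capped so they cannot push the real content
// off the visible line.
function sanitizeName(raw: string): string {
  const clean = stripControl(raw).trim();
  return clean.length > MAX_NAME_LENGTH
    ? clean.slice(0, MAX_NAME_LENGTH) + "..."
    : clean;
}

// Hypothetical prompt layout: every interpolated field is sanitized first.
function approvalPrompt(sender: string, recipient: string, body: string): string {
  return `${sanitizeName(sender)} -> ${sanitizeName(recipient)}: ${stripControl(body)}`;
}
```

With this in place, a body containing a newline followed by a forged "approved" line is flattened onto the one line the user actually reviews.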

Cleanups

New src/sanitize.ts — displayText extracted into one presentation-security chokepoint used by status, compare, attribute, and gate; attribute/compare no longer pull it from the heavier status module.
Fixed NUL bytes in attribute.ts (a NUL-delimited map key) — invisible and tool-breaking; replaced with a collision-proof JSON.stringify key. New source-hygiene test fails the build on any NUL/control byte in source.
Centralized the run-total token SQL (RUN_TOTAL_TOKENS_SQL, was hand-written 10×) and collapsed the duplicated candidate/re-audit verdict path in select.ts into one helper — both behavior-preserving.
Added parseAgentDefinition memory-scope-isolation tests (benchmarks never touch real agent-memory).
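
The map-key fix can be illustrated as follows. The sketch uses "|" as a visible stand-in for the NUL delimiter, and every name in it is hypothetical; it only shows why a JSON-encoded tuple is a safer composite key than a delimiter-joined string.

```typescript
// A delimiter-joined key collides as soon as a field contains the delimiter.
function joinedKey(kind: string, toolName: string): string {
  return kind + "|" + toolName;
}

// JSON.stringify quotes and escapes each element of the tuple, so two
// different (kind, toolName) pairs cannot produce the same key.
function tupleKey(kind: string, toolName: string): string {
  return JSON.stringify([kind, toolName]);
}

// Accumulate a per-tool character count under the tuple key.
function addCost(
  totals: Map<string, number>,
  kind: string,
  toolName: string,
  chars: number,
): void {
  const key = tupleKey(kind, toolName);
  const previous = totals.get(key);
  totals.set(key, (previous === undefined ? 0 : previous) + chars);
}
```

The encoded key is also printable, so it shows up in diffs and logs instead of hiding as an invisible byte.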

Verification

273 tests, green on Node 22 and 24. E2e edge-case sweep confirmed fail-open on the collect/gate hooks (empty, garbage, binary, missing-file inputs) and correct exit codes across every CLI.
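
The fail-open behaviour checked in that sweep can be sketched like this. The `HookInput` shape and both function names are hypothetical, and treating "fail-open" as "always return exit code 0" is an assumption about the hook contract.

```typescript
// Hypothetical payload shape; the real hook input is not shown in these notes.
interface HookInput {
  sessionId: string;
}

// Parse untrusted stdin text. Empty, garbage, or binary input yields null
// instead of throwing.
function parseHookInput(raw: string): HookInput | null {
  try {
    const parsed: unknown = JSON.parse(raw);
    if (typeof parsed !== "object" || parsed === null) return null;
    const sessionId = (parsed as { sessionId?: unknown }).sessionId;
    return typeof sessionId === "string" ? { sessionId } : null;
  } catch {
    return null;
  }
}

// Fail-open: the hook reports success whether or not it managed to record
// anything, so a bad payload or a failing recorder never blocks the host.
function runHook(raw: string, record: (input: HookInput) => void): number {
  try {
    const input = parseHookInput(raw);
    if (input !== null) record(input);
  } catch {
    // Errors from the recorder are swallowed on purpose.
  }
  return 0;
}
```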

Full changelog: v0.13.0...v0.14.0

16 Jun 07:56

vukkt

v0.13.0

5779562

v0.13.0 — skill/MCP cost attribution (#5 complete)

Roadmap direction #5, skill/MCP cost attribution, is complete. It is a decomposition, not a verdict: it answers "where did the tokens go?" by attributing each real-work session's footprint to the tool, skill, or MCP server that produced it. It is fully orthogonal to the selector/benchmark path and never promotes, evicts, or measures a rule.

What's new

npx tsx src/attribute.ts (new /warden-attribute command) renders a cross-session rollup of tool/skill/MCP cost, or a single transcript with --transcript. Filters: --agent, --kind builtin|mcp|skill, --limit, --json.
transcript.ts now joins each tool_use to its tool_result by id in the existing single streaming pass, capturing the input chars the model generated and the result chars the tool injected back into context. The hot Stop-hook budget is unchanged (one pass, O(tool calls)).
db.ts migration #8 adds a tool_costs table; collect.ts persists per-session costs inside the existing fail-open block (real-work only — golden runs are never attributed). /warden-status gains a top-costs section.
Footprint is measured in characters (exact, deterministic); a rough ≈tokens figure (chars ÷ 4) is shown for intuition, not as a billed token count.
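
The single-pass join described above might look roughly like the sketch below. The block shapes are assumptions modelled on the tool_use / tool_result naming in the notes, measuring input size as the length of the JSON-encoded input is also an assumption, and none of this is the code in transcript.ts.

```typescript
// Assumed transcript block shapes.
type Block =
  | { type: "tool_use"; id: string; name: string; input: unknown }
  | { type: "tool_result"; tool_use_id: string; content: string };

interface ToolCost {
  name: string;
  inputChars: number; // characters the model generated for the call
  resultChars: number; // characters the tool injected back into context
}

// One pass over the blocks: remember each tool_use by id, then fold the
// matching tool_result into it when it arrives.
function attributeCosts(blocks: Block[]): ToolCost[] {
  const pending = new Map<string, ToolCost>();
  const costs: ToolCost[] = [];
  for (const block of blocks) {
    if (block.type === "tool_use") {
      const encoded = JSON.stringify(block.input);
      const cost: ToolCost = {
        name: block.name,
        inputChars: typeof encoded === "string" ? encoded.length : 0,
        resultChars: 0,
      };
      pending.set(block.id, cost);
      costs.push(cost);
    } else {
      const cost = pending.get(block.tool_use_id);
      if (cost !== undefined) {
        cost.resultChars += block.content.length;
        pending.delete(block.tool_use_id);
      }
    }
  }
  return costs;
}

// Rough intuition only (chars / 4); not a billed token count.
function approxTokens(chars: number): number {
  return Math.round(chars / 4);
}
```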

Hardening

From an adversarial review: a tool_result content array with an odd sibling (a bare string, an image block) no longer zeroes the whole result's footprint — each element is read defensively. A cross-feature regression audit confirmed fail-open is preserved, the verdict path is untouched, and the migration is additive.
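
Reading each element defensively can be sketched as below. The `contentChars` name is hypothetical, and counting a bare string element by its length while counting non-text blocks as zero is an assumption; the notes only say that an odd sibling no longer zeroes the whole result.

```typescript
// Sum the characters of a tool_result content value without letting one
// unexpected element void the rest.
function contentChars(content: unknown): number {
  if (typeof content === "string") return content.length;
  if (!Array.isArray(content)) return 0;

  let total = 0;
  for (const element of content) {
    if (typeof element === "string") {
      total += element.length; // bare string sibling
    } else if (typeof element === "object" && element !== null) {
      const text = (element as { text?: unknown }).text;
      if (typeof text === "string") total += text.length; // text block
      // Other block kinds (for example images) contribute nothing here.
    }
  }
  return total;
}
```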

219 tests, green on Node 22 and 24.

Roadmap

Of the six directions, #1 through #5 (plus automated prompt evolution) are shipped. Only #6 (rule marketplaces) remains.

Full changelog: v0.12.0...v0.13.0

15 Jun 18:17

vukkt

v0.12.0

26efd3c

v0.12.0 — team-shared rule ledgers complete (CI gate)

Roadmap #3, increment 3: the CI gate — #3 is now complete.

What's new

npx tsx src/verify-ledger.ts [file...] validates committed .warden/*.rules.md ledgers and exits non-zero if any is corrupt or hand-edited, so a CI job can gate the PR. Deterministic and offline — no model tokens, no secrets; reuses increment 2's parseLedgerFile. Verified: a valid ledger passes (exit 0), a corrupted one fails (exit 1).
A deeper gate that re-benchmarks each rule's claimed delta in CI is possible but needs a model-token budget and credentials, so it is a documented deployment choice rather than a default.
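
The shape of such a structural gate can be sketched with the parser injected as a parameter. This is not src/verify-ledger.ts: the `LedgerParser` type stands in for parseLedgerFile, whose real signature is not shown here, and file reading is left out so the sketch stays offline and deterministic.

```typescript
// Hypothetical parser contract: throws when a ledger is corrupt or edited.
type LedgerParser = (text: string) => unknown;

interface LedgerFile {
  path: string;
  text: string;
}

interface VerifyResult {
  exitCode: 0 | 1;
  errors: string[];
}

// Validate every ledger; a single bad file is enough to fail the gate.
function verifyLedgers(files: LedgerFile[], parseLedger: LedgerParser): VerifyResult {
  const errors: string[] = [];
  for (const file of files) {
    try {
      parseLedger(file.text);
    } catch (err) {
      const reason = err instanceof Error ? err.message : String(err);
      errors.push(`${file.path}: ${reason}`);
    }
  }
  return { exitCode: errors.length === 0 ? 0 : 1, errors };
}
```

A CI step would pass the result's `exitCode` to the process exit so the PR check goes red on any corrupt ledger.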

Team-shared rule ledgers — the full arc

/warden-share (export) → /warden-adopt (import as candidates, re-measured locally, foreign delta never trusted) → verify-ledger (CI structural gate). Memory review becomes code review.

180 tests, green on Node 22 and 24.

Roadmap

Of the six directions, #1 (model-bench), #2 (prompt A/B), #3 (team-shared ledgers), #4 (cost-anomaly alerting) are shipped, plus automated prompt evolution. Two remain: #5 skill/MCP cost attribution and #6 rule marketplaces.

Full changelog: v0.11.0...v0.12.0

15 Jun 18:11

vukkt

v0.11.0

8587174

v0.11.0 — team-shared rule ledgers (import + re-verify)

Roadmap #3, increment 2: import a shared ledger and re-verify each rule locally — never trusting a foreign delta.

What's new

/warden-adopt --from <path> (and src/adopt.ts) reads a shared ledger (from /warden-share) and queues its rules as candidates locally. The foreign measured delta is discarded and the context rent is recomputed locally, so by invariant #1 an adopted rule is never injected into memory until the local selector re-measures it on this machine's golden suite. Near-duplicates of any existing rule (active/candidate/evicted) are skipped; re-adopting is idempotent.
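
The adoption rules above can be sketched in a few lines. Everything here is illustrative: the rule shapes are hypothetical, near-duplicate detection is simplified to a whitespace- and case-insensitive comparison, and the rent estimate (characters divided by four) is a placeholder, not the project's formula.

```typescript
interface SharedRule {
  body: string;
  delta: number; // foreign measurement, never trusted
  rent: number; // foreign rent, recomputed locally
}

interface LocalRule {
  body: string;
  status: "active" | "candidate" | "evicted";
  rent: number;
  delta: number | null; // null until the local selector measures it
}

function normalizeBody(body: string): string {
  return body.toLowerCase().replace(/\s+/g, " ").trim();
}

// Placeholder rent estimate, for illustration only.
function estimateRent(body: string): number {
  return Math.ceil(body.length / 4);
}

// Queue shared rules as candidates, skipping anything already in the ledger
// in any status. Returns the rules that were actually added.
function adoptRules(shared: SharedRule[], ledger: LocalRule[]): LocalRule[] {
  const seen = new Set<string>(ledger.map((rule) => normalizeBody(rule.body)));
  const added: LocalRule[] = [];
  for (const rule of shared) {
    const key = normalizeBody(rule.body);
    if (seen.has(key)) continue;
    seen.add(key);
    const candidate: LocalRule = {
      body: rule.body,
      status: "candidate",
      rent: estimateRent(rule.body),
      delta: null,
    };
    ledger.push(candidate);
    added.push(candidate);
  }
  return added;
}
```

Because every adopted rule enters as a candidate with a null delta, running the same import twice adds nothing the second time.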

Why this is safe

This is the increment that writes to the rule ledger, so it was the one to handle carefully. The safety comes from a single decision: an adopted rule is just a candidate, so it flows through the existing selector untouched — the whole variance-conservative verdict path re-measures it on your own suite. There is no new trust path. The ledger JSON is zod-validated; control-char bodies and malformed blocks are rejected.

Verified end-to-end: real rules exported → adopted into a fresh DB as candidates (rent recomputed, foreign delta discarded) → re-adopt correctly skipped as duplicates. 174 tests, green on Node 22 and 24.

Full changelog: v0.10.0...v0.11.0

15 Jun 13:25

vukkt

v0.10.0

434446f

v0.10.0 — team-shared rule ledgers (export)

Roadmap #3, increment 1: export measured rules to a committed, reviewable artifact so a team can version and review agent memory like code.

What's new

/warden-share <agent> (and src/share.ts) writes an agent's active rules — body, measured token delta, context rent, and provenance — to .warden/<agent>.rules.md: a human-readable bullet list plus a machine-readable JSON block that round-trips. A PR adding a rule arrives with its proof, and a later import can re-verify it.
Read-only and zero-coupling by design. It only reads the rule ledger and writes a file, so it cannot affect the collect/distill/select loop. Verified: nothing else imports it, and it carries the invoked-directly guard from the start.
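
A round-tripping artifact of this kind could be produced as sketched below. The file layout (a heading, a bullet list, then one fenced JSON block), the field names, and both function names are assumptions; the actual format written by src/share.ts is not reproduced in these notes.

```typescript
interface ExportedRule {
  body: string;
  delta: number; // measured token delta
  rent: number; // context rent
  provenance: string;
}

// Built at runtime so the example does not contain a literal fence.
const FENCE = "`".repeat(3);

// Human-readable bullets for review, followed by the machine-readable block.
function renderLedger(agent: string, rules: ExportedRule[]): string {
  const bullets = rules.map(
    (rule) => `- ${rule.body} (delta ${rule.delta}, rent ${rule.rent})`,
  );
  return [
    `# ${agent} rules`,
    "",
    ...bullets,
    "",
    FENCE + "json",
    JSON.stringify(rules, null, 2),
    FENCE,
    "",
  ].join("\n");
}

// Recover the rules from the JSON block; throws when the block is missing
// or no longer valid JSON.
function parseLedger(text: string): ExportedRule[] {
  const opening = FENCE + "json\n";
  const start = text.indexOf(opening);
  const end = text.lastIndexOf("\n" + FENCE);
  if (start === -1 || end <= start) throw new Error("no JSON block found");
  return JSON.parse(text.slice(start + opening.length, end)) as ExportedRule[];
}
```

Keeping the JSON block as the source of truth means the bullet list can be reviewed in a PR while an importer reads only the structured data.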

Why export first

The risky part of team-shared ledgers is import, not export — a foreign rule claims a measured delta, and the project's one inviolable rule is "measured, not claimed." So import must re-pass the importer's own golden suite (reusing the existing selector; no new trust path). Shipping export first locks the artifact format before any import logic touches the selector.

166 tests, all green on Node 22 and 24.

Full changelog: v0.9.1...v0.10.0

15 Jun 08:20

vukkt

v0.9.1

3ab22b3

v0.9.1 — documentation fixes

Documentation-only release (no code changes).

Roadmap de-drifted. Model-migration benchmarking, prompt A/B testing, and cost-anomaly alerting were still listed as future "bigger directions" while already shipped (v0.5/v0.6/v0.9). Removed them, and collapsed the ever-growing "shipped since v0.1.0" list into a one-line pointer to the CHANGELOG — the canonical record of what shipped — so the two stop drifting.
Testing section wording corrected: the CI badge shows pass/fail, not a test count; the prose now gives an approximate count and says so.

Full changelog: v0.9.0...v0.9.1
