v0.23.0
[0.23.0] - 2026-06-10
Schema Compatibility
- Two new schemas added (backward compatible, additive only):
  `golden-trace-case` (`.harness/schemas/golden-trace-case.schema.json`) and
  `structural-test-result` (`.harness/schemas/structural-test-result.schema.json`).
  Both are registered in the stable schema policy (now 17 contracts). No
  existing schema changed.
Added
- `migrate-settings` command (alias `migrate-config`): repairs older
  installs. It moves the legacy `harness.config.json`, restores the
  Edit/Write/MultiEdit baseline permissions, and merges unmerged hooks and
  compiled permission hints into `.claude/settings.json`. Doctor now fails
  with the remediation command when it detects this drift.
- Golden-trace regression gate (`check:golden-traces`): 7 recorded cases
  replay deterministic harness surfaces hermetically and exit 2 naming the
  case and assertion when behavior regresses; wired into the generated CI
  workflow.
- Structural-test `--json` on all six language adapters (TypeScript,
  Python, Go, Rust, Swift, Kotlin) emitting one `structural-test-result`
  object for dashboards and PR annotations.
- Loop detection (PostToolUse): repeated edits to one file inside a
  window inject a doom-loop warning and a `loop_detected` telemetry row.
- Symbol index (`symbol-index.mjs`): where-is-X navigation to file:line
  across 8 languages with incremental PostToolUse updates.
- Context budget (`context-budget.mjs` + `contextBudget.totalChars`):
  explicit per-source allocation for SessionStart context with line-boundary
  truncation and a usage report.
- Observational memory (`observation-compress.mjs`): PreCompact and
  SessionEnd compress transcripts into dense observations; SessionStart
  re-injects the three most recent.
- MCP skill server (`mcp-skill-server.mjs`): dependency-free MCP stdio
  server exposing `list_skills`/`load_skill` to any MCP client.
- Terrain-correct value-prop benches: eval tasks 05–07 (weak-model,
  greenfield, adversarial) plus `docs/benchmarks/value-prop-protocol.md`
  replacing the 1-shot A/B design that returned five consecutive nulls.
- Doctor checks for `eslint-plugin-boundaries` presence/version and for
  settings drift (baseline permissions, unmerged hooks).
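
The loop-detection entry above can be pictured as a per-file sliding-window counter. The following is a minimal sketch only: `createLoopDetector`, the `windowMs` and `threshold` options, their default values, and the shape of the returned object are all hypothetical and not taken from the harness; only the `loop_detected` event name comes from these notes.

```javascript
// Hypothetical sketch: helper name, option names, defaults, and the row
// shape are assumptions for illustration, not the harness's real hook.
function createLoopDetector({ windowMs = 60_000, threshold = 3 } = {}) {
  // file path -> timestamps (ms) of edits still inside the window
  const editsByFile = new Map();

  return function recordEdit(filePath, now = Date.now()) {
    // Drop edits that have fallen out of the window, then record this one.
    const recent = (editsByFile.get(filePath) ?? []).filter(
      (t) => now - t < windowMs,
    );
    recent.push(now);
    editsByFile.set(filePath, recent);

    // Below the threshold: nothing to report.
    if (recent.length < threshold) return null;

    // At or above the threshold: return a warning plus a telemetry-style row.
    return {
      event: "loop_detected",
      file: filePath,
      editsInWindow: recent.length,
      warning:
        `${filePath} was edited ${recent.length} times within ` +
        `${windowMs / 1000}s; consider stepping back before editing again.`,
    };
  };
}

// Usage: three edits to the same file inside a 10-second window.
const recordEdit = createLoopDetector({ windowMs: 10_000, threshold: 3 });
recordEdit("src/a.mjs", 0);
recordEdit("src/a.mjs", 1_000);
const hit = recordEdit("src/a.mjs", 2_000); // hit.event === "loop_detected"
```

Passing `now` explicitly keeps the sketch deterministic; a real hook would presumably use the wall clock and persist state between tool calls.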
Fixed
- Codex advisor runner retries transient failures (timeout, non-zero
  exit) with jittered backoff and records each attempt in the runtime-proof
  JSONL; a hung Codex CLI no longer blocks the task until manual recovery.
- Mistargeted advisor decisions are rejected: any decision whose
  `taskId` differs from the active task fails instead of being persisted
  into the active task's decision path.
- Telemetry rotation is concurrency-safe: mkdir-as-lock with PID-unique
  temp files and stale-lock recovery; concurrent rotations no longer clobber
  each other's snapshot.
- Task contracts with typo'd layer names now get a denial naming the
  valid layers from the domains config instead of a confusing scope message.
- Advisor decision writes are proof-checked at write time: a decision
  referencing a missing or invented runtime proof is denied by the
  PreToolUse guard with contextual guidance instead of failing later at the
  Stop gate.
- SubagentStop structural checks are cached on a tree fingerprint, so
  unchanged trees skip the workspace-wide rerun (failed runs are never
  cached).
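
For readers unfamiliar with the retry behavior described in the first fix above, here is a rough sketch of retry with jittered exponential backoff. It uses the common "full jitter" variant as an assumption; these notes do not say which variant the runner uses. `backoffDelay`, `retryWithBackoff`, and every option name and default are hypothetical, and the `onAttempt` callback merely stands in for recording an attempt (for example, appending a JSONL line).

```javascript
// Hypothetical sketch: function names, option names, and defaults are
// assumptions, not the Codex advisor runner's actual implementation.

// Full jitter: pick a uniform delay in [0, min(maxMs, baseMs * 2^attempt)).
function backoffDelay(
  attempt,
  { baseMs = 500, maxMs = 8_000, random = Math.random } = {},
) {
  const ceiling = Math.min(maxMs, baseMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}

// Runs `run` up to `maxAttempts` times, sleeping a jittered delay between
// failed attempts and reporting every attempt through `onAttempt`.
async function retryWithBackoff(
  run,
  {
    maxAttempts = 3,
    sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms)),
    onAttempt = () => {},
    ...backoff
  } = {},
) {
  let lastError;
  for (let attempt = 0; attempt < maxAttempts; attempt += 1) {
    try {
      const result = await run(attempt);
      onAttempt({ attempt, ok: true });
      return result;
    } catch (error) {
      lastError = error;
      onAttempt({ attempt, ok: false, error: String(error) });
      // No point sleeping after the final attempt.
      if (attempt < maxAttempts - 1) {
        await sleep(backoffDelay(attempt, backoff));
      }
    }
  }
  throw lastError;
}
```

Bounding the number of attempts is what keeps a hung command from blocking indefinitely; in practice `run` would also need its own timeout so that a hang surfaces as a failure that can be retried.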
Changed
- `src/core/upgrade.mjs` split into phase modules under `src/core/upgrade/`
  behind an unchanged re-export facade.
Full history: CHANGELOG.md
Install: npx agent-harness-kit@0.23.0 init