Release v0.23.0 · tuanle96/agent-harness-kit

[0.23.0] - 2026-06-10

Schema Compatibility

Two new schemas added (backward compatible, additive only):
golden-trace-case (.harness/schemas/golden-trace-case.schema.json) and
structural-test-result (.harness/schemas/structural-test-result.schema.json).
Both are registered in the stable schema policy (now 17 contracts). No
existing schema changed.

Added

migrate-settings command (alias migrate-config): repairs older
installs — moves legacy harness.config.json, restores the
Edit/Write/MultiEdit baseline permissions, merges unmerged hooks and
compiled permission hints into .claude/settings.json. Doctor now fails
with the remediation command when it detects this drift.
Golden-trace regression gate (check:golden-traces): 7 recorded cases
replay deterministic harness surfaces hermetically and exit 2 naming the
case and assertion when behavior regresses; wired into the generated CI
workflow.
Structural-test --json on all six language adapters (TypeScript,
Python, Go, Rust, Swift, Kotlin) emitting one structural-test-result
object for dashboards and PR annotations.
Loop detection (PostToolUse): repeated edits to one file inside a
window inject a doom-loop warning and a loop_detected telemetry row.
Symbol index (symbol-index.mjs): where-is-X navigation to file:line
across 8 languages with incremental PostToolUse updates.
Context budget (context-budget.mjs + contextBudget.totalChars):
explicit per-source allocation for SessionStart context with line-boundary
truncation and a usage report.
Observational memory (observation-compress.mjs): PreCompact and
SessionEnd compress transcripts into dense observations; SessionStart
re-injects the three most recent.
MCP skill server (mcp-skill-server.mjs): dependency-free MCP stdio
server exposing list_skills/load_skill to any MCP client.
Terrain-correct value-prop benches: eval tasks 05–07 (weak-model,
greenfield, adversarial) plus docs/benchmarks/value-prop-protocol.md
replacing the 1-shot A/B design that returned five consecutive nulls.
Doctor checks for eslint-plugin-boundaries presence/version and for
settings drift (baseline permissions, unmerged hooks).

Fixed

Codex advisor runner retries transient failures (timeout, non-zero
exit) with jittered backoff and records each attempt in the runtime-proof
JSONL; a hung Codex CLI no longer blocks the task until manual recovery.
Mistargeted advisor decisions are rejected: any decision whose
taskId differs from the active task fails instead of being persisted
into the active task's decision path.
Telemetry rotation is concurrency-safe: mkdir-as-lock with PID-unique
temp files and stale-lock recovery; concurrent rotations no longer clobber
each other's snapshot.
Task contracts with typo'd layer names now get a denial naming the
valid layers from the domains config instead of a confusing scope message.
Advisor decision writes are proof-checked at write time: a decision
referencing a missing or invented runtime proof is denied by the
PreToolUse guard with contextual guidance instead of failing later at the
Stop gate.
SubagentStop structural checks are cached on a tree fingerprint, so
unchanged trees skip the workspace-wide rerun (failed runs are never
cached).

Changed

src/core/upgrade.mjs split into phase modules under src/core/upgrade/
behind an unchanged re-export facade.

Full history: CHANGELOG.md
Install: npx agent-harness-kit@0.23.0 init

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

v0.23.0

Choose a tag to compare

Sorry, something went wrong.

Sorry, something went wrong.

Uh oh!

No results found

[0.23.0] - 2026-06-10

Schema Compatibility

Added

Fixed

Changed

Uh oh!