v0.18.0
[0.18.0] - 2026-05-30
Schema Compatibility
- Added a machine-readable schema policy and checker that require durable
harness schemas to declare compatibility, migration, deprecation, and
changelog rules before 1.0.
- Added stable schemas for permission compile output, explain block records,
runtime parity reports, and state export manifests; runtime parity JSON now
carries schemaVersion: 1.
- Explain block records now accept mode: "bypass" and an optional
fingerprint field for bypass-audit diagnostics.
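One way to picture a schema policy checker of this kind is as a required-rules check over each durable schema entry. The sketch below is illustrative only. The policy shape is hypothetical: a schemas array whose durable entries carry compatibility, migration, deprecation, and changelog strings. The checkSchemaPolicy name and the rule strings are placeholders too. The kit's real policy file may look different.

```javascript
// Hypothetical sketch: every durable schema must declare all four policy
// rules. Field names are assumptions, not the kit's actual policy format.
const REQUIRED_RULES = ["compatibility", "migration", "deprecation", "changelog"];

function checkSchemaPolicy(policy) {
  const errors = [];
  for (const schema of policy.schemas ?? []) {
    // Only durable schemas are held to the policy in this sketch.
    if (!schema.durable) continue;
    for (const rule of REQUIRED_RULES) {
      const value = schema[rule];
      if (typeof value !== "string" || value.trim() === "") {
        errors.push(`${schema.id}: missing ${rule} rule`);
      }
    }
  }
  return { ok: errors.length === 0, errors };
}

const policy = {
  schemas: [
    {
      id: "runtime-parity-report",
      durable: true,
      compatibility: "<rule text>",
      migration: "<rule text>",
      deprecation: "<rule text>",
      changelog: "<rule text>",
    },
    // Declares only one of the four rules, so the checker reports three gaps.
    { id: "state-export-manifest", durable: true, compatibility: "<rule text>" },
  ],
};

console.log(checkSchemaPolicy(policy).errors);
```

A check like this can run in CI and fail the build whenever a new durable schema lands without its policy rules.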
Documentation
- Added generated guides for policy pack authoring, runtime parity scorecards,
and team/PR adoption.
- Documented the Codex native hook-fire probe and the
AHK_E2E_CODEX_REQUIRE_HOOKS=1 release switch for making missing lifecycle
hook artifacts blocking.
Runtime Parity
- Added codex-parity-probes.mjs and npm run check:codex-parity-probes
as the release wrapper for strict Codex hook and reviewer artifact probes.
- The strict Codex parity wrapper now emits per-probe JSON outcomes so release
logs identify whether hook-fire or reviewer-artifact parity failed, passed,
remained planned, or was not reached before the driver failed.
- The kit repo release gate now validates root-local failure records in
addition to generated-template failure record fixtures, so observed runtime
gaps cannot sit outside readiness.
- Runtime parity JSON now includes per-runtime status counts plus category,
promotion criteria, and next-step metadata for partial capability rows.
- The real Codex E2E smoke now probes generated SessionStart and SessionEnd
lifecycle hook artifacts. The probe is non-blocking by default while Codex
hook-fire parity remains partial, and becomes a hard failure when
AHK_E2E_CODEX_REQUIRE_HOOKS=1 is set. The smoke now initializes a git
workspace and captures Codex feature/JSONL diagnostics when lifecycle
artifacts are missing.
- The same E2E smoke now probes Codex reviewer decision artifact creation and
can make it blocking with AHK_E2E_CODEX_REQUIRE_REVIEWER_ARTIFACT=1.
- Added runtime-parity-report.mjs --fail-partial so release lanes can turn
partial runtime capabilities into blocking failures without changing the
default warning-only scorecard behavior.
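The warning-by-default and blocking-on-switch behaviour described above comes down to a small decision per probe. This is a minimal sketch under stated assumptions. Only the two AHK_E2E_* variable names come from these notes. The probe names, the status strings, and the evaluateProbes helper are hypothetical.

```javascript
// Hypothetical sketch of warning-vs-blocking decisions for parity probes.
// Only the AHK_E2E_* variable names come from the release notes.
const REQUIRE_SWITCHES = {
  "hook-fire": "AHK_E2E_CODEX_REQUIRE_HOOKS",
  "reviewer-artifact": "AHK_E2E_CODEX_REQUIRE_REVIEWER_ARTIFACT",
};

function evaluateProbes(outcomes, env) {
  const results = outcomes.map(({ probe, status }) => {
    const required = env[REQUIRE_SWITCHES[probe]] === "1";
    // Any non-passing outcome blocks only when that probe's switch is set;
    // otherwise it is reported as a warning.
    const level = status === "passed" ? "ok" : required ? "error" : "warning";
    return { probe, status, required, level };
  });
  const exitCode = results.some((r) => r.level === "error") ? 1 : 0;
  return { results, exitCode };
}

const outcomes = [
  { probe: "hook-fire", status: "failed" },
  { probe: "reviewer-artifact", status: "not-reached" },
];

// Default: both gaps are warnings and the exit code stays 0.
console.log(JSON.stringify(evaluateProbes(outcomes, {})));
// With the hook switch set, the hook-fire failure becomes blocking.
console.log(evaluateProbes(outcomes, { AHK_E2E_CODEX_REQUIRE_HOOKS: "1" }).exitCode);
```

Returning the exit code instead of calling process.exit keeps the decision testable. It also lets a wrapper print every per-probe outcome before it fails.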
Explain Layer
- Added agent-harness-kit explain and generated .harness/scripts/explain.mjs
diagnostics for last-block, task, permission, evidence, and readiness modes.
- agent-harness-kit explain --bypass <fingerprint> now explains bypass audit
rows and approved request coverage from the same central diagnostics surface.
- agent-harness-kit explain --task <id> now points at
task-evidence-check --task=<id> instead of the Stop-hook-only
--active-task mode.
- agent-harness-kit explain --permission <tool> --task <id> no longer reports
.harness/permissions.json as missing when the task contract supplies the
relevant allow/deny decision.
- agent-harness-kit explain --task <id> now reports linked evidence pass-proof
gaps such as missing diffSummary or UI artifacts, matching
explain --evidence diagnostics.
- agent-harness-kit explain now reports repo-escaping evidence paths as unsafe
path fields instead of mislabeling them as ordinary missing files.
- Task-scoped permission explanations now keep sourceRule on the task contract
even when .harness/permissions.json exists but was not part of the decision.
- agent-harness-kit explain --last-block now skips remediation telemetry such
as block_remediated so resolved blocks do not hide the last active block.
- agent-harness-kit explain --last-block now recognizes canonical block
telemetry events and routes task-evidence or permission-denied blocks to the
most direct repair command.
- Generated installs now include .harness/docs/explain.md, documenting JSON
output, repair commands, and override expectations.
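The --last-block behaviour can be sketched as a backwards scan over telemetry. The scan skips remediation records and returns the newest real block, then routes it to a repair command. This is a hypothetical illustration. Only the block_remediated event name and the two command shapes appear in these notes. The block and hook_block event names, the record fields, and both helper names are assumptions.

```javascript
// Hypothetical sketch of --last-block selection over telemetry records
// (oldest first). Event names other than block_remediated are assumed.
const BLOCK_EVENTS = new Set(["block", "hook_block"]);
const REMEDIATION_EVENTS = new Set(["block_remediated"]);

function lastActiveBlock(events) {
  for (let i = events.length - 1; i >= 0; i--) {
    const e = events[i];
    // Remediation telemetry is skipped so it cannot mask the last real block.
    if (REMEDIATION_EVENTS.has(e.event)) continue;
    if (BLOCK_EVENTS.has(e.event)) return e;
    // Other records (notifications, prompts) are not blocks either.
  }
  return null;
}

function repairCommand(block) {
  // Assumed routing, reusing command shapes that appear in these notes.
  switch (block.kind) {
    case "task-evidence":
      return `task-evidence-check --task=${block.task}`;
    case "permission-denied":
      return `agent-harness-kit explain --permission ${block.tool} --task ${block.task}`;
    default:
      return "agent-harness-kit explain --last-block";
  }
}

const telemetry = [
  { event: "block", kind: "permission-denied", tool: "Bash", task: "T-1" },
  { event: "block_remediated", task: "T-0" }, // an older, already-resolved block
  { event: "notification", kind: "idle" },
];

console.log(repairCommand(lastActiveBlock(telemetry)));
```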
Evidence Attestation
- Added check-evidence-attestation.mjs as a strict readiness gate for passing
evidence bundles. It requires attested pass checks with command metadata,
stdout/stderr sidecar hashes, and a replay plan produced by
task-evidence-check --verify-hashes --replay-plan.
- Generated installs now include harness:evidence:attestation,
evidenceAttestation config, and the evidence-attestation readiness gate.
- verify-ui summaries now include route, assertion, and DOM snapshot hash
metadata, and passing UI evidence must carry that browser proof metadata.
- Generated installs now include .harness/docs/evidence-attestation.md.
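The sidecar-hash requirement amounts to recomputing a SHA-256 digest of each captured stream and comparing it with the digest recorded in the evidence. This minimal sketch uses Node's built-in crypto module. The check record fields (status, command, stdoutSha256, stderrSha256) and the attestCheck name are hypothetical. The replay-plan requirement is not modelled here.

```javascript
import { createHash } from "node:crypto";

// Hypothetical sketch: a passing check must carry command metadata and
// sha256 hashes that match its stdout/stderr sidecars. Field names are assumed.
const sha256 = (text) => createHash("sha256").update(text, "utf8").digest("hex");

function attestCheck(check, sidecars) {
  const problems = [];
  if (check.status !== "pass") return problems; // only passing checks are gated here
  if (!check.command) problems.push("missing command metadata");
  for (const stream of ["stdout", "stderr"]) {
    const recorded = check[`${stream}Sha256`];
    if (!recorded) {
      problems.push(`missing ${stream} hash`);
    } else if (sidecars[stream] === undefined) {
      problems.push(`missing ${stream} sidecar`);
    } else if (sha256(sidecars[stream]) !== recorded) {
      problems.push(`${stream} hash mismatch`);
    }
  }
  return problems;
}

const sidecars = { stdout: "ok 12 tests\n", stderr: "" };
const check = {
  status: "pass",
  command: "npm test",
  stdoutSha256: sha256(sidecars.stdout),
  stderrSha256: sha256(sidecars.stderr),
};

console.log(attestCheck(check, sidecars)); // []
console.log(attestCheck(check, { ...sidecars, stdout: "edited\n" })); // [ 'stdout hash mismatch' ]
```

Hashing the sidecars rather than trusting a pass flag means any edit to the captured output after the fact is detectable.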
Permission Compiler
- Added task-aware permission compilation, permissions diff, and
permissions explain decision-chain output for task, skill, and default
policies.
- Added check-permissions-drift.mjs and wired generated readiness plus npm
scripts to use the dedicated drift wrapper.
- High-risk task contracts now fail permission compilation when they use
wildcard or broad Bash permissions.
- Permission compilation now emits runtime hook expectations for Claude and
Codex, and fails when generated hook matchers drift from the compiled runtime
contract, including Codex apply_patch mutation coverage.
- Generated installs now write .harness/permissions.compiled.json during
rendering and merge compiler-derived Claude permission hints into
.claude/settings.json.
- harness-report now runs the permission compiler, surfaces compiled skill and
task contract counts, and fails release JSON output on compiler-detected
permission drift or high-risk task permission errors.
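The high-risk rule can be illustrated with a filter over a task contract's allow list. This is a hypothetical sketch. The contract shape, the set of patterns treated as "broad", and the highRiskPermissionErrors name are all assumptions, not the compiler's actual rules.

```javascript
// Hypothetical sketch: reject wildcard or broad Bash allow rules on
// high-risk task contracts. Contract shape and patterns are assumptions.
const BROAD_PATTERNS = [
  /^\*$/, // bare wildcard
  /^Bash$/, // unrestricted Bash
  /^Bash\(\s*\*?\s*\)$/, // Bash() or Bash(*)
];

function highRiskPermissionErrors(contract) {
  if (contract.risk !== "high") return [];
  return (contract.permissions?.allow ?? [])
    .filter((rule) => BROAD_PATTERNS.some((re) => re.test(rule)))
    .map((rule) => `${contract.id}: broad permission "${rule}" not allowed for high-risk task`);
}

const contract = {
  id: "task-migrate-db", // hypothetical task id
  risk: "high",
  permissions: { allow: ["Read", "Bash(npm run test:*)", "Bash(*)"] },
};

console.log(highRiskPermissionErrors(contract));
```

In this sketch, narrowly scoped command rules such as Bash(npm run test:*) remain allowed. Only unrestricted forms are rejected.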
Bypass Governance
- Added the structured bypass request workflow: bypass request,
bypass audit --strict, and bypass explain now share the same strict audit
engine used by release readiness.
- Strict bypass audit now accepts only approved, unexpired request scopes,
rejects scope mismatches, and requires failure-record links for
converted-to-failure-record acknowledgements.
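The strict audit rules listed above translate directly into a per-row check. This is a minimal sketch. Only the converted-to-failure-record value comes from these notes. The record and request field names, the scope strings, and the auditBypass name are hypothetical.

```javascript
// Hypothetical sketch of the strict bypass-audit rules. Field names are assumed.
function auditBypass(row, requests, now = new Date()) {
  const errors = [];
  const request = requests.find((r) => r.id === row.requestId);
  if (!request) {
    errors.push("no matching bypass request");
  } else {
    if (request.status !== "approved") errors.push("request not approved");
    if (new Date(request.expiresAt) <= now) errors.push("request expired");
    if (request.scope !== row.scope) {
      errors.push(`scope mismatch: ${row.scope} vs ${request.scope}`);
    }
  }
  if (row.acknowledgement === "converted-to-failure-record" && !row.failureRecord) {
    errors.push("converted-to-failure-record requires a failure-record link");
  }
  return errors;
}

const requests = [
  { id: "req-1", status: "approved", scope: "rule:no-force-push", expiresAt: "2026-06-30T00:00:00Z" },
];
const now = new Date("2026-06-01T00:00:00Z");

console.log(auditBypass({ requestId: "req-1", scope: "rule:no-force-push" }, requests, now)); // []
console.log(
  auditBypass(
    { requestId: "req-1", scope: "rule:other", acknowledgement: "converted-to-failure-record" },
    requests,
    now,
  ),
);
```

Passing now explicitly keeps expiry checks deterministic in tests instead of depending on the wall clock.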
Harness Noise Reporting
- Added report-harness-noise.mjs and generated npm scripts for ranking noisy
rules from block telemetry, bypass records, false-positive acknowledgements,
review latency, and Stop-hook loop-guard activations.
- Statusline last-block alerts now filter telemetry for real block records, so
idle or permission prompt notifications do not masquerade as blocking gates.
- harness-report now embeds harness-noise status so release dashboards expose
false-positive and override pressure instead of burying it in logs.
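Ranking noisy rules from several signal sources can be sketched as a weighted tally per rule. The signal kinds, the weights, the rule names, and the rankNoisyRules name are illustrative assumptions, not the scoring used by report-harness-noise.mjs. Review latency is omitted because it is a duration rather than a count.

```javascript
// Hypothetical sketch of noise ranking. Weights are illustrative only.
const WEIGHTS = {
  block: 1,
  bypass: 3,
  falsePositive: 5,
  loopGuard: 2,
};

function rankNoisyRules(signals) {
  const scores = new Map();
  for (const { rule, kind } of signals) {
    const weight = WEIGHTS[kind] ?? 0; // unknown signal kinds contribute nothing
    scores.set(rule, (scores.get(rule) ?? 0) + weight);
  }
  return [...scores.entries()]
    .map(([rule, score]) => ({ rule, score }))
    // Highest score first; ties broken by rule name for stable output.
    .sort((a, b) => b.score - a.score || a.rule.localeCompare(b.rule));
}

const signals = [
  { rule: "no-todo-comments", kind: "block" },
  { rule: "no-todo-comments", kind: "falsePositive" },
  { rule: "require-evidence", kind: "block" },
  { rule: "require-evidence", kind: "block" },
  { rule: "require-evidence", kind: "bypass" },
];

console.log(rankNoisyRules(signals));
```

Weighting false positives and bypasses above raw blocks reflects the idea that a rule people routinely override is noisier than one that simply fires often.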
Upgrade
- Upgrade now reconciles executable bits for unchanged managed scripts recorded
in the install lockfile, while preserving user-modified sidecar targets.
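Executable-bit reconciliation can be sketched as a pure planning step. The plan re-adds execute bits only where the on-disk content still matches the lockfile, so user-modified files are left alone. This is a hypothetical sketch. The lockfile entry shape, the file-state shape, the custom-gate.mjs path, and the placeholder hashes are assumptions. Applying a plan could then use Node's fs.chmodSync.

```javascript
// Hypothetical sketch of executable-bit reconciliation during upgrade.
// Lockfile and file-state shapes are assumptions.
function planModeFixes(lockEntries, currentFiles) {
  const fixes = [];
  for (const entry of lockEntries) {
    const current = currentFiles[entry.path];
    if (!current) continue; // file removed: nothing to reconcile
    if (current.sha256 !== entry.sha256) continue; // user-modified: leave it alone
    const isExecutable = (current.mode & 0o111) !== 0;
    if (entry.executable && !isExecutable) {
      // Add execute bits for user, group, and other.
      fixes.push({ path: entry.path, mode: current.mode | 0o111 });
    }
  }
  return fixes;
}

const lock = [
  { path: ".harness/scripts/explain.mjs", sha256: "aaa", executable: true },
  { path: ".harness/scripts/custom-gate.mjs", sha256: "bbb", executable: true },
];
const current = {
  ".harness/scripts/explain.mjs": { sha256: "aaa", mode: 0o644 },
  ".harness/scripts/custom-gate.mjs": { sha256: "edited", mode: 0o644 },
};

const fixes = planModeFixes(lock, current);
console.log(fixes.map((f) => `${f.path} -> ${f.mode.toString(8)}`));
```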
Eval Tasks
- The kit repo check:eval-tasks release gate now uses a deterministic Node
wrapper instead of shell chaining, with aggregate JSON output covering both
root-local and generated-template eval task directories.
- Added a package-script regression test that blocks unquoted shell chaining in
npm scripts, keeping multi-directory gates on deterministic Node wrappers.
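A regression test for unquoted shell chaining needs a scanner that ignores operators inside quotes. This simplified sketch flags &&, ||, and ; outside single or double quotes. It does not handle escapes or nested quoting. The helper names and the example script entries are hypothetical.

```javascript
// Hypothetical sketch of a package-script lint for unquoted shell chaining.
// Quote tracking is simplified: no escape sequences or nested quotes.
function findUnquotedChaining(command) {
  let quote = null;
  for (let i = 0; i < command.length; i++) {
    const ch = command[i];
    if (quote) {
      if (ch === quote) quote = null; // closing quote
      continue;
    }
    if (ch === "'" || ch === '"') {
      quote = ch;
    } else if (ch === ";") {
      return ";";
    } else if ((ch === "&" || ch === "|") && command[i + 1] === ch) {
      return ch + ch; // && or ||
    }
  }
  return null;
}

function chainedScripts(pkg) {
  return Object.entries(pkg.scripts ?? {})
    .map(([name, cmd]) => ({ name, operator: findUnquotedChaining(cmd) }))
    .filter((s) => s.operator !== null);
}

const pkg = {
  scripts: {
    "check:eval-tasks": "node scripts/check-eval-tasks.mjs",
    "check:legacy": "node a.mjs && node b.mjs",
    "echo:quoted": "node -e \"console.log('a && b')\"",
  },
};

console.log(chainedScripts(pkg));
```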
Failure Learning
- The kit repo check:failure-records release gate now uses a deterministic
Node wrapper instead of shell chaining, with aggregate JSON output covering
both generated-template and root-local failure record directories.
- check-failure-records now reports the records directory it validated in
both text and JSON output, making multi-directory release gates easier to
audit.
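A deterministic multi-directory wrapper runs the checker once per directory in a fixed order. It never short-circuits and folds the results into one JSON summary. This sketch uses Node's child_process.spawnSync with process.execPath. The checker script path, its --dir and --json flags, the recordsDir field, and the directory names are hypothetical.

```javascript
import { spawnSync } from "node:child_process";

// Hypothetical sketch of a multi-directory gate wrapper. The checker's
// flags and JSON shape are assumptions.
function runChecker(script, dir) {
  const res = spawnSync(process.execPath, [script, `--dir=${dir}`, "--json"], {
    encoding: "utf8",
  });
  let report;
  try {
    report = JSON.parse(res.stdout || "");
  } catch {
    report = { parseError: true, stderr: res.stderr };
  }
  // A null status (killed by a signal) is treated as a failure.
  return { dir, exitCode: res.status ?? 1, report };
}

function aggregate(dirs, run) {
  // Every directory runs in a fixed order; a failure does not skip later
  // directories the way `a && b` would.
  const results = dirs.map(run);
  return { ok: results.every((r) => r.exitCode === 0), results };
}

// Demo with a stub runner standing in for runChecker.
const fakeRun = (dir) => ({
  dir,
  exitCode: dir.startsWith("root") ? 1 : 0,
  report: { recordsDir: dir },
});
const summary = aggregate(["templates/failure-records", "root/failure-records"], fakeRun);
console.log(JSON.stringify(summary, null, 2));
```

In a real wrapper, the runner would be something like (dir) => runChecker(scriptPath, dir), and the script would finish with process.exitCode = summary.ok ? 0 : 1.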
Full history: CHANGELOG.md
Install: npx agent-harness-kit@0.18.0 init