Release v0.18.0 · tuanle96/agent-harness-kit

[0.18.0] - 2026-05-30

Schema Compatibility

Added a machine-readable schema policy and checker that require durable
harness schemas to declare compatibility, migration, deprecation, and
changelog rules before 1.0.
Added stable schemas for permission compile output, explain block records,
runtime parity reports, and state export manifests; runtime parity JSON now
carries schemaVersion: 1.
Explain block records now accept mode: "bypass" and an optional
fingerprint field for bypass-audit diagnostics.
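
To make the record change above concrete, here is a minimal sketch of a validator for an explain block record. Only the mode names and the optional fingerprint field come from these notes; the function name, the assumption that record modes mirror the CLI explain modes, and the rule that a present fingerprint must be a non-empty string are all hypothetical. The kit's published schema is authoritative.

```javascript
// Hypothetical validator; the real schema ships with the kit and may differ.
// Mode names are assumed to mirror the explain modes listed in these notes.
const EXPLAIN_MODES = ["last-block", "task", "permission", "evidence", "readiness", "bypass"];

function validateExplainRecord(record) {
  const errors = [];
  if (!EXPLAIN_MODES.includes(record.mode)) {
    errors.push(`unknown mode: ${String(record.mode)}`);
  }
  // fingerprint is optional; when present it is assumed to be a non-empty string.
  if (
    record.fingerprint !== undefined &&
    (typeof record.fingerprint !== "string" || record.fingerprint.length === 0)
  ) {
    errors.push("fingerprint must be a non-empty string when present");
  }
  return { ok: errors.length === 0, errors };
}
```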

Documentation

Added generated guides for policy pack authoring, runtime parity scorecards,
and team/PR adoption.
Documented the Codex native hook-fire probe and the
AHK_E2E_CODEX_REQUIRE_HOOKS=1 release switch for making missing lifecycle
hook artifacts blocking.
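
The switch semantics described above can be sketched as a small decision function. Only the environment variable name comes from these notes; the helper name and the outcome labels are assumptions for illustration, not the probe's actual code.

```javascript
// Hypothetical helper: maps a hook-fire probe result to a release outcome.
// Outcome labels ("passed", "warning", "failed") are illustrative.
function hookProbeOutcome(artifactsFound, env) {
  if (artifactsFound) return "passed";
  // Missing lifecycle hook artifacts block the release only in strict mode.
  return env.AHK_E2E_CODEX_REQUIRE_HOOKS === "1" ? "failed" : "warning";
}
```

In a real script the second argument would typically be process.env.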

Runtime Parity

Added codex-parity-probes.mjs and npm run check:codex-parity-probes
as the release wrapper for strict Codex hook and reviewer artifact probes.
The strict Codex parity wrapper now emits per-probe JSON outcomes so release
logs identify whether hook-fire or reviewer-artifact parity failed, passed,
remained planned, or was not reached before the driver failed.
The kit repo release gate now validates root-local failure records in
addition to generated-template failure record fixtures, so observed runtime
gaps cannot escape the readiness checks.
Runtime parity JSON now includes per-runtime status counts plus category,
promotion criteria, and next-step metadata for partial capability rows.
The real Codex E2E smoke now probes generated SessionStart and SessionEnd
lifecycle hook artifacts. The probe is non-blocking by default while Codex
hook-fire parity remains partial, and becomes a hard failure when
AHK_E2E_CODEX_REQUIRE_HOOKS=1 is set. The smoke now initializes a git
workspace and captures Codex feature/JSONL diagnostics when lifecycle
artifacts are missing.
The same E2E smoke now probes Codex reviewer decision artifact creation; set
AHK_E2E_CODEX_REQUIRE_REVIEWER_ARTIFACT=1 to make that probe blocking.
Added runtime-parity-report.mjs --fail-partial so release lanes can turn
partial runtime capabilities into blocking failures without changing the
default warning-only scorecard behavior.
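
As an illustration of how per-runtime status counts and the --fail-partial behavior might fit together, the sketch below summarizes capability rows. The row shape, the "pass" status label, and the function name are assumptions; only schemaVersion: 1, the "partial" status, and the warning-only default come from these notes.

```javascript
// Hypothetical row shape: { runtime: "codex", capability: "hook-fire", status: "partial" }.
function summarizeParity(rows, { failPartial = false } = {}) {
  const counts = {};
  for (const row of rows) {
    counts[row.runtime] ??= {};
    counts[row.runtime][row.status] = (counts[row.runtime][row.status] ?? 0) + 1;
  }
  const partial = rows.filter((row) => row.status === "partial");
  // Default behavior is warning-only; failPartial turns partial rows into a failure.
  const exitCode = failPartial && partial.length > 0 ? 1 : 0;
  return { schemaVersion: 1, counts, partial, exitCode };
}
```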

Explain Layer

Added agent-harness-kit explain and generated .harness/scripts/explain.mjs
diagnostics for last-block, task, permission, evidence, and readiness modes.
agent-harness-kit explain --bypass <fingerprint> now explains bypass audit
rows and approved request coverage from the same central diagnostics surface.
agent-harness-kit explain --task <id> now points at
task-evidence-check --task=<id> instead of the Stop-hook-only
--active-task mode.
agent-harness-kit explain --permission <tool> --task <id> no longer reports
.harness/permissions.json as missing when the task contract supplies the
relevant allow/deny decision.
agent-harness-kit explain --task <id> now reports linked evidence pass-proof
gaps such as missing diffSummary or UI artifacts, matching
explain --evidence diagnostics.
agent-harness-kit explain now reports repo-escaping evidence paths as unsafe
path fields instead of mislabeling them as ordinary missing files.
Task-scoped permission explanations now keep sourceRule on the task contract
even when .harness/permissions.json exists but was not part of the decision.
agent-harness-kit explain --last-block now skips remediation telemetry such
as block_remediated so resolved blocks do not hide the last active block.
agent-harness-kit explain --last-block now recognizes canonical block
telemetry events and routes task-evidence or permission-denied blocks to the
most direct repair command.
Generated installs now include .harness/docs/explain.md, documenting JSON
output, repair commands, and override expectations.
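
The --last-block change above (skipping remediation telemetry) can be pictured with a short sketch. Only the block_remediated event name comes from these notes; the event shape and the assumption that block telemetry types share a "block" prefix are hypothetical.

```javascript
// Only "block_remediated" is taken from the release notes; other names are placeholders.
const REMEDIATION_EVENTS = new Set(["block_remediated"]);

// Assumption: block telemetry event types start with "block".
function isBlockTelemetry(event) {
  return typeof event.type === "string" && event.type.startsWith("block");
}

function findLastActiveBlock(events) {
  // Walk backwards so the most recent block wins.
  for (let i = events.length - 1; i >= 0; i -= 1) {
    const event = events[i];
    if (!isBlockTelemetry(event)) continue; // e.g. idle notifications
    if (REMEDIATION_EVENTS.has(event.type)) continue; // resolved-block telemetry
    return event;
  }
  return null;
}
```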

Evidence Attestation

Added check-evidence-attestation.mjs as a strict readiness gate for passing
evidence bundles. It requires attested pass checks with command metadata,
stdout/stderr sidecar hashes, and a replay plan produced by
task-evidence-check --verify-hashes --replay-plan.
Generated installs now include harness:evidence:attestation,
evidenceAttestation config, and the evidence-attestation readiness gate.
verify-ui summaries now include route, assertion, and DOM snapshot hash
metadata, and passing UI evidence must carry that browser proof metadata.
Generated installs now include .harness/docs/evidence-attestation.md.
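
A rough sketch of the kind of check the attestation gate describes: a passing check must carry command metadata, stdout/stderr sidecar hashes, and a replay plan. All field names here (status, command, stdoutSha256, stderrSha256, replayPlan) are hypothetical; the generated .harness/docs/evidence-attestation.md documents the real format.

```javascript
// Field names are illustrative assumptions, not the kit's evidence schema.
const REQUIRED_ATTESTATION_FIELDS = ["command", "stdoutSha256", "stderrSha256", "replayPlan"];

function attestationGaps(check) {
  // Only passing evidence needs attestation in this sketch.
  if (check.status !== "pass") return [];
  return REQUIRED_ATTESTATION_FIELDS.filter((field) => {
    const value = check[field];
    return value === undefined || value === null || value === "";
  });
}
```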

Permission Compiler

Added task-aware permission compilation, permissions diff, and
permissions explain decision-chain output for task, skill, and default
policies.
Added check-permissions-drift.mjs and wired generated readiness plus npm
scripts to use the dedicated drift wrapper.
High-risk task contracts now fail permission compilation when they use
wildcard or broad Bash permissions.
Permission compilation now emits runtime hook expectations for Claude and
Codex, and fails when generated hook matchers drift from the compiled runtime
contract, including Codex apply_patch mutation coverage.
Generated installs now write .harness/permissions.compiled.json during
rendering and merge compiler-derived Claude permission hints into
.claude/settings.json.
harness-report now runs the permission compiler, surfaces compiled skill and
task contract counts, and fails release JSON output on compiler-detected
permission drift or high-risk task permission errors.
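
To illustrate the high-risk rule above, the sketch below flags wildcard or broad Bash permissions in a task contract. It assumes Claude-style permission strings such as "Bash(npm test)" and a hypothetical contract shape ({ risk, allow }); what the compiler actually treats as "broad" is not specified in these notes.

```javascript
// Hypothetical contract shape: { risk: "high", allow: ["Bash(npm test)", "Read"] }.
function broadBashPermissions(contract) {
  if (contract.risk !== "high") return [];
  return contract.allow.filter((rule) => {
    if (rule === "*" || rule === "Bash") return true; // unrestricted
    const match = /^Bash\((.*)\)$/.exec(rule);
    if (!match) return false; // not a Bash rule
    const inner = match[1].trim();
    // An empty pattern, or one that starts with a wildcard, counts as broad here.
    return inner === "" || inner.startsWith("*");
  });
}
```

A compiler following this rule would fail when the returned list is non-empty.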

Bypass Governance

Added the structured bypass request workflow: bypass request,
bypass audit --strict, and bypass explain now share the same strict audit
engine used by release readiness.
Strict bypass audit now accepts only approved, unexpired request scopes, rejects
scope mismatches, and requires failure-record links for
converted-to-failure-record acknowledgements.
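
The strict audit rules listed above could be sketched as follows. The record shapes (request: { status, scope, expiresAt }; row: { scope, acknowledgement, failureRecord }) are assumptions for illustration; only the rules themselves and the converted-to-failure-record acknowledgement name come from these notes.

```javascript
// Hypothetical shapes; the real audit engine and its record format may differ.
function auditBypassRow(row, request, now = new Date()) {
  const problems = [];
  if (!request || request.status !== "approved") {
    problems.push("no approved request");
  } else {
    // An unparseable expiry is treated as expired.
    if (!(new Date(request.expiresAt) > now)) problems.push("request expired");
    if (request.scope !== row.scope) problems.push("scope mismatch");
  }
  if (row.acknowledgement === "converted-to-failure-record" && !row.failureRecord) {
    problems.push("missing failure-record link");
  }
  return problems;
}
```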

Harness Noise Reporting

Added report-harness-noise.mjs and generated npm scripts for ranking noisy
rules from block telemetry, bypass records, false-positive acknowledgements,
review latency, and Stop-hook loop-guard activations.
Statusline last-block alerts now filter telemetry for real block records, so
idle or permission prompt notifications do not masquerade as blocking gates.
harness-report now embeds harness-noise status so release dashboards expose
false-positive and override pressure instead of burying it in logs.
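
One way to picture the ranking is a weighted tally per rule. The event shape and the weights below are assumptions made for this sketch; the notes do not say how report-harness-noise.mjs scores its inputs, and review latency and loop-guard activations are left out for brevity.

```javascript
// Hypothetical event shape: { rule: "task-evidence", kind: "block" | "bypass" | "falsePositive" }.
function rankNoisyRules(events) {
  const totals = new Map();
  for (const { rule, kind } of events) {
    const entry = totals.get(rule) ?? { rule, block: 0, bypass: 0, falsePositive: 0 };
    if (kind === "block" || kind === "bypass" || kind === "falsePositive") entry[kind] += 1;
    totals.set(rule, entry);
  }
  // Assumed weighting: overrides and false positives count more than plain blocks.
  const score = (e) => e.block + 2 * e.bypass + 3 * e.falsePositive;
  return [...totals.values()].sort(
    (a, b) => score(b) - score(a) || a.rule.localeCompare(b.rule)
  );
}
```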

Upgrade

Upgrade now reconciles executable bits for unchanged managed scripts recorded
in the install lockfile, while preserving user-modified sidecar targets.
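
The reconcile rule above might look roughly like this decision function. The lockfile entry shape ({ sha256, mode }) is hypothetical, and toggling all three execute bits together is a simplification; the actual upgrade logic may differ.

```javascript
// Returns the mode to apply, or null when the file should be left alone.
function reconcileMode(lockEntry, current) {
  // A changed hash means the user modified the file: preserve it as-is.
  if (current.sha256 !== lockEntry.sha256) return null;
  const wantExec = (lockEntry.mode & 0o111) !== 0;
  const hasExec = (current.mode & 0o111) !== 0;
  if (wantExec === hasExec) return null; // already consistent
  return wantExec ? current.mode | 0o111 : current.mode & ~0o111;
}
```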

Eval Tasks

The kit repo check:eval-tasks release gate now uses a deterministic Node
wrapper instead of shell chaining, with aggregate JSON output covering both
root-local and generated-template eval task directories.
Added a package-script regression test that blocks unquoted shell chaining in
npm scripts, keeping multi-directory gates on deterministic Node wrappers.
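
A regression check of this kind could be sketched as a scan over the scripts object from package.json. The function name and the detection rule are assumptions: this version flags &&, ||, and ; outside single or double quotes and ignores backslash escapes.

```javascript
// Returns the names of npm scripts that chain commands outside of quotes.
function findShellChaining(scripts) {
  const offenders = [];
  for (const [name, command] of Object.entries(scripts)) {
    let quote = null; // the quote character we are currently inside, if any
    for (let i = 0; i < command.length; i += 1) {
      const ch = command[i];
      if (quote) {
        if (ch === quote) quote = null;
        continue;
      }
      if (ch === '"' || ch === "'") {
        quote = ch;
        continue;
      }
      const two = command.slice(i, i + 2);
      if (two === "&&" || two === "||" || ch === ";") {
        offenders.push(name);
        break;
      }
    }
  }
  return offenders;
}
```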

Failure Learning

The kit repo check:failure-records release gate now uses a deterministic
Node wrapper instead of shell chaining, with aggregate JSON output covering
both generated-template and root-local failure record directories.
check-failure-records now reports the records directory it validated in
both text and JSON output, making multi-directory release gates easier to
audit.
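
The practical difference from shell chaining is that a wrapper can check every directory even when an earlier one fails, then report all of them in one JSON document. A minimal sketch, with a hypothetical check shape ({ dir, validate }) and output format:

```javascript
// validate(dir) is assumed to return an array of error strings for that directory.
function runGate(checks) {
  // Unlike `a && b`, every directory is validated even after a failure.
  const directories = checks.map(({ dir, validate }) => {
    const errors = validate(dir);
    return { dir, ok: errors.length === 0, errors };
  });
  return { ok: directories.every((d) => d.ok), directories };
}
```

Each entry names the directory it validated, which mirrors the auditing benefit described above.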

Full history: CHANGELOG.md
Install: npx agent-harness-kit@0.18.0 init
