feat(pre-registration): expose hashJson + canonicalize for arbitrary content signing #32

Merged
drewstone merged 1 commit into main from feat/hash-content-canonicalize on May 8, 2026
@drewstone
Motivation

signManifest and verifyManifest in pre-registration.ts are built on a canonicalize+sha256 primitive that is general — but the helper is private to that file, and the signing API is typed to HypothesisManifest. Consumers signing arbitrary structured content (artifact bundles, production packets, dataset versions, etc.) end up reimplementing canonicalize+sha256 from scratch.

I confirmed this is happening in real consumers before opening the PR:

  • physim/apps/server/src/lib/manifest.ts re-implements canonicalize+sha256 for production-packet attestation, with an explicit inline comment: "Pattern is modeled on agent-eval's signManifest/verifyManifest (canonicalize → sha256-content → sidecar). Physim-native because agent-eval's signer is shaped for HypothesisManifest, not artifact-bundle attestation."

  • phony/products/builder/api/src/eval/champion-sign.ts re-implements the same canonicalize+sha256 (sync variant) before wrapping it in an Ed25519 signature.

Both pieces of duplication go away if the generic primitive is exposed.

What this changes

Three new public exports from ./pre-registration, additive only:

canonicalize(v: unknown): unknown

Lifted from a previously-private helper. Recursive key-sort, primitives pass through, arrays preserve element order. Behavior unchanged from the inlined version — verified by signManifest still producing the same contentHash for any given input.
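
For readers who have not seen the helper, the behavior described above (recursive key-sort, primitives pass through, arrays keep their order) could look roughly like the following. This is a minimal sketch written from that description, not a copy of the code in pre-registration.ts; the real helper may handle edge cases differently.

```ts
// Illustrative sketch only: written from the description in this PR,
// not copied from pre-registration.ts.
function canonicalize(v: unknown): unknown {
  // Arrays keep their element order; each element is canonicalized in turn.
  if (Array.isArray(v)) return v.map((item) => canonicalize(item))
  // Non-null objects are rebuilt with their keys in sorted order.
  if (v !== null && typeof v === 'object') {
    const src = v as Record<string, unknown>
    const out: Record<string, unknown> = {}
    for (const key of Object.keys(src).sort()) out[key] = canonicalize(src[key])
    return out
  }
  // Primitives (string, number, boolean, null, undefined) pass through.
  return v
}

// Two objects that differ only in key insertion order serialize identically.
const first = JSON.stringify(canonicalize({ b: 1, a: { d: [3, 1, 2], c: null } }))
const second = JSON.stringify(canonicalize({ a: { c: null, d: [3, 1, 2] }, b: 1 }))
console.log(first === second) // true
console.log(first) // {"a":{"c":null,"d":[3,1,2]},"b":1}
```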

hashJson<T>(obj: T): Promise<string>

SHA-256 hex (full 64 chars) over the canonicalized JSON encoding of obj. Stable across:

  • object key insertion order (canonicalize sorts recursively)
  • encoder choice (UTF-8 via TextEncoder, fixed)
  • runtime (Web Crypto subtle digest, Node ≥18 + browsers)
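
The three stability properties above can be illustrated with a small sketch. It assumes a runtime where the Web Crypto API is reachable as the global `crypto.subtle`, and it restates a simplified `canonicalize` so the block stands alone; neither function is the shipped implementation.

```ts
// Sketch under assumptions: simplified canonicalize restated for self-containment,
// and a global Web Crypto `crypto.subtle` is assumed to exist in the runtime.
function canonicalize(v: unknown): unknown {
  if (Array.isArray(v)) return v.map((item) => canonicalize(item))
  if (v !== null && typeof v === 'object') {
    const src = v as Record<string, unknown>
    const out: Record<string, unknown> = {}
    for (const key of Object.keys(src).sort()) out[key] = canonicalize(src[key])
    return out
  }
  return v
}

async function hashJson<T>(obj: T): Promise<string> {
  // 1. Canonicalize so key insertion order cannot affect the bytes.
  const json = JSON.stringify(canonicalize(obj))
  // 2. Encode as UTF-8 with a fixed encoder.
  const bytes = new TextEncoder().encode(json)
  // 3. Digest with SHA-256 and render all 32 bytes as 64 lowercase hex chars.
  const digest = await crypto.subtle.digest('SHA-256', bytes)
  return Array.from(new Uint8Array(digest), (b) => b.toString(16).padStart(2, '0')).join('')
}

Promise.all([hashJson({ a: 1, b: 2 }), hashJson({ b: 2, a: 1 })]).then(([x, y]) => {
  console.log(x.length) // 64
  console.log(x === y) // true
})
```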

Refactored signManifest to call through:

```ts
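// Return type spelled out structurally; the named type used in the source is not shown here.
export async function signManifest(
  m: HypothesisManifest,
): Promise<HypothesisManifest & { contentHash: string; algo: 'sha256-content' }> {
  const hash = await hashJson(m)
  return { ...m, contentHash: hash, algo: 'sha256-content' }
}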
```
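
A consumer signing something other than a HypothesisManifest could follow the same shape. The sketch below is hypothetical: `signContent`, `verifyContent`, and the `Signed<T>` type are names invented for illustration and are not part of this PR, and `canonicalize` and `hashJson` are restated in simplified form (assuming a global `crypto.subtle`) so the block runs on its own.

```ts
// Hypothetical consumer-side helpers; none of these names are exported by agent-eval.
function canonicalize(v: unknown): unknown {
  if (Array.isArray(v)) return v.map((item) => canonicalize(item))
  if (v !== null && typeof v === 'object') {
    const src = v as Record<string, unknown>
    const out: Record<string, unknown> = {}
    for (const key of Object.keys(src).sort()) out[key] = canonicalize(src[key])
    return out
  }
  return v
}

async function hashJson<T>(obj: T): Promise<string> {
  const bytes = new TextEncoder().encode(JSON.stringify(canonicalize(obj)))
  const digest = await crypto.subtle.digest('SHA-256', bytes)
  return Array.from(new Uint8Array(digest), (b) => b.toString(16).padStart(2, '0')).join('')
}

// Assumed shape, mirroring the fields signManifest attaches.
type Signed<T> = T & { contentHash: string; algo: 'sha256-content' }

async function signContent<T extends object>(content: T): Promise<Signed<T>> {
  const contentHash = await hashJson(content)
  return { ...content, contentHash, algo: 'sha256-content' as const }
}

async function verifyContent<T extends object>(signed: Signed<T>): Promise<boolean> {
  // Strip the signature fields, re-hash what remains, and compare.
  const { contentHash, algo, ...rest } = signed
  return algo === 'sha256-content' && contentHash === (await hashJson(rest))
}

signContent({ bundle: 'model-v3', files: ['a.bin', 'b.bin'] }).then(async (signed) => {
  console.log(await verifyContent(signed)) // true
  console.log(await verifyContent({ ...signed, bundle: 'tampered' })) // false
})
```

Whether verifyManifest strips and re-hashes in exactly this way is not shown in the PR description, so treat the verification step as one plausible reading of the canonicalize → sha256-content → sidecar pattern.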

Naming note

Initially I drafted this as hashContent, but prompt-registry.ts already exports a function with that name — and it has materially different semantics (string input, returns truncated 12-char id for prompt versioning). Collapsing them would break callers. They coexist:

| Function | Source | Input | Output |
| --- | --- | --- | --- |
| `hashContent` | `prompt-registry` | `string` | first 12 hex chars |
| `hashJson<T>` | `pre-registration` | `T` (any value) | full 64 hex chars over canonicalized JSON |

A regression test asserts the two produce different output for the same input (12 vs 64 chars) so future refactors don't accidentally re-collide them.

Backward compatibility

Purely additive on the public API. signManifest / verifyManifest produce bit-for-bit identical hashes — explicitly tested:

```ts
it('hashJson matches signManifest contentHash for the same payload', async () => {
  const signed = await signManifest(base)
  const direct = await hashJson(base)
  expect(signed.contentHash).toBe(direct)
})
```

All 827 existing tests still pass; 5 new tests added covering the new exports + the regression guard.

`pnpm typecheck` clean. `pnpm test` 94/94 files green.

…content signing

The canonicalize+sha256 logic that signManifest is built on is general enough
that consumers signing arbitrary structured content (artifact bundles,
production packets, dataset versions, etc.) end up reimplementing it from
scratch. Two cases I checked while making this change:

  - physim/apps/server/src/lib/manifest.ts re-implements canonicalize+sha256
    for production-packet attestation, with an inline comment noting that
    agent-eval's signer is shaped for HypothesisManifest only.
  - phony/products/builder/api/src/eval/champion-sign.ts re-implements the
    same canonicalize+sha256 (sync variant) before wrapping in Ed25519.

Both pieces of duplication go away if the generic primitive is exposed.

This change:

  - Lifts the previously-private `canonicalize(v: unknown)` to an exported
    function. Recursive key-sort, primitives pass through, arrays preserve
    order. Behavior unchanged from the inlined version.

  - Adds `hashJson<T>(obj: T): Promise<string>` — sha256 hex (full 64 chars)
    over the canonicalized JSON encoding. The same primitive signManifest is
    built on, refactored to call through.

  - Naming: I went with `hashJson` rather than `hashContent` because
    prompt-registry already exports `hashContent(s: string)` for the
    truncated 12-char prompt-id helper. Different semantics (string in,
    short id out) — the two coexist, named for what each actually does.

Backward compat: signManifest / verifyManifest output bit-for-bit identical
hashes (verified by a new test that compares hashJson(base) directly against
signManifest(base).contentHash). All 827 existing tests still pass.

Tests added:
  - canonicalize sorts keys recursively + preserves array order + passes primitives
  - hashJson is stable across key insertion order
  - hashJson(base) === signManifest(base).contentHash (composition guarantee)
  - hashJson and prompt-registry's hashContent are independent (different return shape)
tangletools commented May 7, 2026

✅ No Blockers — c5fc594b

Readiness 95/100 · Confidence 95/100 · 1 finding (1 low)

| | kimi-code | glm | aggregate |
| --- | --- | --- | --- |
| Readiness | 95 | 96 | 95 |
| Confidence | 95 | 95 | 95 |
| Correctness | 95 | 98 | 95 |
| Security | 95 | 100 | 95 |
| Testing | 95 | 95 | 95 |
| Architecture | 95 | 95 | 95 |

I read every changed file, checked callers/callees, and ran tests. The PR is a clean, well-documented refactoring: it extracts canonicalize and hashJson from signManifest into exported utilities, updates src/index.ts exports, and adds 4 focused tests. All 14 tier2 tests pass (691 total tests pass); the 18 failed suites are pre-existing optional-dependency load errors (@ax-llm/ax, @hono/node-server, etc.) unrelated to this change. No functional changes, no API breaks, no defects found. | This PR extracts canonicalize and a new hashJson helper from the private internals of `signM

🟡 LOW Duplicated canonicalize across three modules — src/pre-registration.ts

The same canonicalize function now exists in src/pre-registration.ts:84, src/dataset.ts:213, and src/feedback-trajectory.ts:606. This PR correctly exports the pre-registration version for public consumption, but the other two modules still use their own private copies. Not a defect in this PR (those files aren't in scope), but worth noting for a future dedup pass.


tangletools · 2026-05-07T17:36:48Z · trace

@tangletools tangletools left a comment

✅ Approved — 1 non-blocking finding

@drewstone drewstone merged commit b66856d into main May 8, 2026
@drewstone drewstone deleted the feat/hash-content-canonicalize branch May 8, 2026 14:49