feat(pre-registration): expose hashJson + canonicalize for arbitrary content signing #32
The canonicalize+sha256 logic that signManifest is built on is general enough
that consumers signing arbitrary structured content (artifact bundles,
production packets, dataset versions, etc.) end up reimplementing it from
scratch. Two cases I checked while making this change:
- physim/apps/server/src/lib/manifest.ts re-implements canonicalize+sha256
for production-packet attestation, with an inline comment noting that
agent-eval's signer is shaped for HypothesisManifest only.
- phony/products/builder/api/src/eval/champion-sign.ts re-implements the
same canonicalize+sha256 (sync variant) before wrapping in Ed25519.
Both pieces of duplication go away if the generic primitive is exposed.
This change:
- Lifts the previously-private `canonicalize(v: unknown)` to an exported
function. Recursive key-sort, primitives pass through, arrays preserve
order. Behavior unchanged from the inlined version.
- Adds `hashJson<T>(obj: T): Promise<string>` — sha256 hex (full 64 chars)
over the canonicalized JSON encoding. The same primitive signManifest is
built on, refactored to call through.
- Naming: I went with `hashJson` rather than `hashContent` because
prompt-registry already exports `hashContent(s: string)` for the
truncated 12-char prompt-id helper. Different semantics (string in,
short id out) — the two coexist, named for what each actually does.
Backward compat: signManifest / verifyManifest output bit-for-bit identical
hashes (verified by a new test that compares hashJson(base) directly against
signManifest(base).contentHash). All 827 existing tests still pass.
Tests added:
- canonicalize sorts keys recursively + preserves array order + passes primitives
- hashJson is stable across key insertion order
- hashJson(base) === signManifest(base).contentHash (composition guarantee)
- hashJson and prompt-registry's hashContent are independent (different return shape)
✅ No Blockers
| | kimi-code | glm | aggregate |
|---|---|---|---|
| Readiness | 95 | 96 | 95 |
| Confidence | 95 | 95 | 95 |
| Correctness | 95 | 98 | 95 |
| Security | 95 | 100 | 95 |
| Testing | 95 | 95 | 95 |
| Architecture | 95 | 95 | 95 |
I read every changed file, checked callers/callees, and ran tests. The PR is a clean, well-documented refactoring: it extracts `canonicalize` and `hashJson` from `signManifest` into exported utilities, updates `src/index.ts` exports, and adds 4 focused tests. All 14 tier2 tests pass (691 total tests pass); the 18 failed suites are pre-existing optional-dependency load errors (`@ax-llm/ax`, `@hono/node-server`, etc.) unrelated to this change. No functional changes, no API breaks, no defects found. | This PR extracts `canonicalize` and a new `hashJson` helper from the private internals of `signM
🟡 LOW Duplicated canonicalize across three modules — src/pre-registration.ts
The same `canonicalize` function now exists in `src/pre-registration.ts:84`, `src/dataset.ts:213`, and `src/feedback-trajectory.ts:606`. This PR correctly exports the pre-registration version for public consumption, but the other two modules still use their own private copies. Not a defect in this PR (those files aren't in scope), but worth noting for a future dedup pass.
tangletools · 2026-05-07T17:36:48Z · trace
tangletools
left a comment
✅ Approved — 1 non-blocking finding
Motivation
`signManifest` and `verifyManifest` in `pre-registration.ts` are built on a canonicalize+sha256 primitive that is general, but the helper is private to that file, and the signing API is typed to `HypothesisManifest`. Consumers signing arbitrary structured content (artifact bundles, production packets, dataset versions, etc.) end up reimplementing canonicalize+sha256 from scratch.
I confirmed this is happening in real consumers before opening the PR:
- physim: `physim/apps/server/src/lib/manifest.ts` re-implements canonicalize+sha256 for production-packet attestation, with an explicit inline comment: "Pattern is modeled on agent-eval's signManifest/verifyManifest (canonicalize → sha256-content → sidecar). Physim-native because agent-eval's signer is shaped for HypothesisManifest, not artifact-bundle attestation."
- phony: `phony/products/builder/api/src/eval/champion-sign.ts` re-implements the same canonicalize+sha256 (sync variant) before wrapping in an Ed25519 signature.
Both pieces of duplication go away if the generic primitive is exposed.
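To make the consumer-side use concrete, here is a minimal sketch of the "hash, then wrap in Ed25519" pattern described for phony. It is not the code from `champion-sign.ts`. The `Envelope` shape, `signContent`, `verifyContent`, and `demo` are hypothetical names, and the local `hashJson` is a stand-in that assumes sorted-key JSON plus sha256 hex; a real consumer would import the exported helper instead.

```typescript
import { createHash, generateKeyPairSync, sign, verify } from "node:crypto";
import type { KeyObject } from "node:crypto";

// Stand-in for the exported helpers (assumption: plain JSON values only).
function canonicalize(v: unknown): unknown {
  if (Array.isArray(v)) return v.map((item) => canonicalize(item));
  if (v !== null && typeof v === "object") {
    const src = v as Record<string, unknown>;
    const out: Record<string, unknown> = {};
    for (const key of Object.keys(src).sort()) out[key] = canonicalize(src[key]);
    return out;
  }
  return v;
}

async function hashJson<T>(obj: T): Promise<string> {
  return createHash("sha256").update(JSON.stringify(canonicalize(obj)), "utf8").digest("hex");
}

// Hypothetical sidecar shape: the content hash plus a signature over it.
interface Envelope {
  contentHash: string; // 64-char sha256 hex
  signature: string; // base64 Ed25519 signature over the raw hash bytes
}

export async function signContent<T>(content: T, privateKey: KeyObject): Promise<Envelope> {
  const contentHash = await hashJson(content);
  // Ed25519 takes no separate digest algorithm, hence the null first argument.
  const signature = sign(null, Buffer.from(contentHash, "hex"), privateKey).toString("base64");
  return { contentHash, signature };
}

export async function verifyContent<T>(
  content: T,
  envelope: Envelope,
  publicKey: KeyObject,
): Promise<boolean> {
  const contentHash = await hashJson(content);
  if (contentHash !== envelope.contentHash) return false; // content changed
  return verify(
    null,
    Buffer.from(contentHash, "hex"),
    publicKey,
    Buffer.from(envelope.signature, "base64"),
  );
}

// Usage: same content with reordered keys verifies; edited content does not.
export async function demo(): Promise<{ ok: boolean; tampered: boolean }> {
  const { publicKey, privateKey } = generateKeyPairSync("ed25519");
  const record = { model: "m1", score: 0.9, tags: ["a", "b"] };
  const envelope = await signContent(record, privateKey);
  const ok = await verifyContent({ tags: ["a", "b"], score: 0.9, model: "m1" }, envelope, publicKey);
  const tampered = await verifyContent({ ...record, score: 0.8 }, envelope, publicKey);
  return { ok, tampered };
}
```

Signing the hash rather than the serialized payload keeps the signature independent of key order, which is the property the duplicated consumer code was reimplementing.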
What this changes
Three new public exports from `./pre-registration`, additive only:
- `canonicalize(v: unknown): unknown`: lifted from a previously-private helper. Recursive key-sort, primitives pass through, arrays preserve element order. Behavior unchanged from the inlined version, verified by `signManifest` still producing the same `contentHash` for any given input.
- `hashJson<T>(obj: T): Promise<string>`: SHA-256 hex (full 64 chars) over the canonicalized JSON encoding of `obj`. Stable across key insertion order.
Refactored `signManifest` to call through:
```ts
export async function signManifest(
  m: HypothesisManifest,
): Promise<HypothesisManifest & { contentHash: string; algo: 'sha256-content' }> {
const hash = await hashJson(m)
return { ...m, contentHash: hash, algo: 'sha256-content' }
}
```
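For readers who have not seen the helper, the described behavior (recursive key-sort, primitives pass through, arrays keep their order, sha256 hex over the JSON encoding) can be sketched roughly as follows. This is an illustrative re-creation, not the code in `pre-registration.ts`. It assumes `JSON.stringify` as the encoding and Node's `createHash` for the digest, and it only handles plain JSON values.

```typescript
import { createHash } from "node:crypto";

// Sketch of the canonicalization step: objects get their keys sorted
// recursively, arrays keep element order, primitives are returned as-is.
export function canonicalize(v: unknown): unknown {
  if (Array.isArray(v)) return v.map((item) => canonicalize(item));
  if (v !== null && typeof v === "object") {
    const src = v as Record<string, unknown>;
    const out: Record<string, unknown> = {};
    for (const key of Object.keys(src).sort()) {
      out[key] = canonicalize(src[key]);
    }
    return out;
  }
  return v;
}

// Sketch of the hashing step: sha256 over the canonical JSON text, as
// 64 lowercase hex characters. Async only to mirror the signature in the
// PR; a top-level `undefined` input is out of scope for this sketch.
export async function hashJson<T>(obj: T): Promise<string> {
  const encoded = JSON.stringify(canonicalize(obj));
  return createHash("sha256").update(encoded, "utf8").digest("hex");
}
```

With this sketch, `hashJson({ b: 1, a: 2 })` and `hashJson({ a: 2, b: 1 })` resolve to the same string because both canonicalize to `{"a":2,"b":1}`, while `[1, 2]` and `[2, 1]` hash differently because array order is preserved.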
Naming note
Initially I drafted this as `hashContent`, but `prompt-registry.ts` already exports a function with that name, and it has materially different semantics (string input, returns truncated 12-char id for prompt versioning). Collapsing them would break callers. They coexist:
| Function | Input | Output |
|---|---|---|
| `hashContent` | `string` | truncated 12-char id |
| `hashJson<T>` | `T` (any value) | sha256 hex (full 64 chars) |
A regression test asserts the two produce different output for the same input (12 vs 64 chars) so future refactors don't accidentally re-collide them.
Backward compatibility
Purely additive on the public API. `signManifest` / `verifyManifest` produce bit-for-bit identical hashes, explicitly tested:
```ts
it('hashJson matches signManifest contentHash for the same payload', async () => {
const signed = await signManifest(base)
const direct = await hashJson(base)
expect(signed.contentHash).toBe(direct)
})
```
All 827 existing tests still pass; 5 new tests added covering the new exports + the regression guard.
`pnpm typecheck` clean. `pnpm test` 94/94 files green.