B02 SQLite append-only audit log + hash chain in verifier #37

Merged
pulkitpareek18 merged 1 commit into main from dev on May 15, 2026
Conversation

@pulkitpareek18
Collaborator

Task 3 of today. Implements the verifier-local audit log from the design doc §4.3.

Why a second audit log

We already have `audit_events` in the API's Postgres. The verifier-local SQLite log is defense in depth: if Postgres ever gets rewritten by an attacker with DB-level access, an auditor can reconcile against the verifier's tamper-evident replica. Hash-chained, append-only-at-SQL-trigger-level, written on a separate disk volume from Postgres.

Architecture

  • `verifier/src/audit-log.ts` — better-sqlite3, WAL mode, schema with 14 columns including `sequence` (monotonic), `prev_hash`, `entry_hash`. SQL triggers refuse UPDATE + DELETE.
  • Hash chain per design doc §5: `entry_hash = sha256(canonical(row excl entry_hash) || prev_hash)`. Canonical = JSON with sorted keys, no whitespace.
  • Genesis row at first init: `sequence=0`, `tenant_id='system'`, `prev_hash` set to 64 zero hex characters.
  • Proofs + public signals are hashed on storage, never persisted in full (security-policy §10).
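The hash-chain rule above can be sketched roughly as follows. This is a minimal illustration, not the code in `verifier/src/audit-log.ts`: the `Row` type, the `canonical` and `computeEntryHash` names, and the choice to concatenate the canonical JSON and `prev_hash` as strings before hashing are all assumptions made for the example.

```typescript
import { createHash } from "node:crypto";

// Hypothetical flat row shape; the real table has 14 columns.
type Row = Record<string, string | number | null>;

// prev_hash of the genesis row: 64 zero hex characters.
const GENESIS_PREV_HASH = "0".repeat(64);

// Canonical form: JSON with keys sorted, no whitespace.
// Built by hand so the key order never depends on insertion order.
function canonical(row: Row, exclude: string[] = []): string {
  const keys = Object.keys(row)
    .filter((k) => !exclude.includes(k))
    .sort();
  const parts = keys.map((k) => JSON.stringify(k) + ":" + JSON.stringify(row[k]));
  return "{" + parts.join(",") + "}";
}

// entry_hash = sha256(canonical(row excl entry_hash) || prev_hash)
// Assumption: "||" is plain string concatenation of the two values.
function computeEntryHash(row: Row, prevHash: string): string {
  return createHash("sha256")
    .update(canonical(row, ["entry_hash"]) + prevHash)
    .digest("hex");
}

// Usage: the same logical row hashes identically regardless of key order.
const h = computeEntryHash(
  { sequence: 1, tenant_id: "unspecified", prev_hash: GENESIS_PREV_HASH },
  GENESIS_PREV_HASH,
);
console.log(h.length); // 64
```

Because a row's `entry_hash` becomes the next row's `prev_hash`, changing any stored column changes every hash recomputed downstream of it.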

Server wiring

  • `POST /verify` now appends a row per request; `verifierAuditId` in the response points at a durable record (was throwaway uuid before).
  • `GET /health` includes an `audit` block: `{ rowCount, nextSequence, lastEntryHashPrefix }`.
  • `GET /audit/stats` — same data, dedicated endpoint.
  • `GET /audit/verify-chain` — walks the full chain, reports `ok | firstBadSequence | firstBadReason`. O(N), for periodic ops.
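A chain walk of the kind `GET /audit/verify-chain` performs might look like the sketch below. It works on an in-memory array rather than SQLite, and `ChainRow`, `hashRow`, `appendRow`, and the two reason strings are hypothetical stand-ins. Only the report fields (`ok`, `rowsChecked`, `firstBadSequence`, `firstBadReason`) come from the description above.

```typescript
import { createHash } from "node:crypto";

interface ChainRow {
  sequence: number;
  payload: string; // stand-in for the remaining columns
  prev_hash: string;
  entry_hash: string;
}

type ChainReport =
  | { ok: true; rowsChecked: number }
  | { ok: false; firstBadSequence: number; firstBadReason: string };

const ZERO_HASH = "0".repeat(64);

// Simplified row hash: fixed-order JSON of the row, then prev_hash appended.
function hashRow(row: Omit<ChainRow, "entry_hash">): string {
  const body = JSON.stringify({
    payload: row.payload,
    prev_hash: row.prev_hash,
    sequence: row.sequence,
  });
  return createHash("sha256").update(body + row.prev_hash).digest("hex");
}

// Append a row whose prev_hash is the previous row's entry_hash.
function appendRow(chain: ChainRow[], payload: string): void {
  const prev_hash = chain.length === 0 ? ZERO_HASH : chain[chain.length - 1].entry_hash;
  const partial = { sequence: chain.length, payload, prev_hash };
  chain.push({ ...partial, entry_hash: hashRow(partial) });
}

// Walk rows in sequence order; stop at the first broken link. O(N).
function verifyChain(rows: ChainRow[]): ChainReport {
  let expectedPrev = ZERO_HASH;
  for (const row of rows) {
    if (row.prev_hash !== expectedPrev) {
      return { ok: false, firstBadSequence: row.sequence, firstBadReason: "prev_hash mismatch" };
    }
    if (hashRow(row) !== row.entry_hash) {
      return { ok: false, firstBadSequence: row.sequence, firstBadReason: "entry_hash mismatch" };
    }
    expectedPrev = row.entry_hash;
  }
  return { ok: true, rowsChecked: rows.length };
}
```

Checking both the recomputed `entry_hash` and the `prev_hash` link matters: an attacker who edits a row and recomputes its own hash still breaks the next row's link.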

Infrastructure

  • Dockerfile: no better-sqlite3 prebuild is available for Alpine on arm64-musl, so the verifier-production stage `apk add`s python3+make+g++ as a virtual `.build-deps` package, runs npm install (which triggers a node-gyp source build), then runs `apk del .build-deps` to keep the runtime image slim (~150MB saved).
  • `/app/data` created with non-root ownership at image build time so the user can write the SQLite file.
  • New `verifier-audit-data` named volume in compose, mounted at `/app/data` on the verifier container.

Local end-to-end smoke (against the rebuilt image)

```
=== /health ===
{"status":"ok","vkeyAvailable":true,"audit":{"rowCount":1,"nextSequence":1,...}}

=== 3 verifications ===
v=false audit=d29b2a57-9e2 lat=577ms # first: includes snarkjs init
v=false audit=cd0f4461-335 lat=25ms
v=false audit=0d906949-61f lat=22ms

=== /audit/stats ===
{"rowCount":4,"nextSequence":4,"lastEntryHashPrefix":"887ff050d115af7c"}

=== /audit/verify-chain ===
{"ok":true,"rowsChecked":4}

=== DELETE attempt ===
OK: verifier_events is append-only — DELETE refused
=== UPDATE attempt ===
OK: verifier_events is append-only — UPDATE refused
```

Tests

  • `verifier/tests/audit-log.test.ts` (16 new): genesis row shape, appendEvent + chain advancement, 5-event chain stays unbroken, tamper detection (drops the trigger, mutates a row, verifyChain catches it on entry_hash mismatch AND on prev_hash mismatch), SQL trigger refusal, getStats, hashPayload determinism.
  • `verifier/tests/server.test.ts` (23 existing) updated to `initAuditLog(':memory:')` in beforeAll.
  • 39 verifier tests pass (23 + 16). Backend 228 unchanged.

Out of scope (next)

  • Plumb `tenantId` + `environment` from API → verifier in the `verifyViaService` call (currently the verifier records `'unspecified'`). One-line change in `src/services/zkp.ts`.
  • Surface `verifierAuditId` into the API's `audit_events` row metadata so the two logs cross-reference.
  • Nightly backup of `verifier-audit-data` volume.

🤖 Generated with Claude Code

Task 3 of today. Implements the verifier-local audit log from
the design doc §4.3 — independent from the API's Postgres
audit_events table. If Postgres is ever compromised + rewritten
this gives an auditor a tamper-evident replica to reconcile against.

Architecture:

- verifier/src/audit-log.ts (~280 lines) — better-sqlite3 backed.
  WAL mode for concurrency. Schema: verifier_events table with 14
  columns including sequence (monotonic), prev_hash, entry_hash.
  Index on (tenant_id, environment, created_at DESC) + sequence.
- Append-only enforced two ways: (1) SQL triggers on UPDATE + DELETE
  that RAISE ABORT, (2) the hash chain — any row tamper changes its
  entry_hash, breaking the next row's prev_hash linkage.
- Hash chain construction matches the design doc §5:
    entry_hash = sha256(canonical(row excl entry_hash) || prev_hash)
  Canonical serialization = JSON with sorted keys, no whitespace.
- Genesis row written at first init: sequence=0, tenant_id='system',
  prev_hash = 0×64. Required so the chain has a known starting point
  for verify-chain to walk from.
- Proof + public signals are HASHED on storage — we never persist
  the raw proof bytes. SHA-256 is enough to prove "this exact proof
  was verified at this time" without ballooning the table.
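
For illustration, the trigger half of the append-only enforcement could be generated like this. It is a sketch only: the helper name, trigger names, and error messages are assumptions, not the DDL in audit-log.ts.

```typescript
// Builds SQLite DDL for two triggers that abort any UPDATE or DELETE
// on the given table. Names and messages here are illustrative.
function appendOnlyTriggerSql(table: string): string {
  // Guard: the table name is interpolated into SQL, so allow identifiers only.
  if (!/^[A-Za-z_][A-Za-z0-9_]*$/.test(table)) {
    throw new Error(`invalid table name: ${table}`);
  }
  return [
    `CREATE TRIGGER IF NOT EXISTS ${table}_no_update`,
    `BEFORE UPDATE ON ${table}`,
    `BEGIN`,
    `  SELECT RAISE(ABORT, '${table} is append-only: UPDATE refused');`,
    `END;`,
    `CREATE TRIGGER IF NOT EXISTS ${table}_no_delete`,
    `BEFORE DELETE ON ${table}`,
    `BEGIN`,
    `  SELECT RAISE(ABORT, '${table} is append-only: DELETE refused');`,
    `END;`,
  ].join("\n");
}

// Usage: print the DDL for the verifier_events table.
console.log(appendOnlyTriggerSql("verifier_events"));
```

With better-sqlite3, a string like this would typically be passed to `db.exec(...)`, which accepts multiple statements. Triggers alone do not stop someone who can drop them, which is why the hash chain is the second line of defense.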

Server wiring:

- initAuditLog called at startup with VERIFIER_AUDIT_DB_PATH
  (default /app/data/audit.db; in-memory in tests).
- POST /verify now appends a row per request and returns its UUID
  as the verifierAuditId in the response envelope. Replaces the
  previous "throwaway uuidv4 per request" — the id now points to a
  durable record.
- GET /health includes an `audit` block: { rowCount, nextSequence,
  lastEntryHashPrefix }. Surfaces chain state for ops.
- GET /audit/stats — same data, dedicated endpoint for the
  evidence-pack assembler.
- GET /audit/verify-chain — walks the full chain, recomputes each
  entry_hash, reports ok/firstBadSequence/firstBadReason. O(N);
  acceptable for periodic ops (daily cron) or pre-evidence-pack
  publish; not for every request.

Infrastructure:

- Dockerfile verifier-production stage: alpine doesn't have
  prebuilds for better-sqlite3 on arm64-musl, so we apk add
  python3+make+g++ as a --virtual .build-deps, install (which
  triggers node-gyp source build), then apk del .build-deps to
  remove ~150MB of build tooling from the runtime image. Removed
  --ignore-scripts on npm install so prebuild-install's postinstall
  runs (was blocking the native binding fetch).
- Dockerfile creates /app/data with zeroauth ownership so the
  non-root user can write the SQLite file at startup.
- VERIFIER_AUDIT_DB_PATH=/app/data/audit.db baked into the image.
- docker-compose: new `verifier-audit-data` named volume mounted at
  /app/data on the verifier container. Survives container restart
  + image rebuild. Production backup story is TODO (nightly snapshot
  of the docker volume).

Local end-to-end smoke (with the rebuilt image + volume):
  POST /verify → row appended, verifierAuditId returned
  /audit/stats → { rowCount: 4, nextSequence: 4, lastEntryHashPrefix }
  /audit/verify-chain → { ok: true, rowsChecked: 4 }
  Direct DELETE via docker exec → SQL trigger refuses ("append-only")
  Direct UPDATE via docker exec → SQL trigger refuses
  Latency: first call 577ms (snarkjs init + 1 SQLite write),
  subsequent calls 22-25ms (includes 1 SQLite append).

Tests added (16 in verifier/tests/audit-log.test.ts):

- Genesis row at sequence 0, prev_hash = 0×64
- nextSequence starts at 1 after genesis
- appendEvent returns a UUID v4 + inserts at sequence with
  prev_hash = lastEntryHash + persists proof/pub_signals hashes
- 5-event chain remains unbroken; verifyChain reports rowsChecked=6
- Tamper detection — direct DB write (after trigger drop) catches
  entry_hash mismatch + reports firstBadSequence + firstBadReason
- Tamper detection — prev_hash linkage mismatch caught too
- SQL triggers refuse UPDATE + DELETE
- getStats reflects current state
- hashPayload: 64 hex chars, deterministic, distinct for different
  inputs, sha256-of-string for string input

Existing server tests (23) updated to initAuditLog(':memory:') in
beforeAll. All 39 verifier tests pass. Backend 228 unchanged.

Out of scope (next):

- Pass tenant_id + environment from API → verifier in the
  verifyViaService call (currently the verifier records
  'unspecified'). Small change in src/services/zkp.ts.
- Surface verifierAuditId through to the API's audit_events row
  metadata so the two logs cross-reference cleanly.
- Nightly backup of the verifier-audit-data volume.
- Reproducible-build provenance (better-sqlite3 build is currently
  non-deterministic per ADR-0005 trade-off acceptance).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings May 15, 2026 07:49
@pulkitpareek18 pulkitpareek18 merged commit e065319 into main May 15, 2026
1 of 3 checks passed
@pulkitpareek18 pulkitpareek18 deleted the dev branch May 15, 2026 07:50
pulkitpareek18 added a commit that referenced this pull request May 15, 2026
Task 4 of today. Formally records the decision Pulkit made yesterday
when he picked Plan B over Plan A. Captures the three reasons
single-engineer velocity beat the brainstorm's Rust spec, what we
gave up (reproducible-build provenance, smaller transitive surface,
unsafe-discipline) and what we kept (cross-repo HTTP shape stays
Rust-compatible if we ever swap).

Also pins the inline-fallback retirement plan:
- 2026-05-15: verifier shipped, inline path unused but compiled-in
- 2026-05-16 → 2026-06-06: 3-week soak in prod
- 2026-06-08: PR to delete verifyInline + snarkjs from root deps +
  refuse-to-start when VERIFIER_URL is unset
- 2026-06-09: prod runs verifier-only

References the three shipping PRs (#35 cutover, #36 healthcheck hotfix,
#37 SQLite audit log) + the plan-mode design doc + the B02 build
prompt that we rejected.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@pulkitpareek18 pulkitpareek18 review requested due to automatic review settings May 15, 2026 08:16