
feat(compliance): U11.U3 — live export runner Lambda body#950

Merged
ericodom merged 1 commit into main from feat/compliance-u11-u3-runner on May 8, 2026

@ericodom ericodom commented May 8, 2026

Summary

U11.U3 of the master compliance arc. Swaps the U11.U2 stub runner Lambda for the live body. After this PR + dev deploy, the U11.U1 mutation runs end-to-end: queue message → runner streams matching audit_events rows to S3 as CSV/NDJSON → presigned URL appears on the job row → admin operators can download. U11.U4 wires the admin Exports UI on top.

Highlights

  • SQS handler with function_response_types=["ReportBatchItemFailures"] — single-record batches per the U11.U2 ESM config.
  • CAS guard on every invocation: UPDATE … WHERE status='queued'. Re-deliveries see 0 rows updated → log "skip-not-queued" → return success. Idempotent by design.
  • pg.Cursor stream with CURSOR_BATCH_SIZE=1000 keeps memory bounded for million-row exports.
  • Inline RFC 4180 CSV writer (30 LOC): quotes a value when it contains a double quote ("), a comma (,), \r, or \n; doubles internal double quotes. Columns: event_id, tenant_id, occurred_at, recorded_at, actor, actor_type, source, event_type, event_hash, prev_hash, payload_json.
  • NDJSON writer — one JSON object per line, \n separator.
  • S3 multipart upload via @aws-sdk/lib-storage's Upload class. Object key is {tenantId}/{jobId}.{ext} for tenant-scoped jobs, multi-tenant/{jobId}.{ext} when tenant_id == ALL_TENANTS_SENTINEL.
  • Presigned URL via @aws-sdk/s3-request-presigner with 15-min TTL. UI surfaces "URL expired" past presigned_url_expires_at and prompts re-export.
  • Bundled SDK: lib-storage and s3-request-presigner aren't in the Lambda runtime, so the runner is added to the BUNDLED_AGENTCORE_ESBUILD_FLAGS allowlist. 114.7 KB zip.
  • Business failures don't throw — DB row records FAILED + handler returns SQS success. DLQ is reserved for handler crashes (malformed body, env vars empty, IAM regressions). The U11.U2 DLQ depth alarm catches both.
  • Module-load env snapshot per feedback_completion_callback_snapshot_pattern.
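
The CAS guard in the highlights can be sketched roughly as below. This is an illustration, not the runner's code. Only the `WHERE status='queued'` predicate comes from the PR description. The table name `compliance.export_jobs`, the `job_id` column, the `'running'` literal, and the `Queryable` and `fakeDb` helpers are assumptions introduced for the example.

```typescript
// Minimal query surface, shaped like pg's client.query(text, values).
interface Queryable {
  query(text: string, values: unknown[]): Promise<{ rowCount: number | null }>;
}

// Hypothetical table and column names; the status predicate is from the PR.
const CLAIM_SQL =
  "UPDATE compliance.export_jobs SET status = 'running' " +
  "WHERE job_id = $1 AND status = 'queued'";

// Returns true only for the invocation that flips queued -> running.
// A re-delivery updates 0 rows and should be logged and skipped.
async function claimJob(db: Queryable, jobId: string): Promise<boolean> {
  const res = await db.query(CLAIM_SQL, [jobId]);
  return (res.rowCount ?? 0) === 1;
}

// In-memory stand-in for Aurora, used only to demonstrate the behaviour.
function fakeDb(statuses: Map<string, string>): Queryable {
  return {
    async query(_text: string, values: unknown[]) {
      const id = String(values[0]);
      if (statuses.get(id) !== "queued") return { rowCount: 0 };
      statuses.set(id, "running");
      return { rowCount: 1 };
    },
  };
}
```

Because the state check and the state change happen in one statement, two concurrent deliveries of the same message cannot both claim the job. A job id that does not exist behaves like a re-delivery.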

Test plan

  • pnpm -r --if-present typecheck clean
  • pnpm --filter @thinkwork/lambda test — 154 passed / 7 skipped (24 new compliance-export-runner unit tests covering CSV escape, row formatters, SQL filter builder, S3 key generator, SQS body parser)
  • bash scripts/build-lambdas.sh compliance-export-runner produces a 114.7 KB zip
  • bash -n scripts/post-deploy-smoke-compliance-export-runner.sh (shell syntax valid)
  • Post-deploy: compliance-export-runner-smoke GHA job exercises the parse + Aurora connect + CAS-guard paths in dev
  • Manual end-to-end: trigger U11.U1 mutation; confirm runner picks up message, writes S3 object, populates presigned_url + presigned_url_expires_at on the job row
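
For orientation, the S3 key generator and SQL filter builder named in the unit-test bullet might look something like this sketch. The key layout follows the PR description. The sentinel value, the `ExportFilter` shape, and the `since`/`until` fields are hypothetical, and the GraphQL→DB event-type codec is left out because its mapping is not given here.

```typescript
// Hypothetical value; the real ALL_TENANTS_SENTINEL is not shown in the PR.
const ALL_TENANTS_SENTINEL = "00000000-0000-0000-0000-000000000000";

type ExportFormat = "csv" | "ndjson";

// {tenantId}/{jobId}.{ext} for tenant-scoped jobs,
// multi-tenant/{jobId}.{ext} for the all-tenants sentinel.
function s3KeyFor(tenantId: string, jobId: string, format: ExportFormat): string {
  const prefix = tenantId === ALL_TENANTS_SENTINEL ? "multi-tenant" : tenantId;
  return `${prefix}/${jobId}.${format}`;
}

// Assumed filter shape: tenant scope plus an optional occurred_at range.
interface ExportFilter {
  tenantId: string;
  since?: string;
  until?: string;
}

// Builds a parameterized WHERE clause; values are never interpolated into SQL.
function buildFilter(f: ExportFilter): { where: string; values: unknown[] } {
  const clauses: string[] = [];
  const values: unknown[] = [];
  if (f.tenantId !== ALL_TENANTS_SENTINEL) {
    values.push(f.tenantId);
    clauses.push(`tenant_id = $${values.length}`);
  }
  if (f.since) {
    values.push(f.since);
    clauses.push(`occurred_at >= $${values.length}`);
  }
  if (f.until) {
    values.push(f.until);
    clauses.push(`occurred_at < $${values.length}`);
  }
  const where = clauses.length > 0 ? `WHERE ${clauses.join(" AND ")}` : "";
  return { where, values };
}
```

Numbering placeholders from `values.length` keeps them aligned with the values array whichever optional clauses are present.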

Plan

docs/plans/2026-05-08-006-feat-compliance-u11-u3-runner-plan.md

Master plan progress

🤖 Generated with Claude Code

Replaces the U11.U2 stub with the live runner. SQS handler reads
{jobId}, performs a CAS guard QUEUED → RUNNING, opens a server-side
pg.Cursor against compliance.audit_events with the row's filter,
streams CSV/NDJSON to S3 multipart upload via @aws-sdk/lib-storage,
generates a 15-min presigned URL via @aws-sdk/s3-request-presigner,
and updates the job → COMPLETE with s3_key/presigned_url/expires_at,
or FAILED with job_error on any error.
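
The batched cursor read described above could be wrapped along these lines. This is a sketch under assumptions: `BatchCursor` is a hypothetical interface shaped like the promise form of pg-cursor's `read` and `close`, and `arrayCursor` is an in-memory stand-in so the example runs without a database.

```typescript
// Hypothetical minimal cursor surface (modelled on pg-cursor's promise API).
interface BatchCursor<T> {
  read(maxRows: number): Promise<T[]>;
  close(): Promise<void>;
}

const CURSOR_BATCH_SIZE = 1000;

// Yields one batch at a time, so at most batchSize rows are held in memory.
// The cursor is closed even if the consumer stops early or throws.
async function* streamRows<T>(
  cursor: BatchCursor<T>,
  batchSize: number = CURSOR_BATCH_SIZE,
): AsyncGenerator<T[]> {
  try {
    for (;;) {
      const rows = await cursor.read(batchSize);
      if (rows.length === 0) return; // an empty read means the cursor is drained
      yield rows;
    }
  } finally {
    await cursor.close();
  }
}

// In-memory cursor used only for demonstration.
function arrayCursor<T>(rows: T[]): BatchCursor<T> & { closed: boolean } {
  let offset = 0;
  const cursor: BatchCursor<T> & { closed: boolean } = {
    closed: false,
    async read(maxRows: number) {
      const batch = rows.slice(offset, offset + maxRows);
      offset += batch.length;
      return batch;
    },
    async close() {
      cursor.closed = true;
    },
  };
  return cursor;
}
```

A consumer would iterate with `for await (const batch of streamRows(cursor))`, format each batch, and write it to the upload stream before requesting the next one.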

Highlights:
- CSV writer is RFC 4180-compliant (30-line inline impl); NDJSON is
  one JSON object per line.
- pg.Cursor batch_size=1000 keeps memory bounded for million-row exports.
- Re-deliveries are no-ops (CAS guard misses → log + skip).
- Business failures don't throw — DB row records FAILED + handler
  returns SQS success. DLQ is reserved for handler crashes.
- Malformed SQS body throws → DLQ via maxReceiveCount=3.
- Module-load env snapshot per feedback_completion_callback_snapshot_pattern.
- Lib-storage + s3-request-presigner are bundled (BUNDLED_AGENTCORE_ESBUILD_FLAGS).
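
The split between business failures and handler crashes can be illustrated with a sketch like the following. The `Records[].body` and `batchItemFailures` shapes follow the Lambda SQS integration. The function names and the `runJob`/`markFailed` callbacks are hypothetical stand-ins for the export work and the FAILED update.

```typescript
interface SqsRecord { messageId: string; body: string }
interface SqsEvent { Records: SqsRecord[] }
interface SqsBatchResponse { batchItemFailures: { itemIdentifier: string }[] }

// Throws on a malformed body; the message is retried and lands in the DLQ
// once maxReceiveCount is exhausted.
function parseJobBody(body: string): { jobId: string } {
  let parsed: unknown;
  try {
    parsed = JSON.parse(body);
  } catch {
    throw new Error("malformed SQS body: not valid JSON");
  }
  const jobId = (parsed as { jobId?: unknown } | null)?.jobId;
  if (typeof jobId !== "string" || jobId.length === 0) {
    throw new Error("malformed SQS body: missing jobId");
  }
  return { jobId };
}

async function handleBatch(
  event: SqsEvent,
  runJob: (jobId: string) => Promise<void>,
  markFailed: (jobId: string, jobError: string) => Promise<void>,
): Promise<SqsBatchResponse> {
  for (const record of event.Records) {
    const { jobId } = parseJobBody(record.body); // a crash here propagates
    try {
      await runJob(jobId);
    } catch (err) {
      // Business failure: record it on the job row and still ack the message.
      await markFailed(jobId, err instanceof Error ? err.message : String(err));
    }
  }
  return { batchItemFailures: [] };
}
```

In this sketch a failure inside `markFailed` itself still propagates, so a lost database connection is treated as a crash rather than silently acknowledged.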

24 unit tests covering csv escape semantics, csv/ndjson row formatters,
SQL filter builder (tenant scope, ALL_TENANTS_SENTINEL, GraphQL→DB
event-type codec), S3 key generator, SQS body parser. All pass.
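
As a reference point for the escape semantics those tests cover, here is a minimal sketch of an RFC 4180-style formatter and an NDJSON formatter. The column list is the one given in the PR description. The handling of null, Date, and object values and the CRLF record terminator are assumptions, and the function names are illustrative.

```typescript
const CSV_COLUMNS = [
  "event_id", "tenant_id", "occurred_at", "recorded_at", "actor",
  "actor_type", "source", "event_type", "event_hash", "prev_hash",
  "payload_json",
] as const;

type AuditRow = Record<string, unknown>;

// Quote only when the value contains a double quote, comma, CR, or LF;
// embedded double quotes are doubled.
function csvEscape(value: unknown): string {
  if (value === null || value === undefined) return ""; // assumption: nulls are empty
  let s: string;
  if (typeof value === "string") s = value;
  else if (value instanceof Date) s = value.toISOString();
  else if (typeof value === "object") s = JSON.stringify(value);
  else s = String(value);
  return /[",\r\n]/.test(s) ? `"${s.replace(/"/g, '""')}"` : s;
}

function csvHeader(): string {
  return CSV_COLUMNS.join(",") + "\r\n";
}

function csvLine(row: AuditRow): string {
  return CSV_COLUMNS.map((col) => csvEscape(row[col])).join(",") + "\r\n";
}

// One JSON object per line, \n separator. JSON.stringify escapes embedded
// newlines, so a record never spans more than one line.
function ndjsonLine(row: AuditRow): string {
  return JSON.stringify(row) + "\n";
}
```

For example, `csvEscape('say "hi"')` returns `"say ""hi"""`, and a plain value such as `abc` passes through unquoted.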

Post-deploy smoke at scripts/post-deploy-smoke-compliance-export-runner.sh
synthesizes a fake SQS event with a non-existent UUIDv7 jobId and
asserts {batchItemFailures: []}. Exercises the parse + Aurora connect
+ CAS-guard paths without depending on a queued job. New
compliance-export-runner-smoke job in deploy.yml runs after
terraform-apply succeeds.
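
The smoke check is a shell script, but its logic can be sketched in TypeScript for readers who want to reproduce it locally. Everything named here (`uuidV7`, `syntheticSqsEvent`, `smokePassed`, the `smoke-1` message id) is hypothetical. The UUIDv7 generator uses `Math.random`, which is adequate for a throwaway id that must not match a real job.

```typescript
// Builds a UUIDv7-shaped id: 48-bit Unix ms timestamp, version nibble 7,
// variant bits 10, and random filler.
function uuidV7(nowMs: number = Date.now()): string {
  const hex = (n: number, width: number) => n.toString(16).padStart(width, "0");
  const rand = (bits: number) => Math.floor(Math.random() * 2 ** bits);
  const ts = hex(nowMs, 12);
  const randA = hex(rand(12), 3);
  const variantAndRand = hex(0x8000 | rand(14), 4);
  const randB = hex(rand(48), 12);
  return `${ts.slice(0, 8)}-${ts.slice(8, 12)}-7${randA}-${variantAndRand}-${randB}`;
}

// Single-record batch carrying a jobId that no job row should match.
function syntheticSqsEvent(jobId: string) {
  return { Records: [{ messageId: "smoke-1", body: JSON.stringify({ jobId }) }] };
}

// The smoke passes only when the handler returns {batchItemFailures: []}.
function smokePassed(response: unknown): boolean {
  const failures = (response as { batchItemFailures?: unknown } | null)?.batchItemFailures;
  return Array.isArray(failures) && failures.length === 0;
}
```

Since the job id matches no row, the CAS guard updates nothing and the handler is expected to skip and return success, which is why the check works without a queued job.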

Plan: docs/plans/2026-05-08-006-feat-compliance-u11-u3-runner-plan.md

Verified:
- pnpm -r --if-present typecheck (clean)
- pnpm --filter @thinkwork/lambda test (154 passed, 7 skipped)
- bash scripts/build-lambdas.sh compliance-export-runner (114.7 KB zip
  with bundled lib-storage + s3-request-presigner + pg + pg-cursor)
- bash -n scripts/post-deploy-smoke-compliance-export-runner.sh

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@ericodom ericodom merged commit 94369a4 into main May 8, 2026
5 checks passed
@ericodom ericodom deleted the feat/compliance-u11-u3-runner branch May 8, 2026 12:20
ericodom added a commit that referenced this pull request May 8, 2026
…ance arc) (#953)

Knowledge-track architecture-pattern doc capturing the meta-pattern
that shipped the master compliance arc (~17 PRs over 2 days,
2026-05-07–2026-05-08). Extends the existing
inert-to-live-seam-swap-pattern-2026-04-25.md (Python-module scoped)
with two dimensions surfaced during the compliance arc:

1. Substrate-first multi-layer ordering — DB schema → Terraform/IAM
   → Lambda shell → consumer code. The 2026-04-25 doc covered factory
   closures + seam_fn defaults at the Python-module scope; this doc
   generalizes to multi-layer infrastructure arcs spanning Aurora,
   S3 Object Lock buckets, SQS queues, and admin SPA.

2. Throw-don't-no-op rule for stubs — the inert state must be
   operator-visible (DLQ depth alarm, smoke-test failure). Silent
   no-op stubs that ack messages without doing work were rejected
   explicitly in the U11.U2 plan.

Three case studies with verbatim PR + file citations:
- U7→U8a→U8b — WORM anchor bucket + inert Lambda body + live S3 write
  (#917, #921, #927)
- U10 backend → extensions → admin UI (#937, #939, #941)
- U11 four-PR sequence: mutation → Terraform + stub → live runner
  → admin Exports page (#944, #948, #950, #951)

Includes:
- Stable-seam invariant (body swaps, contracts don't)
- Body-swap forcing functions in integration tests with call-count
  assertions (not just return-shape) to catch sibling-function escape
- CloudWatch alarm posture mirroring inert/live state
  (treat_missing_data flips on the live PR)
- Independent revertibility — substrate alone leaves a known-good
  inert state

Frontmatter validated parser-safe via the plugin-bundled
validate-frontmatter.py.

Also adds a one-line backlink in the prior-art doc so a reader landing
on the 2026-04-25 doc finds the multi-layer extension.

Generated via /ce-compound full mode (3 parallel research subagents +
ce-session-historian foreground).

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>