fix: skill_runner reads /tmp/skills (matches where install_skills writes) #4
Merged
install_skills.install_skill_from_s3() writes downloaded skills to
/tmp/skills/{skill_id}/, but skill_runner.SKILLS_DIR was set to
/app/skills — a directory the Dockerfile creates but never populates.
Every per-request skill registration therefore silently skipped every
skill (parse yields None because skill.yaml doesn't exist under /app).
Observed: chat-agent-invoke logs showed 'Injected built-in tool
web-search (provider=exa)' and the AgentCore container received the
payload, but 'Grouped skill registration: 0 tool-mode tools, 0
agent-mode skills, 0 total' — so Claude never saw web_search, never
called it, and replied "I don't have a web search tool available."
This fix aligns the read path with the write path. No tenant data
lives under /app/ anyway — Lambda /tmp is the only writable location
at runtime, and that's already what install_skills uses.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
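The read/write mismatch described above can be sketched in a few lines. This is a minimal illustration, not the project's code: `install_skill`, `parse_skill`, and `register_skills` are hypothetical stand-ins for the real functions, and temporary directories stand in for `/tmp/skills` and `/app/skills`.

```python
import tempfile
from pathlib import Path


def install_skill(root: Path, skill_id: str, manifest: str) -> Path:
    """Hypothetical stand-in for the installer: writes {root}/{skill_id}/skill.yaml."""
    skill_dir = root / skill_id
    skill_dir.mkdir(parents=True, exist_ok=True)
    (skill_dir / "skill.yaml").write_text(manifest)
    return skill_dir


def parse_skill(skills_dir: Path, skill_id: str):
    """Return the manifest text, or None when skill.yaml is missing."""
    manifest = skills_dir / skill_id / "skill.yaml"
    if not manifest.exists():
        return None
    return manifest.read_text()


def register_skills(skills_dir: Path, skill_ids):
    """Register every skill that parses; missing skills are skipped silently."""
    registered = []
    for skill_id in skill_ids:
        if parse_skill(skills_dir, skill_id) is None:
            continue  # the silent skip that hid the bug
        registered.append(skill_id)
    return registered


def demo():
    with tempfile.TemporaryDirectory() as base:
        write_root = Path(base) / "tmp" / "skills"  # where the installer writes
        stale_root = Path(base) / "app" / "skills"  # created but never populated
        stale_root.mkdir(parents=True)
        install_skill(write_root, "web-search", "name: web-search\n")
        before = register_skills(stale_root, ["web-search"])
        after = register_skills(write_root, ["web-search"])
        return before, after
```

Reading from the empty directory registers nothing, while reading from the directory the installer wrote to registers the skill. Sharing one constant between the writer and the reader would be one way to keep the two paths from drifting apart again.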
2 tasks
ericodom added a commit that referenced this pull request on Apr 20, 2026
…ck (#296)

Follow-up to #294 / #295. Marco bootstrap chain worked for the first hop but silently stopped at step 2: job #2 enqueued #3 successfully, but #3 (which hit max_new_pages again) didn't enqueue #4.

Root cause: the continuation bucket was computed as `Date.now() + 300s`. When a chained job's runtime was short relative to its bucket length (the job ran 112s, but the bucket is 300s), `Date.now() + 300s` could land in the bucket the job itself was keyed to, so the computed "next" bucket equaled the job's own bucket. The dedupe key collided with the child's own row, `ON CONFLICT DO NOTHING` swallowed the insert, and the chain died without a visible error.

Anchor the offset on `job.created_at` instead. Each step in a chain now produces a strictly monotonic bucket: parent in bucket N ⇒ child enqueued for bucket N+1, child in bucket N+1 ⇒ grandchild for N+2, regardless of how long any step took to run. Dedupe collisions can only fire against external jobs (e.g., a memory-retain trigger that hit the same bucket), which is the correct behavior.

Observed on dev: Marco job chain went 1→2, stopped. Post-fix, re-triggering will exercise the continuation through the full 261 memories until the cursor drains.

No test regression (411/419, pre-existing 8 skips). Continuation behavior itself is covered by the Marco integration that spawned this fix.
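The bucket collision can be reproduced with a small simulation. This is a sketch under stated assumptions, not the job runner's code: `run_chain` and `bucket_of` are hypothetical names, each child is assumed to start as soon as its parent finishes, and each job's `created_at` is assumed to fall inside the bucket it was keyed to.

```python
BUCKET_SECONDS = 300   # bucket length from the commit message
RUNTIME_SECONDS = 112  # observed runtime of one chained job


def bucket_of(ts: float) -> int:
    """Map a timestamp (in seconds) to its dedupe bucket index."""
    return int(ts // BUCKET_SECONDS)


def run_chain(anchor: str, start: float, max_steps: int) -> int:
    """Return how many jobs ran before the chain stopped enqueueing.

    anchor == "now"        -> child bucket computed from wall clock at completion
    anchor == "created_at" -> child bucket computed from the job's created_at
    """
    taken = {bucket_of(start)}  # dedupe keys already present
    created_at = start          # assumption: lies inside the job's own bucket
    now = start
    steps = 0
    for _ in range(max_steps):
        steps += 1
        now += RUNTIME_SECONDS  # the job runs, then tries to enqueue its child
        base = now if anchor == "now" else created_at
        child_bucket = bucket_of(base + BUCKET_SECONDS)
        if child_bucket in taken:
            break  # models ON CONFLICT DO NOTHING: insert swallowed, chain dies
        taken.add(child_bucket)
        # Hypothetical: the child row is stamped with the start of its bucket.
        created_at = child_bucket * BUCKET_SECONDS
    return steps
```

With `start=100`, the wall-clock anchor runs three jobs and then collides with the third job's own bucket, while the `created_at` anchor keeps advancing one bucket per step until `max_steps` is reached.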
6 tasks
ericodom added a commit that referenced this pull request on Apr 20, 2026
…tion fix (#318)

Closes handoff plan items #3 and #4.1 from plans/2026-04-20-006-handoff-cluster-enrichment-and-followups.md.

## #4.1 extractCityFromAddress dotted-abbreviation fix

Spanish/Canadian/Australian addresses like "..., San Miguel de Allende, Gto., Mexico" previously produced candidate "Gto." (Guanajuato) because the region-code walk only recognized `^[A-Z]{2,4}(\s|$)` patterns. The audit showed 32 Marco records producing "Gto." and 22 producing "Q.R." (Quintana Roo) as candidate cities; both now correctly resolve to the preceding city slot.

Fix: `isDottedRegionAbbr` recognizes `[A-Z][a-z]{0,2}\.` groups (up to 4 repetitions, ≤ 10 chars) and terminates the walk like the existing US-style region codes do.

Record-expander probe on Marco confirms: candidate "San Miguel De Allende" (support=32) replaces "Gto." and the equivalent cities replace "Q.R."

## #3 summary-expander → deterministic linker wiring

The audit showed `deriveParentCandidatesFromPageSummaries` produces 91 candidates on Marco (Toronto, Seattle, Honolulu hubs + many more) but the deterministic linker only consumed `deriveParentCandidates(records)`, so those candidates never became links.

### Type extension

`DerivedParentCandidate` gets `sourceKind?: "record" | "summary"`:
- record: `sourceRecordIds` are memory-record ids (existing path)
- summary: `sourceRecordIds` are page ids (new path)

Field is optional so existing test fixtures stay green; emitter defaults to "record" when omitted.

### Emitter update

`emitDeterministicParentLinks` now builds two leaf indexes:
- `leavesByRecord`: keyed on memory-record id (existing)
- `leavesById`: keyed on page id, built from `[...scopePages, ...affectedPages]` (new)

Summary-kind candidates route through `leavesById`. A new optional `scopePages` arg feeds the index so summary-kind candidates can resolve leaves that weren't touched THIS batch. This is necessary because summary-based candidates come from a scope-wide scan, not batch-local records.

### Compiler wiring

`applyPlan` now calls BOTH expanders and passes merged candidates to the linker, along with `candidatePages` (already fetched for the planner call) as `scopePages`. Bypasses the merge-across-kinds ambiguity by concatenating both lists rather than merging; the emitter handles each candidate according to its kind.

### Precision filter tightening

Summary-expander output adds `isLikelyCityToken` filter before pushing into the byCity map:
- drops > 4-word fragments ("Prospect Interested In The Full PVL Product Line")
- drops < 3-char tokens ("St")
- drops street-suffix endings ("Congress Ave", "Queen St")

These are the noise categories the 04-20 audit surfaced; real cities like "Buenos Aires" (2 words), "New York" (2 words), and "Montréal" (1 word with accent) all pass through.

## Expected impact

On the next Marco recompile: `links_written_deterministic` should increase from the current 14/batch (record-only path) by 30-60 links (Toronto entity leaves → Toronto hub-if-exists-else-fuzzy-match, Seattle / Honolulu / etc.). Validated by unit tests; live-compile numbers land after deploy.

Backfill dry-run on Marco: 22 parent links (up from 21), all spot-check precision-correct; wet run: 386 → 387 reference links (+1 net new, most attempts are idempotent re-writes of existing edges).

## Test coverage

- 3 new dotted-abbreviation tests (Gto., Q.R., B.C.)
- 5 new summary-expander filter tests (word cap, short, street suffix, sourceKind tagging for both expanders)
- 5 new deterministic-linker tests for the summary-kind branch (scopePages leaf resolution, empty-pool skip, non-entity filter, back-compat when sourceKind omitted, both-kinds-fire for same parent)
- Total: 471 passed / 8 skipped (up from 458).
- Typecheck clean.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
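The two text filters described in this commit can be approximated as follows. This is a Python sketch of the stated rules, not a port of the real helpers: `extract_city` is a simplified, hypothetical version of the address walk (it assumes comma-separated slots with the country last), and the `STREET_SUFFIXES` set is an illustrative guess rather than the project's list.

```python
import re

# Patterns quoted in the commit message.
US_STYLE_REGION = re.compile(r"^[A-Z]{2,4}(\s|$)")
DOTTED_REGION = re.compile(r"^(?:[A-Z][a-z]{0,2}\.){1,4}$")

# Hypothetical suffix list; the real filter's list is not shown in the source.
STREET_SUFFIXES = {"ave", "st", "rd", "blvd", "dr", "ln"}


def is_dotted_region_abbr(token: str) -> bool:
    """True for dotted abbreviations such as 'Gto.', 'Q.R.', 'B.C.' (at most 10 chars)."""
    token = token.strip()
    return len(token) <= 10 and DOTTED_REGION.match(token) is not None


def extract_city(address: str):
    """Walk slots right to left, skipping the country and any region codes."""
    parts = [p.strip() for p in address.split(",") if p.strip()]
    for part in reversed(parts[:-1]):  # assumption: the last slot is the country
        if US_STYLE_REGION.match(part) or is_dotted_region_abbr(part):
            continue
        return part
    return None


def is_likely_city_token(candidate: str) -> bool:
    """Apply the three precision rules: word cap, minimum length, street suffix."""
    words = candidate.split()
    if not words or len(words) > 4:
        return False
    if len(candidate.strip()) < 3:
        return False
    if words[-1].rstrip(".").lower() in STREET_SUFFIXES:
        return False
    return True
```

For the address quoted above, `extract_city` skips "Gto." and returns the preceding slot, and the filter rejects the three noise examples while keeping the three real cities.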
4 tasks
ericodom added a commit that referenced this pull request on Apr 20, 2026
…ks dedup (#319)

Captures the 2026-04-20 third-session end state. Supersedes plans/2026-04-20-006-handoff-cluster-enrichment-and-followups.md now that PR #318 closed out items #3 (summary-expander wiring) and #4.1 (dotted-abbreviation fix).

Four remaining items in priority order:

1. Validate #318 on Marco (TOP PRIORITY). PR promised links_written_deterministic jump of +30-60 on the next live compile, but the backfill can't exercise that path. Trigger a manual compile and verify the prediction before building more on top.
2. Unit 6 mention cluster enrichment. Biggest remaining plan item. Brief on the key design decision (mentionClusterEnrichments[] JSON shape: Option A separate rows recommended over Option B inline-promotions).
3. wikiBacklinks dedup. Warm-up PR (<1 hour). listBacklinks misses the dedup pattern that listConnectedPages already has.
4. Trivial grab-bag: wipeWikiScope FK dependency, applier-split debt (good to bundle with #2 since cluster promotion adds more lines to the already-1300-line applyAggregationPlan).

Session tally: 6 PRs merged (#309, #311, #312, #316, #317, #318).

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This was referenced Apr 21, 2026
ericodom added a commit that referenced this pull request on Apr 22, 2026
Three drift incidents in five days (Apr 17 mig 0008, Apr 18 collision, Apr 21 0018+0019) had the same root cause and the same named-but-unshipped fix. Captures the pattern so the next author sees it before shipping migration #4.

The drift-reporter fix shipped in PR #367; this is the learning that explains why the reporter exists and what marker convention every future unindexed migration must follow.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2 tasks
ericodom added a commit that referenced this pull request on Apr 22, 2026
Resolves the P1 tension the handoff flagged: the plan honestly scoped R13 to 'no token via Python-stdio-mediated writes or known-shape CloudWatch patterns' and added T1b (intra-tenant template-author exfil), but the brainstorm still read absolutely. Three edits align the goalposts:

1. R13 rescoped to match the plan, with named residual coverage gaps (os.write at fd level, subprocess env dumps, C-extension writes, multiprocessing workers, adversarial split-writes) tracked as the Stdout-bypass class alongside T1.
2. Success Criterion #4 softened to match: 'within R13's scope.'
3. T1b added as a first-class residual threat between T1 and T2, with v1 mitigations (1-hour TTL, shared-template-author review as compensating control, tenant-as-trust-boundary) and v2 hardening track (per-user ABAC session tags or in-process credential proxy, latter preferred because it addresses T1 and T1b simultaneously).
6 tasks
ericodom added a commit that referenced this pull request on Apr 22, 2026
…ation (Unit 6) (#430)

## createTenant wiring

After INSERTing the tenant row, createTenant now invokes the agentcore-admin Lambda's /provision-tenant-sandbox route (plan Unit 5) via a new invokeProvisionTenantSandbox helper. The invoke uses InvocationType: RequestResponse per feedback_avoid_fire_and_forget_lambda_invokes so errors surface inside createTenant, but createTenant catches and logs them so a sandbox outage doesn't turn into a tenant-onboarding outage. The reconciler (Unit 6 follow-up) sweeps rows with null sandbox_interpreter_*_id at its own cadence.

invokeProvisionTenantSandbox:
- Reads AGENTCORE_ADMIN_LAMBDA_ARN + AGENTCORE_ADMIN_TOKEN env vars; throws SandboxProvisioningConfigError if missing (distinguishable so missing config is a warn, not an error).
- Builds the API Gateway v2 envelope the handler is written for.
- 45s abort signal matches the handler's own budget.
- Translates statusCode 4xx → Error with server-side message; 2xx → structured ProvisionResult.

## updateTenantPolicy (new)

Platform-operator-only mutation for sandbox_enabled + compliance_tier policy changes. Separate from updateTenant because the changes are security-boundary shifts and are audited in tenant_policy_events.

Gate: caller's email must appear in the THINKWORK_PLATFORM_OPERATOR_EMAILS allowlist (comma-separated env var). This is the swap-out point for formal RBAC when it lands.

Transition semantics encoded in a pure computeTransition helper (11 tests):
- No-op when nothing changes.
- sandbox_enabled true rejected when compliance_tier != standard.
- compliance_tier → non-standard coerces sandbox_enabled off, producing a paired audit event so the transition is reproducible from the audit log alone.
- tier-first ordering means 'enable sandbox AND set tier to hipaa in one call' deterministically rejects.

Writes are wrapped in a db.transaction so the tenants UPDATE and the tenant_policy_events INSERT land atomically.

## GraphQL

- Tenant type gains sandboxEnabled / complianceTier / sandboxInterpreter*Id
- New UpdateTenantPolicyInput + updateTenantPolicy mutation
- pnpm schema:build re-ran against the AppSync subscription schema

## Tests: 16 passing

- sandbox-provisioning.test.ts (5): envelope shape, 200/202/4xx parsing, Lambda FunctionError surfacing
- updateTenantPolicy.test.ts (11): every permutation of the transition decision tree: no-op, toggle true on standard, reject toggle true on regulated/hipaa, toggle false always ok, tier change coerces sandbox, composite tier+sandbox requests

## Deferred to follow-up

- Reconciler Lambda (EventBridge scheduled fill + drift passes): lands when the agentcore-admin Lambda terraform resource lands. Currently handled reactively: sandbox failures on createTenant are logged and the next successful createTenant-invoke retry picks up partial state.
- SNS platform-security topic: per handoff P1 #4, dropped from v1 because no named subscriber / SLA exists. Audit row in tenant_policy_events provides detection; notification is v2.
- Pre-existing tenants: a one-time operator action flips sandbox_enabled per-tenant during staged rollout (see plan Operational Notes).
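The transition rules listed under updateTenantPolicy can be expressed as a pure function. The following is a Python sketch of those rules only; the real `computeTransition` helper is not shown in the source, so `Policy`, `Transition`, `PolicyRejected`, and the event tuple shape are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Policy:
    sandbox_enabled: bool
    compliance_tier: str


@dataclass
class Transition:
    next: Policy
    # Each audit event is (field, old_value, new_value); hypothetical shape.
    events: List[Tuple[str, object, object]] = field(default_factory=list)


class PolicyRejected(ValueError):
    """Raised when the requested combination is not allowed."""


def compute_transition(
    current: Policy,
    sandbox_enabled: Optional[bool] = None,
    compliance_tier: Optional[str] = None,
) -> Transition:
    # Tier-first ordering: resolve the target tier before judging the sandbox flag.
    tier = compliance_tier if compliance_tier is not None else current.compliance_tier
    if sandbox_enabled is True and tier != "standard":
        raise PolicyRejected("sandbox cannot be enabled on a non-standard tier")

    events: List[Tuple[str, object, object]] = []
    enabled = current.sandbox_enabled

    if tier != current.compliance_tier:
        events.append(("compliance_tier", current.compliance_tier, tier))
        if tier != "standard" and enabled:
            # Paired event: moving to a non-standard tier coerces the sandbox off.
            events.append(("sandbox_enabled", True, False))
            enabled = False

    if sandbox_enabled is not None and sandbox_enabled != enabled:
        events.append(("sandbox_enabled", enabled, sandbox_enabled))
        enabled = sandbox_enabled

    return Transition(next=Policy(enabled, tier), events=events)
```

Keeping the decision pure (no database access) is what lets every branch be covered by plain unit tests; the caller would then persist `next` and `events` inside one transaction.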
This was referenced Apr 24, 2026
ericodom added a commit that referenced this pull request on Apr 24, 2026
…hon skill (#486)

Parity pass with packages/skill-catalog/thinkwork-admin/scripts/operations/*.py. The @thinkwork/admin-ops package now exposes every op the Python skill ships, and the admin-ops MCP server registers all of them as MCP tools. Sets up deprecation of the skill: agents using mcp.thinkwork.ai can do everything the skill's Python wrappers did.

Client
- AdminOpsClient gains a `graphql(query, variables?)` helper that POSTs to /graphql with the same Bearer, throws AdminOpsError on error responses. Mutations + most reads use GraphQL (matches the Python skill's wire); tenants module continues to use REST since those handlers already exist.

Ported modules (28 ops):
- teams.ts: 5 mutations + 2 reads (createTeam, add/remove team agents + users, listTeams, getTeam)
- agents.ts: 3 mutations + 3 reads (createAgent, setAgentSkills, setAgentCapabilities, listAgents, getAgent, listAllTenantAgents)
- templates.ts: 5 mutations + 3 reads (createAgentTemplate, createAgentFromTemplate, syncTemplateToAgent, syncTemplateToAllAgents, acceptTemplateUpdate, listTemplates, getTemplate, listLinkedAgentsForTemplate)
- users.ts: 0 mutations + 3 reads (me, getUser, listTenantMembers)
- artifacts.ts: 0 mutations + 2 reads (listArtifacts, getArtifact)
- _fields.ts: shared GraphQL field-selection constants mirroring reads.py

MCP tool registration (packages/lambda/admin-ops-mcp.ts)
- 25 new tools covering every ported op. Each carries a JSON Schema inputSchema + a non-empty description. Tenant pinning from the authenticated key overrides any caller-supplied tenantId on downstream calls.
- The existing tools/list test asserts a curated must-have set rather than an exact-count equality, so future ports don't require test churn.

Tests
- 6 new tests in packages/admin-ops/src/teams.test.ts covering wire-format correctness (queries contain the right operation names, variables carry through, errors surface as AdminOpsError).
- Full monorepo test run: 1270+ tests passing.

Not in scope
- CLI migration of `thinkwork team/agent/template/user/artifact` subcommands: deferred to a follow-up PR.
- Removal of packages/skill-catalog/thinkwork-admin/: deferred to PR #5 after seed (PR #4) promotes mcp.thinkwork.ai to tenants.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ericodom added a commit that referenced this pull request on Apr 24, 2026
…ction preflight (#485)

* refactor(sandbox): drop required_connections / OAuth preamble / connection preflight

Ends the sandbox's OAuth-into-os.environ path end-to-end. Admin UI already stopped surfacing required_connections in #477; this closes the loop across validator, preflight, dispatcher, container, preamble, the pilot skill, concept doc, and runbook.

Why now: the OAuth preamble was a live realization of the T1/T1b/T2 residual-threat classes the concept doc itself warns about. The v2 in-process credential proxy is the planned structural fix; landing that work cleanly requires this path gone. Agents that need OAuth-ed work (Slack, GitHub) call composable-skill connector scripts instead.

What survives in the sandbox:
- execute_code is a pure-compute primitive
- The preamble still runs as executeCode call #1, now a one-line sitecustomize readiness check that aborts the session if the stdio redactor didn't install (refuses to run user code on an unmitigated image). PREAMBLE_VERSION bumps to 2
- Preflight decision tree shrinks from 5 outcomes to 4; the missing-connection + ConnectionRevoked error classes are gone
- Dispatcher payload drops sandbox_secret_paths / sandbox_tenant_id / sandbox_user_id / sandbox_stage; only sandbox_interpreter_id + sandbox_environment survive
- packages/api/src/lib/sandbox-secrets.ts deleted outright
- Validator rejects required_connections on write (not silently accepts) so operators can't reintroduce it via raw GraphQL
- Hydration silently strips the key from legacy DB rows; no migration

Docs:
- Concept page loses the Preamble section, SandboxMissingConnection and ConnectionRevoked rows, required_connections YAML, and the T1 residual row. T1b dropped since the exfil class no longer exists. The 11-step per-turn lifecycle shrinks to 9 steps
- Runbook loses failure modes #3 and #4; architecture-in-one-page simplified accordingly
- sandbox-pilot SKILL.md rewritten to pure compute (S3 upload via per-tenant IAM role, no Slack post, no GitHub token)

Verification: typecheck clean, 1050 api tests pass, docs site builds 79 pages clean, preamble emission verified via ast.parse + inline regression assertions (no boto3, no SecretString, no os.environ[...], no OAuth env-var names, no token prefixes).

Plan: docs/plans/2026-04-23-006-refactor-sandbox-drop-required-connections-plan.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(sandbox): ce-review autofix sweep: tool docstring, e2e fixture, warm-container cleanup

Applies ten safe_auto fixes surfaced by the ce:review pass across 9 reviewer personas. Highest-impact fixes:

- sandbox_tool.py: rewrote the execute_code Strands tool docstring to describe the pure-compute primitive. Removed the retired OAuth env var claims (GITHUB_ACCESS_TOKEN, SLACK_ACCESS_TOKEN, GCAL_ACCESS_TOKEN) and ConnectionRevoked error. 7 reviewers flagged this; the LLM reads this docstring as live tool guidance
- fixtures.ts: dropped required_connections from the e2e fixture's createAgentTemplate call (the validator now rejects it, every integration test would abort in setup). Removed vacuous syntheticTokens / seedConnections / putSecret plumbing that the token-leak assertion's forbiddenValues depended on
- sandbox-pilot.e2e.test.ts: kept the structural CloudWatch token-leak check but dropped the synthetic forbiddenValues. No synthetic tokens are injected, so the structural prefix check (ghp_/xoxb-/ya29./JWT) is the only meaningful guard
- invocation_env.py: unconditional pop of all six SANDBOX_* keys at invocation entry. Closes the warm-container carryover window where a pre-deploy invocation with interrupted cleanup could leak stale interpreter id + retired OAuth paths into the next invocation
- sandbox-invocation-log.ts: dropped connection_revoked from ALLOWED_EXIT_STATUSES + added a regression test asserting the value is now rejected
- sandbox_preamble.py: `if not installed()` → `if installed() is not True` (fail-closed against mocks returning truthy non-True). Dropped `from __future__ import annotations` (no-op after dataclass removal) and f-string with only module-constant interpolation
- skill.yaml: rewrote the retired required_connections / OAuth preamble comment block to describe pure-compute + per-tenant IAM S3 access
- SKILL.md: replaced non-existent SandboxDisabled error name with accurate description (dispatcher does not register the tool)

Verification:
- pnpm --filter @thinkwork/api typecheck clean
- pnpm --filter @thinkwork/api test: 1051/1051 pass (+1 new guard)
- pnpm --filter @thinkwork/docs build: 79 pages clean
- Python inline regressions: warm-container stale-key clear; preamble identity check + no OAuth markers; tool docstring clean

Residual findings (manual follow-up):
- ADV-001: legacy-row hydration. Handlers cast agent.sandbox without running through validateTemplateSandbox. Functionally safe (field never read) but plan comment overstates the strip behavior
- KT-001: SandboxEnvironmentId hand-rolled duplicate of SandboxEnvironment from database-pg schema
- W2: CAPABILITIES.md doesn't mention the sandbox / connector-skill pattern to the agent system prompt

Review artifact: .context/compound-engineering/ce-review/20260423-194430-67ea771d/

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
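The `if not installed()` → `if installed() is not True` change in sandbox_preamble.py is easiest to see against a mock. The sketch below is illustrative only: `RedactorNotInstalled`, `check_loose`, and `check_strict` are hypothetical names, and the real preamble's `installed()` is replaced by whatever callable is passed in.

```python
from unittest.mock import MagicMock


class RedactorNotInstalled(RuntimeError):
    """Hypothetical error for a session that must not run user code."""


def check_loose(installed) -> None:
    # Old form: any truthy return value passes, including a mock object.
    if not installed():
        raise RedactorNotInstalled("stdio redactor did not install")


def check_strict(installed) -> None:
    # New form: only the literal True passes, so the check fails closed.
    if installed() is not True:
        raise RedactorNotInstalled("stdio redactor did not install")


def passes(check, installed) -> bool:
    """Return True when the readiness check lets the session continue."""
    try:
        check(installed)
        return True
    except RedactorNotInstalled:
        return False
```

Calling a bare `MagicMock()` returns another `MagicMock`, which is truthy, so `passes(check_loose, MagicMock())` lets the session continue while `passes(check_strict, MagicMock())` aborts it. A callable that returns the literal `True` passes both checks.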
9 tasks
ericodom added a commit that referenced this pull request on Apr 24, 2026
U4: add Astro redirects: { '/pricing': '/cloud' } in astro.config.mjs
and delete pricing.astro. Astro's static build emits /pricing/index.html
with <meta http-equiv='refresh'>, <link rel='canonical' href=.../cloud>,
and <meta name='robots' content='noindex'> — inbound links (search
results, Stripe cancels from older mobile builds, external shares)
continue to resolve, and the redirect stub is hidden from search
engines so duplicate-content risk is zero.
Update every known /pricing reference across the monorepo:
- apps/www/src/pages/m/checkout-complete.astro — mobile checkout
fallback link "Return to pricing" → "Return to plans" → /cloud.
- apps/www/src/env.d.ts — consumer comment refresh.
- packages/pricing-config/src/plans.ts — consumer comment refresh.
- apps/admin/src/routes/onboarding/welcome.tsx — hardcoded https URL
and visible anchor text "Return to pricing" → "Return to plans".
- apps/mobile/lib/stripe-checkout.ts — hardcoded cancelUrl to
https://thinkwork.ai/cloud. Older installed mobile builds still
point at /pricing and will hit the redirect; that's acceptable.
- terraform/modules/app/lambda-api/handlers.tf: STRIPE_CHECKOUT_CANCEL_URL
now ends in /cloud so canceled checkouts land directly rather than
bouncing through the redirect.
Leaves packages/api/src/handlers/stripe-checkout.ts telemetry string
'www-pricing' unchanged per plan Key Technical Decision #4 (analytics
continuity).
ericodom added a commit that referenced this pull request on Apr 24, 2026
…te IA (#530)

* docs(plan): www Cloud/Services IA refactor plan

* feat(www): add /cloud route as copy of pricing page

U1: stand up /cloud serving current pricing content. Both routes render identically at this point; subsequent units reframe /cloud, add Services cross-link, and retire /pricing via redirect.

* feat(www): rename nav Pricing → Cloud

U2: flip the visible nav entry to "Cloud" pointing at /cloud. Single edit in copy.ts propagates to desktop and mobile Header.astro renderings.

* feat(www): reframe /cloud as ThinkWork Cloud + add Services cross-link

U3: update hero/meta copy from generic "Pricing / Infrastructure you own" to "ThinkWork Cloud / Hosted agent infrastructure, deployed inside your AWS." Add a bulleted clarifier making scope explicit (hosted plans vs separate services vs separate AWS usage vs self-hosted via docs). Add a soft cross-link block between PricingGrid and FinalCTA pointing to /services: prose + inline text link, not a button, to keep the "soft pointer" posture asymmetric with the first-class Cloud Hosting card arriving on /services in U5.

PricingGrid and the inline Stripe checkout script are unchanged; the new sections live outside the PricingGrid DOM so [data-plan-cta] / [data-plan-error] selectors continue to match.

Also refresh the stale "Do NOT cross-link to /pricing" services-export comment now that the IA split makes the handoff expected in both directions.

* refactor: retire /pricing route in favor of /cloud

U4: add Astro redirects: { '/pricing': '/cloud' } in astro.config.mjs and delete pricing.astro. Astro's static build emits /pricing/index.html with <meta http-equiv='refresh'>, <link rel='canonical' href=.../cloud>, and <meta name='robots' content='noindex'>. Inbound links (search results, Stripe cancels from older mobile builds, external shares) continue to resolve, and the redirect stub is hidden from search engines so duplicate-content risk is zero.

Update every known /pricing reference across the monorepo:
- apps/www/src/pages/m/checkout-complete.astro: mobile checkout fallback link "Return to pricing" → "Return to plans" → /cloud.
- apps/www/src/env.d.ts: consumer comment refresh.
- packages/pricing-config/src/plans.ts: consumer comment refresh.
- apps/admin/src/routes/onboarding/welcome.tsx: hardcoded https URL and visible anchor text "Return to pricing" → "Return to plans".
- apps/mobile/lib/stripe-checkout.ts: hardcoded cancelUrl to https://thinkwork.ai/cloud. Older installed mobile builds still point at /pricing and will hit the redirect; that's acceptable.
- terraform/modules/app/lambda-api/handlers.tf: STRIPE_CHECKOUT_CANCEL_URL now ends in /cloud so canceled checkouts land directly rather than bouncing through the redirect.

Leaves packages/api/src/handlers/stripe-checkout.ts telemetry string 'www-pricing' unchanged per plan Key Technical Decision #4 (analytics continuity).

* feat(www): consolidate /services into single card grid with Cloud Hosting handoff

U5: remove featured/secondary variant split in ServiceCard and services.astro; every card now uses the same visual treatment. Drop the two separate SectionShells (featured grid + 'Additional packages') for a single consolidated section with 5 cards in one 3-column grid (md:grid-cols-2 lg:grid-cols-3): Strategy Sprint, Pilot Launch, Managed Operations, Workflow Expansion, and Cloud Hosting.

Governance & Eval + AI Program Advisory are dropped as named cards: Governance substance folds into Managed Operations' includes list; Advisory is cut.

Cloud Hosting is a new 5th peer card introducing ctaHref + ctaLabel fields on ServicePackage so it can deep-link visitors to /cloud; other cards leave those fields unset and render without per-card buttons (intake continues via the shared hero and closing CTAs).

Preserves id="packages" on the consolidated SectionShell so the hero's #packages anchor link still resolves. Variant discriminator is removed entirely from the ServicePackage type.

* refactor(www): correct Cloud positioning + layout polish across /cloud and header

Fix the messaging on /cloud after live review:
- ThinkWork Cloud is the FULLY-HOSTED product (we operate the platform end-to-end) for teams that don't want to run the Enterprise Agent Harness themselves. The prior framing ("deployed inside your AWS") is the self-managed Enterprise product, not Cloud.
- Hero reframed: 'Fully managed AI agents, no infrastructure to run.' with a lede that names the Enterprise Agent Harness as the separate self-managed path.
- smallPrint + finePrint block removed from PricingGrid (cards now flow straight into the Services cross-link); the remaining data stays in copy.ts for a potential checkout-page surface.
- Cloud Hosting service card on /services reframed: 'Fully managed ThinkWork, no Agent Harness to run' with 'No AWS setup on your side' as its first include bullet.
- Shared plan summaries in packages/pricing-config updated so Starter and Enterprise no longer mention deploying inside the customer's AWS (mobile onboarding surfaces these too).

Add a /cloud-specific FinalCTA variant (finalCtaCloud in copy.ts) with 'Fully managed' eyebrow, 'Adopt AI. Skip the infrastructure.' headline, and matching points, swapped in via a new 'copy' prop on FinalCTA. Homepage continues to use the default 'Your AWS · Your rules' variant unchanged.

Layout polish:
- New 'tight' prop on FinalCTA drops its top border + gradient divider and reduces top padding, so /cloud flows cleanly out of the plans section.
- RECOMMENDED badge on PricingCard uses opaque #070a0f (matches the page body) instead of bg-brand/10 so the card border no longer shows through the pill.
- Services cross-link folded into the same SectionShell as the plans grid with tighter spacing, eliminating an entire empty section between the two.

Header.astro now indicates the current route with semibold white + aria-current="page" on desktop and mobile nav; inactive items stay text-slate-400. Trailing slashes and subpaths are handled.

* refactor(www): trim /services copy: drop lifecycle section and polish packaging

U6: cleanup pass after the packaging consolidation in U5.
- Remove the entire 'How it works / Engagement lifecycle' section (services.how in copy.ts + the 4-phase card grid in services.astro). With Strategy Sprint / Pilot Launch / Managed Operations / Workflow Expansion now as packages, the lifecycle narrates the same arc the cards already describe. One telling beats two.
- Hero headlineOutcome matches the Services posture from the brief: 'We help teams scope, launch, operate, and expand governed AI workflows.'
- Positioning headline de-arced: 'One partner from first workflow to ongoing operations.' Positioning body and meta description also strip the references to Governance & Program Advisory that were dropped in U5.
- FAQ loses the 'What happens after launch?' entry; it named the removed packages and restated card content.
- Managed Operations card outcome + bestFor trimmed to stop echoing the body.
- Packages lede drops the 'Cloud Hosting sits alongside' explanation; the card speaks for itself.

With this /services reads as: hero → proof band → positioning → 5-card packages grid → 5-question FAQ → closing CTA.
6 tasks
ericodom added a commit that referenced this pull request on May 5, 2026
fix: skill_runner reads /tmp/skills (matches where install_skills writes)
ericodom
added a commit
that referenced
this pull request
May 5, 2026
…tion fix (#318) Closes handoff plan items #3 and #4.1 from plans/2026-04-20-006-handoff-cluster-enrichment-and-followups.md. ## #4.1 extractCityFromAddress dotted-abbreviation fix Spanish/Canadian/Australian addresses like "..., San Miguel de Allende, Gto., Mexico" previously produced candidate "Gto." (Guanajuato) because the region-code walk only recognized `^[A-Z]{2,4}(\s|$)` patterns. The audit showed 32 Marco records producing "Gto." and 22 producing "Q.R." (Quintana Roo) as candidate cities — both now correctly resolve to the preceding city slot. Fix: `isDottedRegionAbbr` recognizes `[A-Z][a-z]{0,2}\.` groups (up to 4 repetitions, ≤ 10 chars) and terminates the walk like the existing US-style region codes do. Record-expander probe on Marco confirms: candidate "San Miguel De Allende" (support=32) replaces "Gto." and the equivalent cities replace "Q.R." ## #3 summary-expander → deterministic linker wiring The audit showed `deriveParentCandidatesFromPageSummaries` produces 91 candidates on Marco (Toronto, Seattle, Honolulu hubs + many more) but the deterministic linker only consumed `deriveParentCandidates(records)` — so those candidates never became links. ### Type extension `DerivedParentCandidate` gets `sourceKind?: "record" | "summary"`: - record: `sourceRecordIds` are memory-record ids (existing path) - summary: `sourceRecordIds` are page ids (new path) Field is optional so existing test fixtures stay green; emitter defaults to "record" when omitted. ### Emitter update `emitDeterministicParentLinks` now builds two leaf indexes: - `leavesByRecord` — keyed on memory-record id (existing) - `leavesById` — keyed on page id, built from `[...scopePages, ...affectedPages]` (new) Summary-kind candidates route through `leavesById`. A new optional `scopePages` arg feeds the index so summary-kind candidates can resolve leaves that weren't touched THIS batch — necessary because summary-based candidates come from a scope-wide scan, not batch-local records. 
### Compiler wiring `applyPlan` now calls BOTH expanders and passes merged candidates to the linker, along with `candidatePages` (already fetched for the planner call) as `scopePages`. Bypasses the merge-across-kinds ambiguity by concatenating both lists rather than merging — the emitter handles each candidate according to its kind. ### Precision filter tightening Summary-expander output adds `isLikelyCityToken` filter before pushing into the byCity map: - drops > 4-word fragments ("Prospect Interested In The Full PVL Product Line") - drops < 3-char tokens ("St") - drops street-suffix endings ("Congress Ave", "Queen St") These are the noise categories the 04-20 audit surfaced; real cities like "Buenos Aires" (2 words), "New York" (2 words), and "Montréal" (1 word with accent) all pass through. ## Expected impact On the next Marco recompile: `links_written_deterministic` should increase from the current 14/batch (record-only path) by 30-60 links (Toronto entity leaves → Toronto hub-if-exists-else-fuzzy-match, Seattle / Honolulu / etc.) — validated by unit tests; live-compile numbers land after deploy. Backfill dry-run on Marco: 22 parent links (up from 21), all spot- check precision-correct; wet run: 386 → 387 reference links (+1 net new, most attempts are idempotent re-writes of existing edges). ## Test coverage - 3 new dotted-abbreviation tests (Gto., Q.R., B.C.) - 5 new summary-expander filter tests (word cap, short, street suffix, sourceKind tagging for both expanders) - 5 new deterministic-linker tests for the summary-kind branch (scopePages leaf resolution, empty-pool skip, non-entity filter, back-compat when sourceKind omitted, both-kinds-fire for same parent) - Total: 471 passed / 8 skipped (up from 458). - Typecheck clean. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ericodom added a commit that referenced this pull request on May 5, 2026
…ks dedup (#319)

Captures the 2026-04-20 third-session end state. Supersedes plans/2026-04-20-006-handoff-cluster-enrichment-and-followups.md now that PR #318 closed out items #3 (summary-expander wiring) and #4.1 (dotted-abbreviation fix).

Four remaining items in priority order:

1. Validate #318 on Marco (TOP PRIORITY). The PR promised a links_written_deterministic jump of +30-60 on the next live compile, but the backfill can't exercise that path. Trigger a manual compile and verify the prediction before building more on top.
2. Unit 6 mention cluster enrichment. Biggest remaining plan item. Brief on the key design decision (mentionClusterEnrichments[] JSON shape: Option A separate rows recommended over Option B inline-promotions).
3. wikiBacklinks dedup. Warm-up PR (<1 hour). listBacklinks misses the dedup pattern that listConnectedPages already has.
4. Trivial grab-bag: wipeWikiScope FK dependency, applier-split debt (good to bundle with item 2 since cluster promotion adds more lines to the already-1300-line applyAggregationPlan).

Session tally: 6 PRs merged (#309, #311, #312, #316, #317, #318).

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ericodom added a commit that referenced this pull request on May 5, 2026
Three drift incidents in five days (Apr 17 mig 0008, Apr 18 collision, Apr 21 0018+0019) had the same root cause and the same named-but-unshipped fix. Captures the pattern so the next author sees it before shipping migration #4. The drift-reporter fix shipped in PR #367; this is the learning that explains why the reporter exists and what marker convention every future unindexed migration must follow.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ericodom added a commit that referenced this pull request on May 5, 2026
Resolves the P1 tension the handoff flagged: the plan honestly scoped R13 to 'no token via Python-stdio-mediated writes or known-shape CloudWatch patterns' and added T1b (intra-tenant template-author exfil), but the brainstorm still stated the requirement in absolute terms. Three edits align the goalposts:

1. R13 rescoped to match the plan, with named residual coverage gaps (os.write at fd level, subprocess env dumps, C-extension writes, multiprocessing workers, adversarial split-writes) tracked as the Stdout-bypass class alongside T1.
2. Success Criterion #4 softened to match: 'within R13's scope.'
3. T1b added as a first-class residual threat between T1 and T2, with v1 mitigations (1-hour TTL, shared-template-author review as compensating control, tenant-as-trust-boundary) and a v2 hardening track (per-user ABAC session tags or in-process credential proxy, the latter preferred because it addresses T1 and T1b simultaneously).
ericodom added a commit that referenced this pull request on May 5, 2026
…ation (Unit 6) (#430) ## createTenant wiring After INSERTing the tenant row, createTenant now invokes the agentcore-admin Lambda's /provision-tenant-sandbox route (plan Unit 5) via a new invokeProvisionTenantSandbox helper. The invoke uses InvocationType: RequestResponse per feedback_avoid_fire_and_forget_lambda_invokes so errors surface inside createTenant — but createTenant catches and logs them so a sandbox outage doesn't turn into a tenant-onboarding outage. The reconciler (Unit 6 follow-up) sweeps rows with null sandbox_interpreter_*_id at its own cadence. invokeProvisionTenantSandbox: - Reads AGENTCORE_ADMIN_LAMBDA_ARN + AGENTCORE_ADMIN_TOKEN env vars; throws SandboxProvisioningConfigError if missing (distinguishable so missing config is a warn, not an error). - Builds the API Gateway v2 envelope the handler is written for. - 45s abort signal matches the handler's own budget. - Translates statusCode 4xx → Error with server-side message; 2xx → structured ProvisionResult. ## updateTenantPolicy (new) Platform-operator-only mutation for sandbox_enabled + compliance_tier policy changes. Separate from updateTenant because the changes are security-boundary shifts and are audited in tenant_policy_events. Gate: caller's email must appear in the THINKWORK_PLATFORM_OPERATOR_EMAILS allowlist (comma-separated env var). This is the swap-out point for formal RBAC when it lands. Transition semantics encoded in a pure computeTransition helper (11 tests): - No-op when nothing changes. - sandbox_enabled true rejected when compliance_tier != standard. - compliance_tier → non-standard coerces sandbox_enabled off, producing a paired audit event so the transition is reproducible from the audit log alone. - tier-first ordering means 'enable sandbox AND set tier to hipaa in one call' deterministically rejects. Writes are wrapped in a db.transaction so the tenants UPDATE and the tenant_policy_events INSERT land atomically. 
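The transition rules listed above for `computeTransition` can be sketched as a pure function. This is an illustrative reconstruction, not the helper that shipped: the type names, the `Transition` result shape, and the audit-event fields are assumptions, and the tier values are taken from the tiers the message mentions (standard, regulated, hipaa).

```typescript
type ComplianceTier = "standard" | "regulated" | "hipaa";

interface TenantPolicy {
  sandboxEnabled: boolean;
  complianceTier: ComplianceTier;
}

// Hypothetical audit-event shape; the real tenant_policy_events columns are not shown.
interface PolicyEvent {
  field: "compliance_tier" | "sandbox_enabled";
  from: string;
  to: string;
}

type Transition =
  | { kind: "noop" }
  | { kind: "reject"; reason: string }
  | { kind: "apply"; next: TenantPolicy; events: PolicyEvent[] };

function computeTransition(current: TenantPolicy, patch: Partial<TenantPolicy>): Transition {
  // Tier-first ordering: resolve the target tier before looking at the sandbox flag.
  const tier = patch.complianceTier ?? current.complianceTier;
  if (patch.sandboxEnabled === true && tier !== "standard") {
    return { kind: "reject", reason: `sandbox cannot be enabled on tier ${tier}` };
  }
  // A non-standard tier coerces the sandbox off.
  const sandbox = tier === "standard" ? (patch.sandboxEnabled ?? current.sandboxEnabled) : false;

  const events: PolicyEvent[] = [];
  if (tier !== current.complianceTier) {
    events.push({ field: "compliance_tier", from: current.complianceTier, to: tier });
  }
  if (sandbox !== current.sandboxEnabled) {
    // Paired event, so a coerced change can be replayed from the audit log alone.
    events.push({ field: "sandbox_enabled", from: String(current.sandboxEnabled), to: String(sandbox) });
  }
  if (events.length === 0) return { kind: "noop" };
  return { kind: "apply", next: { sandboxEnabled: sandbox, complianceTier: tier }, events };
}
```

Keeping the decision pure means the database transaction only has to persist `next` and `events`, which is what makes the every-permutation test suite cheap to write.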
## GraphQL - Tenant type gains sandboxEnabled / complianceTier / sandboxInterpreter*Id - New UpdateTenantPolicyInput + updateTenantPolicy mutation - pnpm schema:build re-ran against the AppSync subscription schema ## Tests — 16 passing - sandbox-provisioning.test.ts (5) — envelope shape, 200/202/4xx parsing, Lambda FunctionError surfacing - updateTenantPolicy.test.ts (11) — every permutation of the transition decision tree: no-op, toggle true on standard, reject toggle true on regulated/hipaa, toggle false always ok, tier change coerces sandbox, composite tier+sandbox requests ## Deferred to follow-up - Reconciler Lambda (EventBridge scheduled fill + drift passes) — lands when the agentcore-admin Lambda terraform resource lands. Currently handled reactively: sandbox failures on createTenant are logged and the next successful createTenant-invoke retry picks up partial state. - SNS platform-security topic — per handoff P1 #4, dropped from v1 because no named subscriber / SLA exists. Audit row in tenant_policy_events provides detection; notification is v2. - Pre-existing tenants: a one-time operator action flips sandbox_enabled per-tenant during staged rollout (see plan Operational Notes).
ericodom added a commit that referenced this pull request on May 5, 2026
…hon skill (#486) Parity pass with packages/skill-catalog/thinkwork-admin/scripts/operations/*.py. The @thinkwork/admin-ops package now exposes every op the Python skill ships, and the admin-ops MCP server registers all of them as MCP tools. Sets up deprecation of the skill: agents using mcp.thinkwork.ai can do everything the skill's Python wrappers did. Client - AdminOpsClient gains a `graphql(query, variables?)` helper that POSTs to /graphql with the same Bearer, throws AdminOpsError on error responses. Mutations + most reads use GraphQL (matches the Python skill's wire); tenants module continues to use REST since those handlers already exist. Ported modules (28 ops): - teams.ts 5 mutations + 2 reads (createTeam, add/remove team agents + users, listTeams, getTeam) - agents.ts 3 mutations + 3 reads (createAgent, setAgentSkills, setAgentCapabilities, listAgents, getAgent, listAllTenantAgents) - templates.ts 5 mutations + 3 reads (createAgentTemplate, createAgentFromTemplate, syncTemplateToAgent, syncTemplateToAllAgents, acceptTemplateUpdate, listTemplates, getTemplate, listLinkedAgentsForTemplate) - users.ts 0 mutations + 3 reads (me, getUser, listTenantMembers) - artifacts.ts 0 mutations + 2 reads (listArtifacts, getArtifact) - _fields.ts shared GraphQL field-selection constants mirroring reads.py MCP tool registration (packages/lambda/admin-ops-mcp.ts) - 25 new tools covering every ported op. Each carries a JSON Schema inputSchema + a non-empty description. Tenant pinning from the authenticated key overrides any caller-supplied tenantId on downstream calls. - The existing tools/list test asserts a curated must-have set rather than an exact-count equality, so future ports don't require test churn. Tests - 6 new tests in packages/admin-ops/src/teams.test.ts covering wire-format correctness (queries contain the right operation names, variables carry through, errors surface as AdminOpsError). - Full monorepo test run: 1270+ tests passing. 
Not in scope - CLI migration of `thinkwork team/agent/template/user/artifact` subcommands — deferred to a follow-up PR. - Removal of packages/skill-catalog/thinkwork-admin/ — deferred to PR #5 after seed (PR #4) promotes mcp.thinkwork.ai to tenants. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
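The tenant-pinning behavior mentioned for the MCP tools (the authenticated key's tenant overrides any caller-supplied `tenantId`) could look roughly like this. The names `AuthenticatedKey` and `pinTenant` are hypothetical and do not come from the admin-ops package; the sketch only shows the override order.

```typescript
// Hypothetical shape of the resolved API key; only tenantId matters here.
interface AuthenticatedKey {
  tenantId: string;
}

// Spread the caller's arguments first, then overwrite tenantId with the
// value from the authenticated key so the caller cannot choose a tenant.
function pinTenant<T extends Record<string, unknown>>(
  args: T,
  key: AuthenticatedKey,
): T & { tenantId: string } {
  return { ...args, tenantId: key.tenantId };
}
```

Applying the override at a single choke point before any downstream call keeps individual tool handlers from having to re-check the tenant themselves.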
ericodom added a commit that referenced this pull request on May 5, 2026
…ction preflight (#485) * refactor(sandbox): drop required_connections / OAuth preamble / connection preflight Ends the sandbox's OAuth-into-os.environ path end-to-end. Admin UI already stopped surfacing required_connections in #477; this closes the loop across validator, preflight, dispatcher, container, preamble, the pilot skill, concept doc, and runbook. Why now: the OAuth preamble was a live realization of the T1/T1b/T2 residual-threat classes the concept doc itself warns about. The v2 in-process credential proxy is the planned structural fix; landing that work cleanly requires this path gone. Agents that need OAuth-ed work (Slack, GitHub) call composable-skill connector scripts instead. What survives in the sandbox: - execute_code is a pure-compute primitive - The preamble still runs as executeCode call #1 — now a one-line sitecustomize readiness check that aborts the session if the stdio redactor didn't install (refuses to run user code on an unmitigated image). PREAMBLE_VERSION bumps to 2 - Preflight decision tree shrinks from 5 outcomes to 4; the missing-connection + ConnectionRevoked error classes are gone - Dispatcher payload drops sandbox_secret_paths / sandbox_tenant_id / sandbox_user_id / sandbox_stage; only sandbox_interpreter_id + sandbox_environment survive - packages/api/src/lib/sandbox-secrets.ts deleted outright - Validator rejects required_connections on write (not silently accepts) so operators can't reintroduce it via raw GraphQL - Hydration silently strips the key from legacy DB rows — no migration Docs: - Concept page loses the Preamble section, SandboxMissingConnection and ConnectionRevoked rows, required_connections YAML, and the T1 residual row. T1b dropped since the exfil class no longer exists. 
The 11-step per-turn lifecycle shrinks to 9 steps - Runbook loses failure modes #3 and #4; architecture-in-one-page simplified accordingly - sandbox-pilot SKILL.md rewritten to pure compute (S3 upload via per-tenant IAM role, no Slack post, no GitHub token) Verification: typecheck clean, 1050 api tests pass, docs site builds 79 pages clean, preamble emission verified via ast.parse + inline regression assertions (no boto3, no SecretString, no os.environ[...], no OAuth env-var names, no token prefixes). Plan: docs/plans/2026-04-23-006-refactor-sandbox-drop-required-connections-plan.md Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * refactor(sandbox): ce-review autofix sweep — tool docstring, e2e fixture, warm-container cleanup Applies ten safe_auto fixes surfaced by the ce:review pass across 9 reviewer personas. Highest-impact fixes: - sandbox_tool.py: rewrote the execute_code Strands tool docstring to describe the pure-compute primitive. Removed the retired OAuth env var claims (GITHUB_ACCESS_TOKEN, SLACK_ACCESS_TOKEN, GCAL_ACCESS_TOKEN) and ConnectionRevoked error — 7 reviewers flagged this; the LLM reads this docstring as live tool guidance - fixtures.ts: dropped required_connections from the e2e fixture's createAgentTemplate call (the validator now rejects it, every integration test would abort in setup). Removed vacuous syntheticTokens / seedConnections / putSecret plumbing that the token-leak assertion's forbiddenValues depended on - sandbox-pilot.e2e.test.ts: kept the structural CloudWatch token-leak check but dropped the synthetic forbiddenValues — no synthetic tokens are injected, so the structural prefix check (ghp_/xoxb-/ ya29./JWT) is the only meaningful guard - invocation_env.py: unconditional pop of all six SANDBOX_* keys at invocation entry. 
Closes the warm-container carryover window where a pre-deploy invocation with interrupted cleanup could leak stale interpreter id + retired OAuth paths into the next invocation - sandbox-invocation-log.ts: dropped connection_revoked from ALLOWED_EXIT_STATUSES + added a regression test asserting the value is now rejected - sandbox_preamble.py: `if not installed()` → `if installed() is not True` (fail-closed against mocks returning truthy non-True). Dropped `from __future__ import annotations` (no-op after dataclass removal) and f-string with only module-constant interpolation - skill.yaml: rewrote the retired required_connections / OAuth preamble comment block to describe pure-compute + per-tenant IAM S3 access - SKILL.md: replaced non-existent SandboxDisabled error name with accurate description (dispatcher does not register the tool) Verification: - pnpm --filter @thinkwork/api typecheck clean - pnpm --filter @thinkwork/api test — 1051/1051 pass (+1 new guard) - pnpm --filter @thinkwork/docs build — 79 pages clean - Python inline regressions: warm-container stale-key clear; preamble identity check + no OAuth markers; tool docstring clean Residual findings (manual follow-up): - ADV-001: legacy-row hydration — handlers cast agent.sandbox without running through validateTemplateSandbox. Functionally safe (field never read) but plan comment overstates the strip behavior - KT-001: SandboxEnvironmentId hand-rolled duplicate of SandboxEnvironment from database-pg schema - W2: CAPABILITIES.md doesn't mention the sandbox / connector-skill pattern to the agent system prompt Review artifact: .context/compound-engineering/ce-review/20260423-194430-67ea771d/ Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
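The asymmetric handling of `required_connections` described in this commit (reject on write, strip on read) can be sketched as two small functions. The function names and the `SandboxConfig` type are hypothetical; the source only names `validateTemplateSandbox`, and residual finding ADV-001 notes that some handlers do not yet route legacy rows through it.

```typescript
type SandboxConfig = Record<string, unknown>;

// Write path: fail loudly so the retired key cannot be reintroduced via raw GraphQL.
function validateSandboxOnWrite(input: SandboxConfig): SandboxConfig {
  if ("required_connections" in input) {
    throw new Error("sandbox.required_connections is no longer supported");
  }
  return input;
}

// Read path: drop the legacy key from stored rows so no migration is needed.
function hydrateSandbox(row: SandboxConfig): SandboxConfig {
  const { required_connections: _legacy, ...rest } = row;
  return rest;
}
```

Rejecting on write while tolerating on read lets old rows keep loading without allowing new rows to carry the retired field.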
ericodom added a commit that referenced this pull request on May 5, 2026
…te IA (#530) * docs(plan): www Cloud/Services IA refactor plan * feat(www): add /cloud route as copy of pricing page U1: stand up /cloud serving current pricing content. Both routes render identically at this point; subsequent units reframe /cloud, add Services cross-link, and retire /pricing via redirect. * feat(www): rename nav Pricing → Cloud U2: flip the visible nav entry to "Cloud" pointing at /cloud. Single edit in copy.ts propagates to desktop and mobile Header.astro renderings. * feat(www): reframe /cloud as ThinkWork Cloud + add Services cross-link U3: update hero/meta copy from generic "Pricing / Infrastructure you own" to "ThinkWork Cloud / Hosted agent infrastructure, deployed inside your AWS." Add a bulleted clarifier making scope explicit (hosted plans vs separate services vs separate AWS usage vs self-hosted via docs). Add a soft cross-link block between PricingGrid and FinalCTA pointing to /services — prose + inline text link, not a button, to keep the "soft pointer" posture asymmetric with the first-class Cloud Hosting card arriving on /services in U5. PricingGrid and the inline Stripe checkout script are unchanged — the new sections live outside the PricingGrid DOM so [data-plan-cta] / [data-plan-error] selectors continue to match. Also refresh the stale "Do NOT cross-link to /pricing" services-export comment now that the IA split makes the handoff expected in both directions. * refactor: retire /pricing route in favor of /cloud U4: add Astro redirects: { '/pricing': '/cloud' } in astro.config.mjs and delete pricing.astro. Astro's static build emits /pricing/index.html with <meta http-equiv='refresh'>, <link rel='canonical' href=.../cloud>, and <meta name='robots' content='noindex'> — inbound links (search results, Stripe cancels from older mobile builds, external shares) continue to resolve, and the redirect stub is hidden from search engines so duplicate-content risk is zero. 
Update every known /pricing reference across the monorepo:
- apps/www/src/pages/m/checkout-complete.astro: mobile checkout fallback link text "Return to pricing" → "Return to plans", now pointing at /cloud.
- apps/www/src/env.d.ts: consumer comment refresh.
- packages/pricing-config/src/plans.ts: consumer comment refresh.
- apps/admin/src/routes/onboarding/welcome.tsx: hardcoded https URL and visible anchor text "Return to pricing" → "Return to plans".
- apps/mobile/lib/stripe-checkout.ts: hardcoded cancelUrl to https://thinkwork.ai/cloud. Older installed mobile builds still point at /pricing and will hit the redirect; that's acceptable.
- terraform/modules/app/lambda-api/handlers.tf: STRIPE_CHECKOUT_CANCEL_URL now ends in /cloud so canceled checkouts land directly rather than bouncing through the redirect.

Leaves packages/api/src/handlers/stripe-checkout.ts telemetry string 'www-pricing' unchanged per plan Key Technical Decision #4 (analytics continuity).

* feat(www): consolidate /services into single card grid with Cloud Hosting handoff

U5: remove featured/secondary variant split in ServiceCard and services.astro; every card now uses the same visual treatment. Drop the two separate SectionShells (featured grid + 'Additional packages') for a single consolidated section with 5 cards in one 3-column grid (md:grid-cols-2 lg:grid-cols-3): Strategy Sprint, Pilot Launch, Managed Operations, Workflow Expansion, and Cloud Hosting. Governance & Eval + AI Program Advisory are dropped as named cards; Governance substance folds into Managed Operations' includes list, and Advisory is cut. Cloud Hosting is a new 5th peer card introducing ctaHref + ctaLabel fields on ServicePackage so it can deep-link visitors to /cloud; other cards leave those fields unset and render without per-card buttons (intake continues via the shared hero and closing CTAs). Preserves id="packages" on the consolidated SectionShell so the hero's #packages anchor link still resolves.
Variant discriminator is removed entirely from the ServicePackage type. * refactor(www): correct Cloud positioning + layout polish across /cloud and header Fix the messaging on /cloud after live review: - ThinkWork Cloud is the FULLY-HOSTED product (we operate the platform end-to-end) for teams that don't want to run the Enterprise Agent Harness themselves. The prior framing ("deployed inside your AWS") is the self-managed Enterprise product, not Cloud. - Hero reframed: 'Fully managed AI agents, no infrastructure to run.' with a lede that names the Enterprise Agent Harness as the separate self-managed path. - smallPrint + finePrint block removed from PricingGrid (cards now flow straight into the Services cross-link); the remaining data stays in copy.ts for a potential checkout-page surface. - Cloud Hosting service card on /services reframed: 'Fully managed ThinkWork — no Agent Harness to run' with 'No AWS setup on your side' as its first include bullet. - Shared plan summaries in packages/pricing-config updated so Starter and Enterprise no longer mention deploying inside the customer's AWS (mobile onboarding surfaces these too). Add a /cloud-specific FinalCTA variant (finalCtaCloud in copy.ts) with 'Fully managed' eyebrow, 'Adopt AI. Skip the infrastructure.' headline, and matching points — swapped in via a new 'copy' prop on FinalCTA. Homepage continues to use the default 'Your AWS · Your rules' variant unchanged. Layout polish: - New 'tight' prop on FinalCTA drops its top border + gradient divider and reduces top padding, so /cloud flows cleanly out of the plans section. - RECOMMENDED badge on PricingCard uses opaque #070a0f (matches the page body) instead of bg-brand/10 so the card border no longer shows through the pill. - Services cross-link folded into the same SectionShell as the plans grid with tighter spacing, eliminating an entire empty section between the two. 
Header.astro now indicates the current route with semibold white + aria-current="page" on desktop and mobile nav; inactive items stay text-slate-400. Trailing slashes and subpaths are handled. * refactor(www): trim /services copy — drop lifecycle section and polish packaging U6: cleanup pass after the packaging consolidation in U5. - Remove the entire 'How it works / Engagement lifecycle' section (services.how in copy.ts + the 4-phase card grid in services.astro). With Strategy Sprint / Pilot Launch / Managed Operations / Workflow Expansion now as packages, the lifecycle narrates the same arc the cards already describe. One telling beats two. - Hero headlineOutcome matches the Services posture from the brief: 'We help teams scope, launch, operate, and expand governed AI workflows.' - Positioning headline de-arced: 'One partner from first workflow to ongoing operations.' Positioning body and meta description also strip the references to Governance & Program Advisory that were dropped in U5. - FAQ loses the 'What happens after launch?' entry — it named the removed packages and restated card content. - Managed Operations card outcome + bestFor trimmed to stop echoing the body. - Packages lede drops the 'Cloud Hosting sits alongside' explanation; the card speaks for itself. With this /services reads as: hero → proof band → positioning → 5-card packages grid → 5-question FAQ → closing CTA.
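The current-route indicator described for Header.astro ("trailing slashes and subpaths are handled") can be approximated by a small matcher. This is a sketch under assumptions: the function names are hypothetical and the real component may compare paths differently.

```typescript
// Strip trailing slashes so "/cloud/" and "/cloud" compare equal.
function normalizePath(path: string): string {
  const trimmed = path.replace(/\/+$/, "");
  return trimmed === "" ? "/" : trimmed;
}

// A nav item is active on its own route and on any subpath beneath it.
// The root item only matches the root itself, otherwise it would always be active.
function isActiveRoute(currentPath: string, href: string): boolean {
  const current = normalizePath(currentPath);
  const target = normalizePath(href);
  if (target === "/") return current === "/";
  return current === target || current.startsWith(target + "/");
}
```

A header would then set `aria-current="page"` and the highlighted classes on the item for which `isActiveRoute` returns true.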
ericodom added a commit that referenced this pull request on May 5, 2026
…ck (#296) Follow-up to #294 / #295. Marco bootstrap chain worked for the first hop but silently stopped at step 2: job #2 enqueued #3 successfully, but #3 (which hit max_new_pages again) didn't enqueue #4. Root cause: the continuation bucket was computed as `Date.now() + 300s`. When a chained job's own runtime exceeded its bucket length (job ran 112s but bucket is 300s; once runtime plus the offset crosses a bucket boundary, the computed "next" bucket equals the bucket the job itself is running in). The dedupe key collided with the child's own row, `ON CONFLICT DO NOTHING` swallowed the insert, and the chain died without a visible error. Anchor the offset on `job.created_at` instead. Each step in a chain now produces a strictly-monotonic bucket: parent in bucket N ⇒ child enqueued for bucket N+1, child in bucket N+1 ⇒ grandchild for N+2 — regardless of how long any step took to run. Dedupe collisions can only fire against external jobs (e.g., a memory-retain trigger that hit the same bucket), which is the correct behavior. Observed on dev: Marco job chain went 1→2, stopped. Post-fix, re-triggering will exercise the continuation through the full 261 memories until the cursor drains. No test regression (411/419, pre-existing 8 skips). Continuation behavior itself is covered by the Marco integration that spawned this fix.
This was referenced May 7, 2026
ericodom added a commit that referenced this pull request on May 7, 2026
…ANT migration (#887) * docs(plans): add Phase 3 U2 focused execution overlay Focused execution overlay for U2 of the master Phase 3 plan: Aurora roles + Secrets Manager + GRANT migration. Splits the master plan's "RDS Proxy + separate endpoints" commitment into a follow-up unit (U12) since RDS Proxy is greenfield infrastructure with no existing precedent in the repo. Per-Lambda IAM scoping is documented as deferred (existing wildcard inherits the new compliance/* secrets — acceptable interim). Resolves all P0/P1 questions from the focused planning research: - Postgres role provisioning mechanism: hand-rolled SQL with operator- supplied passwords (matches U1 invariant; defers postgresql provider dependency). - Drift gate: extend scripts/db-migrate-manual.sh with probe_role for the new -- creates-role: marker type. - Secret naming: slash-delimited thinkwork/${stage}/compliance/* per CLAUDE.md standard. - Secret JSON shape: {username, password, host, port, dbname} (enriched vs the master's {username, password}). - Auto-supplied passwords: operator-supplied via env vars, not committed tfvars; Terraform owns the secret container, not the value. 4 sub-units: drift-gate extension, SQL migration, Terraform secrets, bootstrap script. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(compliance): Aurora roles + Secrets Manager containers (U2) Phase 3 U2 of the System Workflows revert. Provisions the three Aurora roles that scope per-tier access to compliance.* (introduced in U1 via PR #880) plus the Secrets Manager containers that hold their credentials. What's in this PR * `scripts/db-migrate-manual.sh` — adds `probe_role` and `creates-role:` marker support so hand-rolled migrations can declare cluster-global Postgres roles for the post-deploy drift gate. Mirrors the existing probe_extension shape (unqualified bare name, no schema prefix). * `packages/database-pg/drizzle/0070_compliance_aurora_roles.sql` — the hand-rolled migration. 
Idempotent DO $$ blocks check pg_roles before CREATE ROLE; ALTER ROLE on existing roles rotates the password. format(%L, ...) handles SQL-quoting for arbitrary password content. GRANT matrix matches Decision #4: - compliance_writer: USAGE on schema + INSERT only on audit_outbox and export_jobs. - compliance_drainer: USAGE + SELECT/UPDATE on audit_outbox + SELECT on actor_pseudonym + INSERT on audit_events. - compliance_reader: USAGE + SELECT only on all four compliance.* tables. * `packages/database-pg/__tests__/migration-0070.test.ts` — 27 vitest assertions: structural shape, creates-role marker presence, idempotent DO block pattern, psql variable substitution, GRANT matrix per role, format(%L) usage for SQL-injection safety. * `terraform/modules/data/aurora-postgres/main.tf` + `outputs.tf` — three `aws_secretsmanager_secret` containers at `thinkwork/${stage}/compliance/{writer,drainer,reader}-credentials` (slash-delimited per CLAUDE.md standard, enriched JSON shape) plus three new outputs. No `secret_version` resources — operator owns the value. * `scripts/bootstrap-compliance-roles.sh` — wraps the per-stage one-time bootstrap: resolves master DB credentials, generates or accepts role passwords from env, populates the three Secrets Manager secrets via put-secret-value, runs the migration with psql -v substitution, verifies via \du + drift gate. Idempotent re-run safe. Why this shape The master plan's Decision #4 commits to "two distinct DB users + separate RDS Proxy endpoints". Repo research surfaced two facts that shape U2 scope: 1. No RDS Proxy exists in the repo today. Lambdas connect direct. Introducing Proxy is greenfield infra (sub-module + IAM auth + SG + per-role endpoints) and deserves its own PR. 2. No precedent exists for in-Terraform Postgres role management (cyrilgdn/postgresql provider not used; no null_resource shell-out pattern). Hand-rolled SQL applied via psql -f to dev BEFORE merge is the U1 invariant. 
This PR ships the SOC2-required role separation. RDS Proxy moves to a follow-up unit (master plan's "Deferred to Follow-Up Work" gets a new U12 entry).

Apply-before-merge

Pre-merge operator workflow:

    STAGE=dev bash scripts/bootstrap-compliance-roles.sh

The script populates the three secrets (Terraform-applied containers must already exist on dev; apply terraform-plan -target before the bootstrap script if greenfield) and runs the migration. Drift gate exits 0 against dev once roles + grants are in place.

Tests: 112 vitest assertions across 9 files (was 85 in U1 / PR #880).

Plan: docs/plans/2026-05-07-001-feat-compliance-u2-aurora-roles-plan.md
Master: docs/plans/2026-05-06-011-feat-compliance-audit-event-log-plan.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(review): apply autofix feedback

8 safe_auto fixes from ce-code-review (run 20260507-043805-679c4050):

scripts/bootstrap-compliance-roles.sh
* P1: Add STAGE allowlist gate. Require CONFIRM_NONDEV=1 for staging/prod to prevent accidental production credential rotation. (adversarial adv-003)
* P1: Stop leaking master DB password through python3 argv. Switch to stdin so the credential never appears in `ps aux` / /proc/<pid>/cmdline. (security SEC-002, correctness COR-002)
* P1: Stop emitting plaintext password values to stderr on auto-generation. Generated passwords are retrievable via aws secretsmanager get-secret-value; printing them to stderr persists them in CloudWatch / GHA logs / operator shell history. (security SEC-001)
* P2: Stop leaking compliance role passwords through psql -v argv. Replace -v writer_pass=... with a mktemp mode-0600 preamble file containing \set directives, consumed via psql -f. (security SEC-006, correctness COR-003, adversarial adv-005)
* P2: Stop leaking secret payloads through `--secret-string "$json"`. Use --secret-string file://$payload_file with a mktemp 0600 file + trap-clean.
(security RR-002, adversarial adv-005)
* P3: Fix \du verification: psql ignores extra positional args, so the three-name form silently matches nothing. Use the glob \du compliance_* instead. (correctness COR-001, data-migrations DM-002)
* CI-safety: Resolve role passwords with three-way precedence: env override → existing Secrets Manager value → auto-generate. Prevents password rotation on every deploy when the script runs in CI; only greenfield bootstrap auto-generates.

packages/database-pg/drizzle/0070_compliance_aurora_roles.sql
* P2: Add pre-flight schema-existence check. CREATE ROLE is non-transactional in Postgres; without this guard, applying 0070 against a database without 0069 would persist three roles with passwords but zero grants, and the drift gate would report APPLIED. The pre-flight DO block raises EXCEPTION before any role creation, converting silent partial failure into a hard stop. (data-migrations DM-001)

packages/database-pg/__tests__/migration-0070.test.ts
* P3: Add per-role CREATE+ALTER format(%L) assertions. The original `formatMatches.length >= 6` assertion would silently pass if a future edit dropped the ALTER branch on one role. The new it.each assertion verifies both branches exist for each role explicitly. Also added one pre-flight schema-check assertion for the migration above. (security TG-001, +1 test from data-migrations)

Tests: 116/116 passing (was 112).
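The three-way password precedence from the CI-safety fix (env override, then existing Secrets Manager value, then auto-generate) is implemented in the bootstrap shell script; the sketch below restates the same ordering in TypeScript for clarity. The function name, the `source` labels, and the injected `generate` callback are illustrative assumptions.

```typescript
type PasswordSource = "env" | "secrets-manager" | "generated";

// Resolve a role password without rotating it on every run:
// an explicit env value wins, an already-stored secret is reused,
// and a new password is generated only when neither exists.
function resolveRolePassword(
  envValue: string | undefined,
  existingSecret: string | undefined,
  generate: () => string,
): { password: string; source: PasswordSource } {
  if (envValue) return { password: envValue, source: "env" };
  if (existingSecret) return { password: existingSecret, source: "secrets-manager" };
  return { password: generate(), source: "generated" };
}
```

Because generation is the last resort, re-running the bootstrap on an already-provisioned stage leaves the stored credentials untouched.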
Residual actionable work (6 items routed to PR-body residuals): #9 per-Lambda IAM scoping for the thinkwork/* secrets wildcard #10 hardcoded RDS DNS suffix in bootstrap (cross-stage promotion deferred) #11 update CLAUDE.md to enumerate creates-role: marker #12 rollback documentation for REVOKE ALL → DROP ROLE sequence #13 ALTER DEFAULT PRIVILEGES for future schema mutations #14 ordering of put-secret-value vs psql apply (accepted tradeoff) Run artifact: /tmp/compound-engineering/ce-code-review/20260507-043805-679c4050/ Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * ci: add compliance-bootstrap job to deploy.yml Phase 3 U2 chicken-egg: terraform-apply creates the three Secrets Manager containers (writer/drainer/reader), but only bootstrap-compliance-roles.sh can populate values + create matching Aurora roles + apply 0070_compliance_aurora_roles.sql. Without a CI step, migration-drift-check would report the three creates-role: markers as MISSING after every deploy until an operator manually ran the bootstrap. The script is idempotent across re-runs: reads existing Secrets Manager values and re-uses them on subsequent deploys (no password rotation), only auto-generates on greenfield. Safe to run on every deploy. New job `compliance-bootstrap`: * needs [terraform-apply], runs after containers exist. * Installs psql + jq, configures AWS creds, runs the bootstrap. * migration-drift-check now needs [terraform-apply, compliance-bootstrap] so the drift gate verifies the roles created here. Also added `?sslmode=require` to the bootstrap's DATABASE_URL for parity with migration-drift-check; psql in CI errors on missing TLS without it. Resolves the apply-before-merge gap surfaced in PR #887's body. After this commit lands, the deploy flow self-heals on first apply: terraform creates the secret containers (empty), bootstrap populates them + creates the roles, drift gate verifies, deploy continues. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
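The GRANT matrix described for migration 0070 can be written down as data and expanded into statements, which is one way to keep the per-role privileges reviewable. This is a sketch, not the migration itself (which is hand-rolled SQL): the `GRANTS` table and `grantStatements` helper are hypothetical, and the assumption that "all four compliance.* tables" means audit_outbox, export_jobs, actor_pseudonym, and audit_events is inferred from the tables the message names.

```typescript
type Privilege = "SELECT" | "INSERT" | "UPDATE";

const SCHEMA = "compliance";

// Role → table → privileges, transcribed from the matrix in the commit message.
const GRANTS: Record<string, Record<string, Privilege[]>> = {
  compliance_writer: { audit_outbox: ["INSERT"], export_jobs: ["INSERT"] },
  compliance_drainer: {
    audit_outbox: ["SELECT", "UPDATE"],
    actor_pseudonym: ["SELECT"],
    audit_events: ["INSERT"],
  },
  compliance_reader: {
    audit_outbox: ["SELECT"],
    export_jobs: ["SELECT"],
    actor_pseudonym: ["SELECT"],
    audit_events: ["SELECT"],
  },
};

// Expand one role's row of the matrix into GRANT statements.
function grantStatements(role: string): string[] {
  const tables = GRANTS[role];
  if (!tables) throw new Error(`unknown role: ${role}`);
  const perTable = Object.entries(tables).map(
    ([table, privs]) => `GRANT ${privs.join(", ")} ON ${SCHEMA}.${table} TO ${role};`,
  );
  return [`GRANT USAGE ON SCHEMA ${SCHEMA} TO ${role};`, ...perTable];
}
```

Under these assumptions, `grantStatements("compliance_writer")` yields a USAGE grant plus one INSERT-only grant each for audit_outbox and export_jobs.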
This was referenced May 7, 2026
ericodom
added a commit
that referenced
this pull request
May 7, 2026
…e only) (#911)

* feat(compliance): U6 - Strands runtime audit emit path (infrastructure only)

Phase 3 U6 of the compliance audit-event log. U1-U5 shipped the schema, roles, write helper, outbox drainer, and TypeScript-runtime call-site emits. U6 stands up the cross-runtime path: a narrow REST endpoint the Python Strands runtime can post audit events to, with a Python client that mirrors the existing _log_invocation pattern.

What ships:

- POST /api/compliance/events: new compliance-events Lambda (packages/api/src/handlers/compliance.ts) authenticating via API_AUTH_SECRET. Cross-tenant guard via SELECT users.tenant_id; idempotency via SELECT-then-INSERT against client-supplied UUIDv7 event_id, with 23505-fallback for the SELECT/INSERT race. Emits through the U3 helper inside db.transaction.
- emitAuditEvent helper extension: added optional eventId field on EmitAuditEventInput so the cross-runtime client can supply a UUIDv7 that survives retries. Existing U5 callers unchanged (don't pass eventId; helper still generates server-side).
- Python ComplianceClient (packages/agentcore-strands/agent-container/container-sources/compliance_client.py) with stdlib UUIDv7 helper (RFC 9562), env snapshot at __init__, 3-attempt exponential backoff on 5xx/429, no retry on 4xx, snake_case → camelCase boundary conversion.
- Boot-time client instantiation in server.py main() so a Phase 4 caller picks up the singleton via `from server import _compliance_client`.
- Terraform + build-script wiring (mirrors narrow-handler shape, NOT compliance-outbox-drainer: the drainer uses the compliance_drainer role; this handler uses master DATABASE_SECRET_ARN like every other narrow handler).

What does NOT ship:

- No live `client.emit(...)` call sites in server.py. The only obvious candidate (Strands AGENTS.md edits) goes through /api/workspaces/files which already emits via U5's TypeScript path; emitting from Python on top would create duplicate audit rows. A guard test in test_compliance_client.py catches accidental scope creep. First non-duplicate caller is a Phase 4 brainstorm.

Plan: docs/plans/2026-05-07-007-feat-compliance-u6-strands-emit-path-plan.md

Tests:

- 19 handler integration tests (mocked db) covering happy path, idempotency (replay + 23505 race), auth, cross-tenant guard, body validation, method/path matching.
- 16 Python client tests covering UUIDv7 shape, env snapshot, enabled/disabled state, retry behavior on 5xx/429/timeouts, no-retry on 4xx, idempotency-key/body matching, no-live-emit guard.
- 4 new emit.ts unit tests covering optional eventId override.
- Full api suite passing: 2245/2245.
- compliance-events Lambda zip builds (72K).

Plan went through ce-doc-review headless; 3 P0s caught and resolved in the plan body before ce-work:

1. Original plan relied on onConflictDoNothing against a client-supplied event_id, but the U3 helper unconditionally generated server-side. Helper extended with optional eventId.
2. Original plan referenced compliance_writer Aurora role; U5 actually emits via the master db singleton. U6 follows U5's convention.
3. Original "first call site" choice produced duplicate rows with U5; reframed as infrastructure-only with the duplicate-row reasoning documented in the plan.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(review): apply autofix feedback

Four safe_auto fixes from ce-code-review:

1. 23505 race-recovery now checks both `err.code` AND `err.cause.code` so the drizzle-orm-wrapped pg unique-violation actually gets caught. The handler's race-recovery comment claimed parity with the existing tasks.ts pattern but the implementation diverged; fixed. (correctness #1, P1)
2. Cross-tenant `users` SELECT now scoped to actorType === "user". System (e.g. "platform-credential") and agent (`agents.id`) actorIds are not user PKs; SELECTing users would have always 403'd them. API_AUTH_SECRET bearer is the trust boundary for non-user actors. (correctness #2, P2)
3. redactPayload errors mapped to 400 (alongside existing emitAuditEvent: prefix mapping). Posting a Phase 6 reservation eventType like `policy.evaluated` would have fallen through to 500, which the Python client treats as retryable, generating a 3× retry storm on a permanent caller bug. (security #2, P2)
4. Python retry shape clean-up. RETRY_DELAYS_SEC was (0.5, 1.0, 2.0) with a conditional that skipped the final sleep, making the 2.0s entry dead. Restructured to total_attempts = len(delays) + 1; N delays separate N+1 attempts. Trimmed to (0.5, 1.0) for 3 attempts with 1.5s sleep budget. Test monkeypatches updated. (correctness #4, P3)

3 new TypeScript regression tests cover the 23505 cause-chain path and the system / agent actorType cross-tenant-bypass branches. All 2248 TS + 16 Python tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
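The 23505 cause-chain check from the first autofix item can be sketched as below. This is a minimal illustration, not the handler's actual code: the helper names `readCode` and `isUniqueViolation` are hypothetical, and the only assumption about the wrapped error is what the commit message states, namely that the pg error may sit one level down on `cause`. `23505` is PostgreSQL's SQLSTATE for `unique_violation`.

```typescript
// Hypothetical helper names; the real handler's function is not shown in the source.
const PG_UNIQUE_VIOLATION = "23505"; // PostgreSQL SQLSTATE for unique_violation

// Read a string `code` property off an unknown value, if there is one.
function readCode(value: unknown): string | undefined {
  if (typeof value !== "object" || value === null) return undefined;
  const code = (value as { code?: unknown }).code;
  return typeof code === "string" ? code : undefined;
}

// True when the error itself, or the error it wraps via `cause`,
// carries the unique-violation code.
function isUniqueViolation(err: unknown): boolean {
  if (readCode(err) === PG_UNIQUE_VIOLATION) return true;
  const cause =
    typeof err === "object" && err !== null
      ? (err as { cause?: unknown }).cause
      : undefined;
  return readCode(cause) === PG_UNIQUE_VIOLATION;
}
```

In a SELECT-then-INSERT flow, a check like this would sit in the `catch` around the INSERT: on a unique violation the handler re-reads the existing row instead of failing, and any other error is rethrown.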
This was referenced May 7, 2026
ericodom
added a commit
that referenced
this pull request
May 29, 2026
mapTurnsToUserMessages paired the i-th turn to the i-th user message by document position. In multi-player threads another human's message is a USER message that triggers no turn, so the agent's 'Working…' disclosure got pinned to that intervening message (rendering above the message that triggered it). Pair each turn to the nearest-preceding user message by timestamp instead; fall back to positional pairing when message createdAt is unavailable. Needs live multiplayer validation to confirm Image #4 is resolved.
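The pairing rule described above can be sketched roughly as follows. The `UserMessage` and `Turn` shapes, the `startedAt` field, and the function name `pairTurnsByTimestamp` are assumptions made for illustration; the real `mapTurnsToUserMessages` signature is not shown in the source. Anchoring a turn with no preceding message to the earliest message is also an assumption here.

```typescript
// Hypothetical shapes; the real message and turn types are not shown in the source.
interface UserMessage {
  id: string;
  createdAt?: string | null; // ISO timestamp, may be missing
}
interface Turn {
  id: string;
  startedAt: string; // ISO timestamp (assumed field name)
}

// Returns message id -> turn. When several turns land on one message,
// the latest turn wins because it overwrites the earlier entry.
function pairTurnsByTimestamp(
  turns: Turn[],
  messages: UserMessage[],
): Map<string, Turn> {
  const result = new Map<string, Turn>();
  const stamped = messages.map((m) => ({
    id: m.id,
    at: m.createdAt ? Date.parse(m.createdAt) : NaN,
  }));

  // Fallback: if any message lacks a usable createdAt, pair by position.
  if (stamped.some((m) => Number.isNaN(m.at))) {
    turns.forEach((turn, i) => {
      const message = messages[i];
      if (message) result.set(message.id, turn);
    });
    return result;
  }

  const ordered = [...stamped].sort((a, b) => a.at - b.at);
  const first = ordered[0];
  if (first === undefined) return result; // no messages to anchor to

  const orderedTurns = [...turns].sort(
    (a, b) => Date.parse(a.startedAt) - Date.parse(b.startedAt),
  );
  for (const turn of orderedTurns) {
    const startedAt = Date.parse(turn.startedAt);
    let anchor = first; // assumed default when no message precedes the turn
    for (const m of ordered) {
      if (m.at <= startedAt) anchor = m;
      else break;
    }
    result.set(anchor.id, turn);
  }
  return result;
}
```

With three user messages where the middle one comes from another human and triggers no turn, the second turn attaches to the third message rather than the middle one, which is the behavior the fix is after.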
ericodom
added a commit
that referenced
this pull request
May 29, 2026
* feat(spaces): commit mention with Tab, dismiss with Escape [U4]

Both composers' mention autocomplete now commit the highlighted target on Tab as well as Enter, and close the menu on Escape without committing. The MentionMenu listbox gains aria-activedescendant so screen readers announce the active option during arrow navigation.

* feat(spaces): add agent-mode derivation helper [U5]

deriveAgentDefault() decides the composer agent toggle's initial state: ON in single-player threads, OFF when another human is in the loop (has posted, or is @mentioned in the current draft). Shared by both composers via a minimal local mention shape so the rule can't drift.

* feat(spaces): auto-derive follow-up agent toggle default [U6]

The follow-up composer's agent toggle now initializes from deriveAgentDefault instead of always-on: ON in single-player threads, OFF once another human has posted or is @mentioned in the draft. The manual choice persists within the thread (per-thread override) and re-derives on thread switch.

* feat(spaces): auto-derive new-thread agent toggle default [U7]

The new-thread composer's agent toggle now derives from draft mentions: ON with no user mention, OFF once another user is @mentioned. Manual choice persists until the draft is sent/cleared. Completes the single- vs multi-player default for Issue 4.

* feat(spaces): surface tagged threads in the sidebar live + on focus [U1][U2]

The urql document cache doesn't auto-invalidate on live events, so a thread the caller was @mentioned into didn't appear without a manual refresh. The shell now refetches the thread lists (coalesced, network-only) on two signals: returning to the window (focus/visibility) and the tenant-scoped onThreadUpdated subscription, which fires for the caller on createThread and sendMessage. No new subscription field; this reuses the existing event. Needs live desktop validation: confirms the root cause of the 'even after refresh' divergence (R1.2).

* fix(spaces): anchor turn activity to its triggering message [U3]

mapTurnsToUserMessages paired the i-th turn to the i-th user message by document position. In multi-player threads another human's message is a USER message that triggers no turn, so the agent's 'Working…' disclosure got pinned to that intervening message (rendering above the message that triggered it). Pair each turn to the nearest-preceding user message by timestamp instead; fall back to positional pairing when message createdAt is unavailable. Needs live multiplayer validation to confirm Image #4 is resolved.

* style(spaces): prettier formatting for agent-mode + composer [U5][U7]

* fix(api): mention participants can see threads in private spaces [U1]

callerVisibleThreadPredicate gated thread visibility on author-or-participant AND space-membership. A user mentioned into a thread inside a private Space they don't belong to is a participant but not a member, so the thread was filtered out of their list, and the desktop refetch couldn't surface what the query excluded. A mention is a thread-level invite: an explicit participant now bypasses the space-membership gate for THAT thread only (they still can't see the rest of the private Space). Realigns the code with its own docstring.

* test(spaces): cover turn-pairing edge cases [U3]

Add coverage for multiple turns mapping to one user message (latest wins, single disclosure) and a turn with no preceding user message (anchors to the earliest message without throwing).
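The agent-toggle rule from [U5] (ON in single-player threads, OFF once another human has posted or is @mentioned in the draft) might look something like this. The input shape, the `MentionLike` type, and the name `deriveAgentDefaultSketch` are assumptions for illustration; the real `deriveAgentDefault` signature is not shown in the source.

```typescript
// Hypothetical "minimal local mention shape"; field names are assumed.
interface MentionLike {
  targetType: "user" | "agent";
  targetId: string;
}

// Hypothetical input; the real helper's parameters are not shown in the source.
interface AgentDefaultInput {
  currentUserId: string;
  humanAuthorIds: string[]; // authors of human-written messages in the thread
  draftMentions: MentionLike[]; // mentions in the current draft
}

// true = agent toggle starts ON, false = starts OFF.
function deriveAgentDefaultSketch(input: AgentDefaultInput): boolean {
  const otherHumanPosted = input.humanAuthorIds.some(
    (id) => id !== input.currentUserId,
  );
  const otherHumanMentioned = input.draftMentions.some(
    (m) => m.targetType === "user" && m.targetId !== input.currentUserId,
  );
  return !(otherHumanPosted || otherHumanMentioned);
}
```

For a new-thread composer the `humanAuthorIds` list would simply be empty, so the result depends only on the draft mentions, which matches the [U7] description. Any manual override would be stored by the caller and take precedence over this derived value.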
ericodom
added a commit
that referenced
this pull request
May 31, 2026
…hread counts; add scoped Thread List table (#1894)

Thread-detail / sidebar fixes:

1. Follow-up flash (#4): withTurnResponseFallback only tail-appended the latest completed turn's synthetic response, so the prior turn's (not yet durable) response was dropped the instant an optimistic follow-up user message arrived; the transcript flashed the answer out and showed only "Working…". Reconstruct a synthetic response after EACH completed turn's user message that lacks a durable reply, anchored in place across new turns.
2. Unread filter hides the open thread (#1): clicking a thread in a filtered section marked it locally-read and filtered it out the same frame. Add displayedUnreadThreads() to retain the selected thread in the displayed list (badge / mark-all still use the pure unread set); it drops out on deselect.
3. Space nav section undercount (#3): each space section bucketed the tenant-wide RECENT_LIMIT window client-side, so busy tenants starved a space's section vs its detail page. Fetch each space's own scoped threads (SpaceThreadsQuery, mirroring the detail page) and merge the bucketed seed.
4. Thread List table (#2): each section's "…" menu gains a "Thread list" item that opens /threads scoped to that section's space (Chats → default space, matching the nav). Server-paginated DataTable (table-fixed, no horizontal scroll, height-constrained so the pager stays visible) with Title, ID, Space, Status, Last activity columns, row-click-to-open, and delete.

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
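The unread-filter fix (item 2 above) keeps two views of the same data: a pure unread set for the badge and mark-all, and a displayed list that also retains the selected thread. A minimal sketch, assuming a hypothetical `ThreadSummary` shape and parameter list (the real `displayedUnreadThreads()` signature is not shown in the source):

```typescript
// Hypothetical thread shape; only the fields this sketch needs.
interface ThreadSummary {
  id: string;
  unread: boolean;
}

// Pure unread set: what the badge count and mark-all would operate on.
function unreadThreadsSketch(threads: ThreadSummary[]): ThreadSummary[] {
  return threads.filter((t) => t.unread);
}

// Displayed list: unread threads plus the currently selected thread,
// so opening a thread does not remove it from the list in the same frame.
// Original ordering is preserved.
function displayedUnreadThreadsSketch(
  threads: ThreadSummary[],
  selectedThreadId: string | null,
): ThreadSummary[] {
  return threads.filter((t) => t.unread || t.id === selectedThreadId);
}
```

Once the selection moves elsewhere (`selectedThreadId` changes or becomes `null`), the already-read thread no longer passes the filter and drops out, which is the deselect behavior the commit describes.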
Summary
Test plan
Depends on the merge of #3.