feat(mcp): per-tenant admin-ops MCP Bearer tokens + CLI management #482
Merged
Replaces the shared API_AUTH_SECRET gate on the admin-ops MCP Lambda (POST /mcp/admin, introduced in #480) with per-tenant Bearer tokens stored as SHA-256 hashes. Each incoming token is looked up in tenant_mcp_admin_keys; a match pins tenantId on the downstream @thinkwork/admin-ops client so any caller-supplied tenantId is overridden. API_AUTH_SECRET is retained as a break-glass superuser path for bootstrap/debug, with a log on every fallback.

Database
- New hand-rolled migration drizzle/0024_tenant_mcp_admin_keys.sql with the standard manual-migration header + -- creates: markers. Partial unique index on (tenant_id, name) WHERE revoked_at IS NULL lets operators recreate a revoked "default" name.
- Drizzle schema at packages/database-pg/src/schema/mcp-admin-keys.ts.

REST (packages/api/src/handlers/mcp-admin-keys.ts)
- POST /api/tenants/:tenantId/mcp-admin-keys: issue, returns raw token ONCE. Token format tkm_<32B base64url>; server stores only the hash.
- GET /api/tenants/:tenantId/mcp-admin-keys: list metadata.
- DELETE /api/tenants/:tenantId/mcp-admin-keys/:keyId: soft-delete.
- Bootstrap auth via validateApiSecret (same as sandbox-quota-check etc.); Cognito-aware auth will land with the admin-SPA UI.

CLI (extends apps/cli/src/commands/mcp.ts)
- thinkwork mcp key create [-t tenant --name label]
- thinkwork mcp key list [-t tenant --all]
- thinkwork mcp key revoke <id> [-t tenant]

Client (packages/admin-ops/src/admin-keys.ts)
- createAdminKey / listAdminKeys / revokeAdminKey exported as admin-ops/admin-keys; deliberately NOT registered as MCP tools (would be a trivial privilege escalation vector).

Lambda auth swap (packages/lambda/admin-ops-mcp.ts)
- async authenticate() hashes the Bearer + looks up in tenant_mcp_admin_keys; falls through to API_AUTH_SECRET on miss or DB error; returns AuthResult { tenantId, keyId, superuser }.
- buildTools(auth) pins auth.tenantId on every downstream REST call; superuser falls back to caller-supplied arg.tenantId.
- Best-effort last_used_at bump on success (async, never blocks auth).

Bug fix
- Moved packages/lambda/admin-ops-mcp.test.ts → __tests__/ so vitest actually collects it. The tests from #480 existed but were not running because the lambda vitest config only includes __tests__/**/*.test.ts.

Tests
- 5 new tests in packages/api/src/handlers/mcp-admin-keys.test.ts (token entropy + format, hash determinism, case-sensitivity, hash equivalence).
- 17 tests in packages/lambda/__tests__/admin-ops-mcp.test.ts: 10 from #480 now actually run + 7 new (tenant-key match, superuser fallback, DB-outage fallthrough, non-superuser rejection on DB failure, pinned tenantId override, superuser tenantId passthrough, token-hash collision rejection).
- All monorepo test suites pass (1270+ tests).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
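The token format and the auth fall-through described in this commit message can be sketched roughly as follows. This is a minimal illustration, not the code in packages/lambda/admin-ops-mcp.ts: the `KeyLookup` callback, the `safeEqual` helper, the hex digest encoding, and the choice to return `null` on rejection are all assumptions made for the example; only the names generateToken, hashToken, authenticate, and AuthResult come from the PR text.

```typescript
import { createHash, randomBytes, timingSafeEqual } from "node:crypto";

// Token shape from the PR text: tkm_<32 random bytes, base64url-encoded>.
export function generateToken(): string {
  return "tkm_" + randomBytes(32).toString("base64url");
}

// Only this SHA-256 digest would be stored. Hex encoding is an assumption.
export function hashToken(raw: string): string {
  return createHash("sha256").update(raw, "utf8").digest("hex");
}

export interface AuthResult {
  tenantId: string | null;
  keyId: string | null;
  superuser: boolean;
}

// Hypothetical stand-in for the tenant_mcp_admin_keys query.
export type KeyLookup = (
  tokenHash: string,
) => Promise<{ id: string; tenantId: string } | null>;

// Compare fixed-length digests so timingSafeEqual never sees unequal lengths.
function safeEqual(a: string, b: string): boolean {
  const ha = createHash("sha256").update(a).digest();
  const hb = createHash("sha256").update(b).digest();
  return timingSafeEqual(ha, hb);
}

export async function authenticate(
  bearer: string | undefined,
  lookup: KeyLookup,
  apiAuthSecret: string | undefined,
): Promise<AuthResult | null> {
  if (!bearer) return null;
  try {
    const row = await lookup(hashToken(bearer));
    if (row) return { tenantId: row.tenantId, keyId: row.id, superuser: false };
  } catch (err) {
    // DB error: fall through to the break-glass check instead of failing open.
    console.warn("tenant key lookup failed, trying break-glass path", err);
  }
  if (apiAuthSecret && safeEqual(bearer, apiAuthSecret)) {
    console.warn("admin-ops MCP: API_AUTH_SECRET fallback used");
    return { tenantId: null, keyId: null, superuser: true };
  }
  return null; // neither a tenant key nor the shared secret
}
```

Injecting the lookup keeps the sketch testable without a database; a non-superuser token presented during a DB outage still ends in rejection because it does not match the shared secret.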
10 tasks
ericodom added a commit that referenced this pull request on Apr 24, 2026
… + Cloudflare sync script (#483)

Publishes the admin-ops MCP Lambda at a stable Cloudflare-managed hostname (e.g., mcp.thinkwork.ai) instead of the execute-api URL. Safe to ship because PR #482 already moved the MCP to per-tenant Bearer tokens, so the public URL is no longer protected only by a shared secret.

Terraform (terraform/modules/app/lambda-api/main.tf)
- aws_acm_certificate.mcp: gated on var.mcp_custom_domain.
- aws_apigatewayv2_domain_name.mcp + api_mapping.mcp: gated on BOTH var.mcp_custom_domain and var.mcp_custom_domain_ready. The second flag is the explicit two-apply toggle because ACM validates via DNS and API Gateway v2 refuses an unvalidated cert.
- New outputs: mcp_custom_domain, mcp_custom_domain_cert_arn, mcp_custom_domain_validation, mcp_custom_domain_target. Plumbed through terraform/modules/thinkwork/{main,variables,outputs}.tf.
- Same HTTP API serves both /graphql and /mcp/admin. Strict route isolation (second API for MCP-only) is a future option; not needed for v1 since auth gates access at the handler level.

Cloudflare sync (scripts/cloudflare-sync-mcp.ts, pnpm cf:sync-mcp)
- Pure-fetch against Cloudflare v4 API; no new npm deps.
- Reads CLOUDFLARE_API_TOKEN from env; never persisted to disk.
- `terraform output -json` → upsert plan → apply.
- Idempotent: existing records get PUT, missing get POST, matching get NOOP. --verify-only flag for dry-run.
- --finalize adds the production mcp.<domain> → API GW CNAME after the second terraform apply.

Runbook (docs/solutions/patterns/mcp-custom-domain-setup-2026-04-23.md)
- Two-apply workflow documented step-by-step.
- Rollback path and token-hygiene notes included.
- Smoke test curl at the end.

Rationale for two applies
- aws_acm_certificate_validation would block the apply for ~5 min while ACM polls for DNS, and it fails if the records aren't in CF yet. The two-flag split keeps each apply fast and makes the dependency on out-of-band DNS explicit rather than hidden in a long-polling resource.
- Alternative would be adding the cloudflare Terraform provider and managing records in the same plan. That is bigger scope with new provider auth to configure, so the simpler path was chosen for v1.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
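The idempotent upsert plan mentioned above (PUT for existing records that differ, POST for missing ones, NOOP for matches) might look something like this as a pure function. The record shapes, the `planUpserts` name, and the trailing-dot and case normalization are assumptions for illustration; the real scripts/cloudflare-sync-mcp.ts may key and compare records differently.

```typescript
// Hypothetical record shapes; the actual script's types are not shown in the PR.
export interface DnsRecord {
  type: "CNAME" | "TXT";
  name: string;
  content: string;
}

export interface ExistingRecord extends DnsRecord {
  id: string; // Cloudflare record id, needed to address an update
}

export type PlanAction =
  | { op: "POST"; desired: DnsRecord }
  | { op: "PUT"; id: string; desired: DnsRecord }
  | { op: "NOOP"; id: string; desired: DnsRecord };

// Assumption: terraform outputs may carry a trailing dot and mixed case.
const normalize = (s: string): string => s.toLowerCase().replace(/\.$/, "");
const keyOf = (r: DnsRecord): string => `${r.type}:${normalize(r.name)}`;

export function planUpserts(
  desired: DnsRecord[],
  existing: ExistingRecord[],
): PlanAction[] {
  const byKey = new Map<string, ExistingRecord>();
  for (const r of existing) byKey.set(keyOf(r), r);

  return desired.map((d): PlanAction => {
    const match = byKey.get(keyOf(d));
    if (!match) return { op: "POST", desired: d }; // missing: create
    if (normalize(match.content) === normalize(d.content)) {
      return { op: "NOOP", id: match.id, desired: d }; // already correct
    }
    return { op: "PUT", id: match.id, desired: d }; // exists but differs
  });
}
```

Separating the plan from the HTTP calls is one way a --verify-only dry run could work: print the plan and stop before any request is sent.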
6 tasks
ericodom added a commit that referenced this pull request on Apr 24, 2026
…d) (#487)

Adds the missing piece between PR #482 (per-tenant keys) and #5 (skill deprecation): a single call that preps a tenant to consume the admin-ops MCP. After `thinkwork mcp provision -t <slug>`, the runtime picks up the admin-ops server for any agent that gets it assigned via agent_mcp_servers (admin SPA / future CLI command).

Handler (packages/api/src/handlers/mcp-admin-provision.ts)
- POST /api/tenants/:tenantId/mcp-admin-provision
- Three steps in one call:
  1. Mint a fresh tkm_ token via the existing mcp-admin-keys helpers (generateToken/hashToken from #482). Insert into tenant_mcp_admin_keys.
  2. Store raw token in Secrets Manager at `thinkwork/<stage>/mcp/<tenantId>/admin-ops`, matching the convention skills.ts established for tenant_api_key secrets.
  3. Upsert tenant_mcp_servers (slug="admin-ops", auth_type="tenant_api_key", auth_config={secretRef, token}). Duplicates the raw token into auth_config.token to match mcp-configs.ts's current reader; a secretRef-only migration is a separate pass.
- Idempotent: re-running revokes the previous active admin-ops key for this tenant and rotates the secret.
- Default URL resolves to MCP_CUSTOM_DOMAIN ?? THINKWORK_API_URL + /mcp/admin. `body.url` overrides.
- Bootstrap auth via validateApiSecret (matches mcp-admin-keys, sandbox-quota-check, other service endpoints).

Terraform (terraform/modules/app/lambda-api/handlers.tf)
- New handler registered in the for_each map.
- Route: POST /api/tenants/{tenantId}/mcp-admin-provision.
- No new IAM: secretsmanager:CreateSecret/UpdateSecret/GetSecretValue is already granted on thinkwork/* by aws_iam_role_policy.lambda_secrets in main.tf.

CLI (apps/cli/src/commands/mcp.ts)
- thinkwork mcp provision [-t <slug>] [--url <url>] [--all]
- --all enumerates /api/tenants and iterates; partial failures surface non-zero exit.
- Raw token is never printed: it goes into SM + DB and stays there. To get a human-usable token for debugging, use `thinkwork mcp key create` (which returns it once).

Tests
- 4 unit tests for URL-resolution contract in mcp-admin-provision.test.ts.
- Existing admin-ops-mcp + mcp-admin-keys suites stay green.
- Full monorepo: 1274+ tests passing.
- Terraform validate: Success.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
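The URL-resolution contract and the secret naming convention from the handler description could be sketched as below. The function names `resolveMcpUrl` and `secretName`, the `ProvisionEnv` shape, and the handling of a bare hostname versus a full URL are assumptions; the commit message only states the precedence (body.url, then MCP_CUSTOM_DOMAIN, then THINKWORK_API_URL) and the /mcp/admin suffix.

```typescript
// Hypothetical view of the environment the handler reads.
export interface ProvisionEnv {
  MCP_CUSTOM_DOMAIN?: string;
  THINKWORK_API_URL?: string;
}

// Assumption: MCP_CUSTOM_DOMAIN may be a bare hostname such as mcp.thinkwork.ai.
const withScheme = (value: string): string =>
  /^https?:\/\//.test(value) ? value : `https://${value}`;

export function resolveMcpUrl(env: ProvisionEnv, bodyUrl?: string): string {
  // An explicit body.url wins over both environment values.
  if (bodyUrl) return bodyUrl;

  const base = env.MCP_CUSTOM_DOMAIN ?? env.THINKWORK_API_URL;
  if (!base) {
    throw new Error("neither MCP_CUSTOM_DOMAIN nor THINKWORK_API_URL is set");
  }
  // Strip trailing slashes so the suffix is not doubled.
  return withScheme(base).replace(/\/+$/, "") + "/mcp/admin";
}

// Secret path convention quoted in the commit message.
export function secretName(stage: string, tenantId: string): string {
  return `thinkwork/${stage}/mcp/${tenantId}/admin-ops`;
}
```

For example, under these assumptions `resolveMcpUrl({ MCP_CUSTOM_DOMAIN: "mcp.thinkwork.ai" })` yields `https://mcp.thinkwork.ai/mcp/admin`.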
Merged
5 tasks
ericodom added a commit that referenced this pull request on Apr 24, 2026
…ces the Python skill (#488)

Closes the deprecation arc that started with PR #480 (admin-ops package + MCP Lambda) and #482 (per-tenant Bearer keys). Every op the Python skill shipped is now callable via the admin-ops MCP server (#486 ported the full set); #487 provisions tenants. The skill's scripts are redundant.

What this PR does
- Delete packages/skill-catalog/thinkwork-admin/ (the whole directory: SKILL.md, skill.yaml, scripts/, tests/). 4,256 lines removed.
- Delete packages/api/src/__tests__/thinkwork-admin-e2e-smoke.test.ts, which exercised the createAgent resolver via the Python skill's flow; the resolver itself is still covered by agents-authz.test.ts + set-agent-skills-subset.test.ts.
- Trim packages/api/src/__tests__/never-exposed-tier.test.ts: drops the skill.yaml-regex catastrophic-op-exclusion block (the skill is gone). The `requireNotFromAdminSkill` contract tests stay, because that guard applies to every non-Cognito path (peer skills + agent broker + future integrations), not just the retired skill.
- Add docs/solutions/patterns/retire-thinkwork-admin-skill-2026-04-24.md: full runbook including pre-merge SQL for disabling any agent_skills rows that still reference the skill.

What this PR keeps
- All defensive primitives (requireNotFromAdminSkill, requireAdminOrApiKeyCaller, requireAgentAllowsOperation, adminRoleCheck query): useful for peer skills + future broker work, not skill-specific.
- Historical migrations (drizzle/0020, drizzle/0022): they were applied to prod; the file artifacts stay for audit.
- Resolver comments mentioning the skill as historical context: accurate descriptions of why a guard exists.

Operator prereq (documented in the runbook)
Before merging, run in each stage:
    thinkwork mcp provision --all -s <stage>
Then apply the retire SQL from the runbook to disable any lingering agent_skills rows. The runtime degrades gracefully if it encounters the deleted skill (skill_runner logs and skips), but the SQL makes the deprecation explicit + auditable.

Tests
- 1057 api tests, 65 lambda, 124 cli, 17 admin-ops, etc.: all green.
- Terraform validate passes.
- Python key files (server.py, skill_runner.py, the two test files that mention 'thinkwork-admin' in docstrings) parse clean.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ericodom added a commit that referenced this pull request on May 5, 2026
…#482)
ericodom added a commit that referenced this pull request on May 5, 2026
… + Cloudflare sync script (#483)
ericodom added a commit that referenced this pull request on May 5, 2026
…d) (#487)
ericodom added a commit that referenced this pull request on May 5, 2026
…ces the Python skill (#488)
Summary
Replaces the shared `API_AUTH_SECRET` gate on the admin-ops MCP Lambda (introduced in #480) with per-tenant Bearer tokens. Prereq for publishing `mcp.thinkwork.ai`: without tenant-scoped auth, a public URL protected only by the shared secret is a single leak away from cross-tenant admin impersonation.

- `tenant_mcp_admin_keys` (hand-rolled migration `0024_tenant_mcp_admin_keys.sql`) stores SHA-256 hashes. Raw token is shown exactly once at creation; format `tkm_<32B base64url>`.
- `/api/tenants/:tenantId/mcp-admin-keys`: POST/GET/DELETE. Bootstrap auth via `validateApiSecret`; Cognito-aware auth lands with the future admin-SPA UI.
- `thinkwork mcp key {create,list,revoke}`: extends `apps/cli/src/commands/mcp.ts`. Creation prints the raw token once in cyan; list shows metadata only.
- `async authenticate()` hashes the Bearer and looks up in `tenant_mcp_admin_keys`. Match → tenant-pinned (`auth.tenantId` overrides any caller-supplied `tenantId` in tool args). Miss or DB error → falls through to `API_AUTH_SECRET` as break-glass superuser, with a warning log.
- `@thinkwork/admin-ops/admin-keys` exports typed client functions. Deliberately not registered as MCP tools, because key creation via an agent-facing MCP surface would be a privilege escalation path.

Bug fix (bonus)
The `admin-ops-mcp.test.ts` tests from #480 were never running: vitest's `include` pattern in `packages/lambda/vitest.config.ts` only matches `__tests__/**/*.test.ts` and the file lived at the package root. Moved to `packages/lambda/__tests__/`, now collected.

Test plan
- `pnpm -r typecheck`: all affected packages clean (`database-pg`, `admin-ops`, `api`, `lambda`, `apps/cli`). `packages/agent-tools` typecheck fails with "tsc: command not found" (pre-existing, unrelated).
- `pnpm -r test`: 1270+ tests pass, including:
  - `packages/api/src/handlers/mcp-admin-keys.test.ts` (token entropy + format, hash determinism, case-sensitivity)
  - `packages/lambda/__tests__/admin-ops-mcp.test.ts` (10 from #480, "feat(admin-ops): shared admin-ops package + MCP server + CLI migration", that weren't previously running + 7 new auth-flow tests: tenant-key match, superuser fallback, DB-outage fallthrough, non-superuser rejection on DB failure, pinned-tenantId override of caller spoofing, superuser passthrough of caller tenantId)
- Apply `drizzle/0024_tenant_mcp_admin_keys.sql` to dev via `psql "$DATABASE_URL" -f packages/database-pg/drizzle/0024_tenant_mcp_admin_keys.sql` (the deploy.yml gate runs `pnpm db:migrate-manual` and will fail the deploy if this isn't applied).
- `thinkwork mcp key create -t <slug>`: capture the `tkm_...` token
- `curl -X POST -H "Authorization: Bearer tkm_..." -H "Content-Type: application/json" -d '{"jsonrpc":"2.0","id":1,"method":"tools/list"}' https://<api>/mcp/admin`
- `thinkwork mcp key revoke <id>` → subsequent POST returns 401
- Rotate `API_AUTH_SECRET` in Secrets Manager at some point (optional for this PR; the secret remains valid as break-glass).

Next up
- `mcp.thinkwork.ai`: ACM cert + `aws_apigatewayv2_domain_name` + Cloudflare DNS. Safe to publish once this merges because the URL now has per-tenant auth.
- Port `packages/skill-catalog/thinkwork-admin/scripts/operations/*.py` to `@thinkwork/admin-ops`.

🤖 Generated with Claude Code
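The tenant-pinning rule in the Summary (a tenant key overrides any caller-supplied `tenantId`, while the break-glass superuser passes the caller's value through) can be illustrated with a small helper. The names `resolveTenantId` and `withPinnedTenant` are hypothetical, and requiring superuser calls to supply a `tenantId` is an assumption; the PR only describes the behavior of `buildTools(auth)`, not its code.

```typescript
interface AuthResult {
  tenantId: string | null;
  keyId: string | null;
  superuser: boolean;
}

// Decide which tenant a tool call may act on.
export function resolveTenantId(
  auth: AuthResult,
  args: { tenantId?: string },
): string {
  if (!auth.superuser) {
    // Tenant-key callers are pinned: whatever they passed is ignored.
    if (!auth.tenantId) throw new Error("tenant key without a tenantId");
    return auth.tenantId;
  }
  // Break-glass superuser: trust the caller-supplied value (assumed required).
  if (!args.tenantId) throw new Error("superuser calls must supply tenantId");
  return args.tenantId;
}

// Return a copy of the tool args with the resolved tenant applied.
export function withPinnedTenant<T extends { tenantId?: string }>(
  auth: AuthResult,
  args: T,
): T & { tenantId: string } {
  return { ...args, tenantId: resolveTenantId(auth, args) };
}
```

Applying the override at one choke point, rather than inside each tool, is what makes a spoofed `tenantId` in tool args harmless for tenant-key callers.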