Skip to content

feat(admin-ops): shared admin-ops package + MCP server + CLI migration#480

Merged
ericodom merged 1 commit into
mainfrom
feat/admin-ops-mcp
Apr 23, 2026
Merged

feat(admin-ops): shared admin-ops package + MCP server + CLI migration#480
ericodom merged 1 commit into
mainfrom
feat/admin-ops-mcp

Conversation

@ericodom
Copy link
Copy Markdown
Contributor

Summary

Ships the thing the original admin-skill-MCP feature request actually asked for, sized at ~1 PR instead of the previously-planned 12-16 week pivot:

  • packages/admin-ops/ — new shared TypeScript package exposing typed functions over the existing REST API. Both the CLI and the new MCP server consume it. Adding a new op lands in one module; each consumer picks it up in a 5-10 line change.
  • packages/lambda/admin-ops-mcp.ts — new Lambda speaking stateless JSON-RPC MCP (protocol 2024-11-05) over POST /mcp/admin. Hand-rolled (no heavy SDK). Tools registered: tenants_list, tenants_get, tenants_update. Auth reuses the existing Bearer API_AUTH_SECRET the rest of the REST surface accepts.
  • CLI migrationthinkwork tenant list and thinkwork tenant get now real, backed by @thinkwork/admin-ops. Other tenant subcommands stay notYetImplemented pending ops they need.
  • Terraform registers the new Lambda + POST /mcp/admin route; scripts/build-lambdas.sh bundles it alongside the rest.

Why not the previously-drafted OpenAPI pivot?

The prior plan (docs/plans/2026-04-23-004-feat-shared-admin-ops-openapi-registry-plan.md, on the brainstorm/shared-admin-ops worktree) grew from "MCP for admin ops" into a 20-unit / 12-16 week OpenAPI-registry + agent-as-user-auth + REST-role-gate-sweep program. That plan is parked; this PR is the proportional delivery.

Test plan

  • pnpm -r typecheck — all workspace packages clean
  • pnpm -r test — 1240+ tests pass, including:
    • 6 new tests in packages/admin-ops/src/client.test.ts (auth headers, URL normalization, AdminOpsError shape, withTenant scoping)
    • 5 new tests in packages/admin-ops/src/tenants.test.ts (list/get/getBySlug/update wire format + 404 surfacing)
    • 10 new tests in packages/lambda/admin-ops-mcp.test.ts (JSON-RPC dispatch, auth enforcement, batch + notification paths, downstream REST failure surfacing as isError content, tenants_update at-least-one-field rule)
  • pnpm -r lint — clean (most packages are no-ops; apps/cli lint is a stub)
  • pnpm build:lambdas admin-ops-mcp — verify esbuild bundles cleanly (run before deploy)
  • Deploy to dev stage; verify POST /mcp/admin with initialize + tools/list + tools/call name=tenants_list round-trips via the Strands container
  • Dogfood: cd apps/cli && pnpm dev -- tenant list -s dev returns the tenant table; --json returns a machine-readable array

Follow-ups (separate PRs)

  • Add the rest of the admin surface to @thinkwork/admin-ops (teams, agents, templates, users, invites) — each is a new module plus a new MCP tool registration plus optional CLI command wiring.
  • Once enough ops are in packages/admin-ops, decide whether to deprecate packages/skill-catalog/thinkwork-admin/ in favor of agents loading the MCP server directly.

🤖 Generated with Claude Code

…igration

Introduces @thinkwork/admin-ops, a typed-function package over the existing
REST API at /api/tenants, consumed by:

- apps/cli — `thinkwork tenant list` and `thinkwork tenant get` migrate off
  notYetImplemented stubs and onto createClient + tenants.* ops. Other tenant
  subcommands stay stubbed.
- packages/lambda/admin-ops-mcp.ts — new Lambda handler that speaks
  stateless JSON-RPC MCP (2024-11-05 protocol) over POST /mcp/admin. Tools
  registered: tenants_list, tenants_get, tenants_update. Auth reuses the
  existing Bearer API_AUTH_SECRET the rest of the REST surface accepts.

Adding a new op lands in one module inside @thinkwork/admin-ops; the CLI
command and the MCP tool wrapper each pick it up in a 5-10 line change.
This replaces the earlier 12-16-week OpenAPI-pivot plan with a pragmatic
two-consumer-one-package shape sized at the original "MCP wrapper for the
admin commands" ask.

Includes:
- 11 unit tests in packages/admin-ops (client auth shape, URL encoding,
  error envelopes).
- 10 unit tests in packages/lambda/admin-ops-mcp.test.ts (JSON-RPC dispatch,
  auth enforcement, downstream error surfacing, batch + notification paths).
- Terraform route wiring in terraform/modules/app/lambda-api/handlers.tf.
- Build-script entry in scripts/build-lambdas.sh.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@ericodom ericodom merged commit 3ba3972 into main Apr 23, 2026
4 checks passed
@ericodom ericodom deleted the feat/admin-ops-mcp branch April 23, 2026 23:21
ericodom added a commit that referenced this pull request Apr 24, 2026
…#482)

Replaces the shared API_AUTH_SECRET gate on the admin-ops MCP Lambda
(POST /mcp/admin, introduced in #480) with per-tenant Bearer tokens
stored as SHA-256 hashes. Each incoming token is looked up in
tenant_mcp_admin_keys; a match pins tenantId on the downstream
@thinkwork/admin-ops client so any caller-supplied tenantId is
overridden. API_AUTH_SECRET is retained as a break-glass superuser
path for bootstrap/debug, with a log on every fallback.

Database
- New hand-rolled migration drizzle/0024_tenant_mcp_admin_keys.sql
  with the standard manual-migration header + -- creates: markers.
  Partial unique index on (tenant_id, name) WHERE revoked_at IS NULL
  lets operators recreate a revoked "default" name.
- Drizzle schema at packages/database-pg/src/schema/mcp-admin-keys.ts.

REST (packages/api/src/handlers/mcp-admin-keys.ts)
- POST   /api/tenants/:tenantId/mcp-admin-keys — issue, returns raw
  token ONCE. Token format tkm_<32B base64url>; server stores only
  the hash.
- GET    /api/tenants/:tenantId/mcp-admin-keys — list metadata.
- DELETE /api/tenants/:tenantId/mcp-admin-keys/:keyId — soft-delete.
- Bootstrap auth via validateApiSecret (same as sandbox-quota-check
  etc.); Cognito-aware auth will land with the admin-SPA UI.

CLI (extends apps/cli/src/commands/mcp.ts)
- thinkwork mcp key create [-t tenant --name label]
- thinkwork mcp key list   [-t tenant --all]
- thinkwork mcp key revoke <id> [-t tenant]

Client (packages/admin-ops/src/admin-keys.ts)
- createAdminKey / listAdminKeys / revokeAdminKey exported as
  admin-ops/admin-keys; deliberately NOT registered as MCP tools
  (would be a trivial privilege escalation vector).

Lambda auth swap (packages/lambda/admin-ops-mcp.ts)
- async authenticate() hashes the Bearer + looks up in
  tenant_mcp_admin_keys; falls through to API_AUTH_SECRET on miss
  or DB error; returns AuthResult { tenantId, keyId, superuser }.
- buildTools(auth) pins auth.tenantId on every downstream REST call;
  superuser falls back to caller-supplied arg.tenantId.
- Best-effort last_used_at bump on success (async, never blocks auth).

Bug fix
- Moved packages/lambda/admin-ops-mcp.test.ts → __tests__/ so vitest
  actually collects it. The tests from #480 existed but were not
  running because the lambda vitest config only includes
  __tests__/**/*.test.ts.

Tests
- 5 new tests in packages/api/src/handlers/mcp-admin-keys.test.ts
  (token entropy + format, hash determinism, case-sensitivity, hash
  equivalence).
- 17 tests in packages/lambda/__tests__/admin-ops-mcp.test.ts — 10
  from #480 now actually run + 7 new (tenant-key match, superuser
  fallback, DB-outage fallthrough, non-superuser rejection on DB
  failure, pinned tenantId override, superuser tenantId passthrough,
  token-hash collision rejection).
- All monorepo test suites pass (1270+ tests).

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ericodom added a commit that referenced this pull request Apr 24, 2026
…ces the Python skill (#488)

Closes the deprecation arc that started with PR #480 (admin-ops package +
MCP Lambda) and #482 (per-tenant Bearer keys). Every op the Python skill
shipped is now callable via the admin-ops MCP server (#486 ported the
full set); #487 provisions tenants. The skill's scripts are redundant.

What this PR does

- Delete packages/skill-catalog/thinkwork-admin/ — the whole directory:
  SKILL.md, skill.yaml, scripts/, tests/. 4,256 lines removed.
- Delete packages/api/src/__tests__/thinkwork-admin-e2e-smoke.test.ts —
  exercised the createAgent resolver via the Python skill's flow;
  the resolver itself is still covered by agents-authz.test.ts +
  set-agent-skills-subset.test.ts.
- Trim packages/api/src/__tests__/never-exposed-tier.test.ts — drops
  the skill.yaml-regex catastrophic-op-exclusion block (the skill is
  gone). The `requireNotFromAdminSkill` contract tests stay — that
  guard applies to every non-Cognito path (peer skills + agent broker
  + future integrations), not just the retired skill.
- Add docs/solutions/patterns/retire-thinkwork-admin-skill-2026-04-24.md
  — full runbook including pre-merge SQL for disabling any
  agent_skills rows that still reference the skill.

What this PR keeps

- All defensive primitives (requireNotFromAdminSkill,
  requireAdminOrApiKeyCaller, requireAgentAllowsOperation,
  adminRoleCheck query) — useful for peer skills + future broker
  work, not skill-specific.
- Historical migrations (drizzle/0020, drizzle/0022) — they were
  applied to prod; the file artifacts stay for audit.
- Resolver comments mentioning the skill as historical context —
  accurate descriptions of why a guard exists.

Operator prereq (documented in the runbook)

Before merging, run in each stage:

  thinkwork mcp provision --all -s <stage>

Then apply the retire SQL from the runbook to disable any lingering
agent_skills rows. The runtime degrades gracefully if it encounters
the deleted skill (skill_runner logs and skips), but the SQL makes
the deprecation explicit + auditable.

Tests
- 1057 api tests, 65 lambda, 124 cli, 17 admin-ops, etc. — all green.
- Terraform validate passes.
- Python key files (server.py, skill_runner.py, the two test files
  that mention 'thinkwork-admin' in docstrings) parse clean.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ericodom added a commit that referenced this pull request May 5, 2026
…igration (#480)

Introduces @thinkwork/admin-ops, a typed-function package over the existing
REST API at /api/tenants, consumed by:

- apps/cli — `thinkwork tenant list` and `thinkwork tenant get` migrate off
  notYetImplemented stubs and onto createClient + tenants.* ops. Other tenant
  subcommands stay stubbed.
- packages/lambda/admin-ops-mcp.ts — new Lambda handler that speaks
  stateless JSON-RPC MCP (2024-11-05 protocol) over POST /mcp/admin. Tools
  registered: tenants_list, tenants_get, tenants_update. Auth reuses the
  existing Bearer API_AUTH_SECRET the rest of the REST surface accepts.

Adding a new op lands in one module inside @thinkwork/admin-ops; the CLI
command and the MCP tool wrapper each pick it up in a 5-10 line change.
This replaces the earlier 12-16-week OpenAPI-pivot plan with a pragmatic
two-consumer-one-package shape sized at the original "MCP wrapper for the
admin commands" ask.

Includes:
- 11 unit tests in packages/admin-ops (client auth shape, URL encoding,
  error envelopes).
- 10 unit tests in packages/lambda/admin-ops-mcp.test.ts (JSON-RPC dispatch,
  auth enforcement, downstream error surfacing, batch + notification paths).
- Terraform route wiring in terraform/modules/app/lambda-api/handlers.tf.
- Build-script entry in scripts/build-lambdas.sh.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ericodom added a commit that referenced this pull request May 5, 2026
…#482)

Replaces the shared API_AUTH_SECRET gate on the admin-ops MCP Lambda
(POST /mcp/admin, introduced in #480) with per-tenant Bearer tokens
stored as SHA-256 hashes. Each incoming token is looked up in
tenant_mcp_admin_keys; a match pins tenantId on the downstream
@thinkwork/admin-ops client so any caller-supplied tenantId is
overridden. API_AUTH_SECRET is retained as a break-glass superuser
path for bootstrap/debug, with a log on every fallback.

Database
- New hand-rolled migration drizzle/0024_tenant_mcp_admin_keys.sql
  with the standard manual-migration header + -- creates: markers.
  Partial unique index on (tenant_id, name) WHERE revoked_at IS NULL
  lets operators recreate a revoked "default" name.
- Drizzle schema at packages/database-pg/src/schema/mcp-admin-keys.ts.

REST (packages/api/src/handlers/mcp-admin-keys.ts)
- POST   /api/tenants/:tenantId/mcp-admin-keys — issue, returns raw
  token ONCE. Token format tkm_<32B base64url>; server stores only
  the hash.
- GET    /api/tenants/:tenantId/mcp-admin-keys — list metadata.
- DELETE /api/tenants/:tenantId/mcp-admin-keys/:keyId — soft-delete.
- Bootstrap auth via validateApiSecret (same as sandbox-quota-check
  etc.); Cognito-aware auth will land with the admin-SPA UI.

CLI (extends apps/cli/src/commands/mcp.ts)
- thinkwork mcp key create [-t tenant --name label]
- thinkwork mcp key list   [-t tenant --all]
- thinkwork mcp key revoke <id> [-t tenant]

Client (packages/admin-ops/src/admin-keys.ts)
- createAdminKey / listAdminKeys / revokeAdminKey exported as
  admin-ops/admin-keys; deliberately NOT registered as MCP tools
  (would be a trivial privilege escalation vector).

Lambda auth swap (packages/lambda/admin-ops-mcp.ts)
- async authenticate() hashes the Bearer + looks up in
  tenant_mcp_admin_keys; falls through to API_AUTH_SECRET on miss
  or DB error; returns AuthResult { tenantId, keyId, superuser }.
- buildTools(auth) pins auth.tenantId on every downstream REST call;
  superuser falls back to caller-supplied arg.tenantId.
- Best-effort last_used_at bump on success (async, never blocks auth).

Bug fix
- Moved packages/lambda/admin-ops-mcp.test.ts → __tests__/ so vitest
  actually collects it. The tests from #480 existed but were not
  running because the lambda vitest config only includes
  __tests__/**/*.test.ts.

Tests
- 5 new tests in packages/api/src/handlers/mcp-admin-keys.test.ts
  (token entropy + format, hash determinism, case-sensitivity, hash
  equivalence).
- 17 tests in packages/lambda/__tests__/admin-ops-mcp.test.ts — 10
  from #480 now actually run + 7 new (tenant-key match, superuser
  fallback, DB-outage fallthrough, non-superuser rejection on DB
  failure, pinned tenantId override, superuser tenantId passthrough,
  token-hash collision rejection).
- All monorepo test suites pass (1270+ tests).

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ericodom added a commit that referenced this pull request May 5, 2026
…ces the Python skill (#488)

Closes the deprecation arc that started with PR #480 (admin-ops package +
MCP Lambda) and #482 (per-tenant Bearer keys). Every op the Python skill
shipped is now callable via the admin-ops MCP server (#486 ported the
full set); #487 provisions tenants. The skill's scripts are redundant.

What this PR does

- Delete packages/skill-catalog/thinkwork-admin/ — the whole directory:
  SKILL.md, skill.yaml, scripts/, tests/. 4,256 lines removed.
- Delete packages/api/src/__tests__/thinkwork-admin-e2e-smoke.test.ts —
  exercised the createAgent resolver via the Python skill's flow;
  the resolver itself is still covered by agents-authz.test.ts +
  set-agent-skills-subset.test.ts.
- Trim packages/api/src/__tests__/never-exposed-tier.test.ts — drops
  the skill.yaml-regex catastrophic-op-exclusion block (the skill is
  gone). The `requireNotFromAdminSkill` contract tests stay — that
  guard applies to every non-Cognito path (peer skills + agent broker
  + future integrations), not just the retired skill.
- Add docs/solutions/patterns/retire-thinkwork-admin-skill-2026-04-24.md
  — full runbook including pre-merge SQL for disabling any
  agent_skills rows that still reference the skill.

What this PR keeps

- All defensive primitives (requireNotFromAdminSkill,
  requireAdminOrApiKeyCaller, requireAgentAllowsOperation,
  adminRoleCheck query) — useful for peer skills + future broker
  work, not skill-specific.
- Historical migrations (drizzle/0020, drizzle/0022) — they were
  applied to prod; the file artifacts stay for audit.
- Resolver comments mentioning the skill as historical context —
  accurate descriptions of why a guard exists.

Operator prereq (documented in the runbook)

Before merging, run in each stage:

  thinkwork mcp provision --all -s <stage>

Then apply the retire SQL from the runbook to disable any lingering
agent_skills rows. The runtime degrades gracefully if it encounters
the deleted skill (skill_runner logs and skips), but the SQL makes
the deprecation explicit + auditable.

Tests
- 1057 api tests, 65 lambda, 124 cli, 17 admin-ops, etc. — all green.
- Terraform validate passes.
- Python key files (server.py, skill_runner.py, the two test files
  that mention 'thinkwork-admin' in docstrings) parse clean.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant