
fix: recommend sub-agent dispatch for large-scope lifting (runtime-agnostic)#87

Merged
userFRM merged 19 commits into main from fix/subagent-guidance
Apr 14, 2026

Conversation


@userFRM userFRM commented Apr 14, 2026

Problem

The server's `lifting_status` NEXT STEP told the agent to call `get_entities_for_lifting(scope="*")` directly, regardless of scope size. Each batch returns ~12K tokens of source code. On a 1500-entity repo (~150 batches), an agent following this guidance burns ~1.8M tokens of its own context before it can do anything else — and frequently exhausts context partway through.
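The arithmetic behind that estimate can be sketched as follows (the entity count, batch size, and per-batch token figures are the ones quoted above, not values read from the server):

```rust
// Illustrative arithmetic for the context-burn estimate above.
// All figures are the ones quoted in this PR description.
fn estimated_context_burn(entities: usize, batch_size: usize, tokens_per_batch: usize) -> usize {
    let batches = entities.div_ceil(batch_size);
    batches * tokens_per_batch
}

fn main() {
    // 1500 entities at ~10 per batch -> ~150 batches -> ~1.8M tokens
    let burn = estimated_context_burn(1500, 10, 12_000);
    assert_eq!(burn, 1_800_000);
    println!("~{burn} tokens consumed before the agent can do anything else");
}
```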

Fix

Size-aware NEXT STEP in `lifting_status`

  • Under 100 entities: lift directly (unchanged).
  • 100+ entities: NEXT STEP describes the delegation pattern (dispatch a sub-agent / cheaper model) with a Claude Code `Task()` + Haiku example and explicit callouts for Gemini CLI, Codex, Cursor, opencode, Windsurf. Falls back to CLI autonomous lifting (`rpg-encoder lift --provider anthropic`) when no dispatch mechanism exists.
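A minimal sketch of that size-aware branch (the constant name mirrors one introduced later in this PR; the message strings and the `has_dispatch`/`has_api_key` inputs are placeholders, not the server's actual output or signature):

```rust
const LARGE_SCOPE_ENTITIES: usize = 100;

// Hypothetical sketch of the NEXT STEP selection; real strings differ.
fn next_step(remaining: usize, has_dispatch: bool, has_api_key: bool) -> &'static str {
    if remaining < LARGE_SCOPE_ENTITIES {
        "NEXT STEP: call get_entities_for_lifting(scope=\"*\") and lift directly"
    } else if has_dispatch {
        "NEXT STEP: dispatch a sub-agent / cheaper model to drain the queue"
    } else if has_api_key {
        "NEXT STEP: run `rpg-encoder lift --provider anthropic` from the CLI"
    } else {
        "NEXT STEP: lift in scoped chunks via get_entities_for_lifting with a file glob"
    }
}

fn main() {
    assert!(next_step(50, true, true).contains("lift directly"));
    assert!(next_step(500, true, true).contains("sub-agent"));
    assert!(next_step(500, false, true).contains("rpg-encoder lift"));
}
```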

Dispatch note in `get_entities_for_lifting`

Emitted on batch 0 when ≥10 batches are queued, in case the caller skipped `lifting_status` and jumped straight to pulling batches.

Runtime-agnostic phrasing

The server doesn't know what's connected. Guidance describes the pattern, names Claude Code's syntax as one example, and points Gemini/Codex/Cursor/opencode/Windsurf users at their own equivalents.

`server_instructions.md`

Large-scope guidance simplified from "parallel subagents per area" to "one sub-agent drains it" — same result, simpler caller reasoning.

Test plan

  • `cargo fmt --all -- --check` clean
  • `cargo clippy --workspace --all-targets -- -D warnings` clean
  • All 651 workspace tests pass

🤖 Generated with Claude Code

userFRM and others added 19 commits April 14, 2026 12:09
Claude Code (and other MCP clients) were processing 1500-entity lifting
jobs directly in the foreground because the server's NEXT STEP guidance
said "Call get_entities_for_lifting(scope='*') to start lifting" with no
awareness of scale. Each batch returns ~12K tokens of source — 150
batches burns ~1.8M tokens before the agent can do anything else.

The guidance now branches on remaining count:
- <100 entities: lift directly (same as before).
- >=100 entities: NEXT STEP describes the delegation pattern and gives
  a Claude Code Task() example, while calling out that Gemini CLI,
  Codex, Cursor, opencode, Windsurf etc. have their own equivalents.

get_entities_for_lifting also emits a dispatch note on batch 0 when
>=10 batches are queued, in case the caller skipped lifting_status.

Runtime-agnostic phrasing throughout — the server doesn't know if the
connected agent is Claude, Gemini, GPT, or anything else. It describes
the pattern ("delegate to a sub-agent or cheaper model") and provides
Claude's Task()+Haiku syntax as one example among several runtimes.

server_instructions.md large-scope guidance also simplified from
"parallel subagents per area" to "one sub-agent drains it" — same
result, simpler caller reasoning.

Bumps to v0.8.3. 651 tests pass.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Codex review of the original PR #87 flagged six issues. All addressed:

HIGH — NEXT STEP must remain a single parseable line. Restored. Detail
lives in labeled blocks below (LOOP:, DISPATCH:, FALLBACK:, FALLBACK:)
so regex-based consumers keep working.

MEDIUM — "~12K tokens per batch" was stale. Token budget is now read
from self.config.encoding.max_batch_tokens (default 8000) at call time,
and estimates scale correctly when the user overrides batch_tokens.
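The shape of that fix, sketched (the `EncodingConfig` struct and field name follow the commit message; the real `RpgConfig` layout may differ):

```rust
// Sketch: derive the per-batch estimate from config at call time rather
// than hard-coding "~12K". Struct is an illustrative stand-in.
struct EncodingConfig {
    max_batch_tokens: usize, // default 8000 per the commit message
}

fn batch_token_note(cfg: &EncodingConfig, batches: usize) -> String {
    format!(
        "~{} tokens/batch, ~{} total",
        cfg.max_batch_tokens,
        cfg.max_batch_tokens * batches
    )
}

fn main() {
    let cfg = EncodingConfig { max_batch_tokens: 8000 };
    assert_eq!(batch_token_note(&cfg, 10), "~8000 tokens/batch, ~80000 total");
}
```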

MEDIUM — runtime-agnostic claim was overstated. Dropped Claude Code's
Task() syntax from the core response. Callers use whatever sub-agent
mechanism their runtime provides. Nothing in the response anchors on
Anthropic/Haiku.

MEDIUM — no-key fallback was a dead end. Now spells out two concrete
fallbacks: scoped lifting (get_entities_for_lifting with a file glob)
for callers with nothing at all, and CLI lift (rpg-encoder lift
--provider ...) for callers with an API key.

LOW — handshake prompt grew. server_instructions.md Large-scope
section rewritten and trimmed from +105 words to +22 words over the
pre-PR baseline.

LOW — README didn't mention delegation. Added a short paragraph
explaining the 100+ entity path and the CLI autonomous lift.

NIT — magic numbers 100 and 10 were duplicated across server.rs and
tools.rs. Moved to crate-level LARGE_SCOPE_ENTITIES and
LARGE_SCOPE_BATCHES constants in main.rs.

All 651 workspace tests still pass. fmt/clippy clean.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…face

Agents consistently reached for grep/cat/Read on codebase questions even
when RPG was built, lifted, and fully indexed — because the server's prompt
and tool descriptions never actively countered the training default. Users
had to manually remind every session "why don't you use the RPG?".

Fix is a consistent directive across every surface the agent reads:

server_instructions.md:
  New top section "USE RPG FIRST — BEFORE grep / cat / Read / find" with
  a concrete mapping table (12 rows). Placed before LIFTING FLOW so the
  directive is read before any workflow detail.

Tool descriptions (12 tools):
  Each of search_node, fetch_node, explore_rpg, rpg_info, semantic_snapshot,
  context_pack, impact_radius, plan_change, analyze_health, detect_cycles,
  find_paths, slice_between now opens with a "PREFER THIS OVER ..." marker
  naming the shell command or workflow it replaces. Tool descriptions are
  what agents weigh when choosing which tool to invoke — making the
  displacement explicit at that layer is what actually sticks.

.claude/skills/rpg/SKILL.md:
  Same mapping table scoped to the CLI surface. Explicit hint to prefer
  MCP tools when both are available.

README:
  Dedicated "Use RPG before grep/cat/find" section placed above "How It
  Works". Same mapping as the server prompt so human readers see the
  same positioning the agent does.

No tests affected. All 651 pass. fmt/clippy clean.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…hrasing

Opus audit of PR #87 found one HIGH issue and several LOW polish items.
All addressed:

H1 — "~44% fewer tokens" claim was unsubstantiated (no benchmark, no data).
I had escalated a long-buried server-prompt phrase into the context_pack
tool description and the new mapping tables. Removed across all surfaces:
  - tools.rs context_pack description
  - server_instructions.md mapping table row
  - server_instructions.md token-saving tips block
This is the exact failure mode MEMORY.md's "NEVER ship perf code without
benchmarks" rule prohibits — caught only because it was being shipped to
every runtime via the description field.

M1 — CHANGELOG claim "≤30 tokens prompt growth" was misleading.
The figure described only the LIFTING FLOW sub-section; the new
"USE RPG FIRST" section adds ~500 tokens. Reworded to scope the ≤30
to the right sub-section and acknowledge the deliberate +500 from
the new top section.

M2 — "abort this call" wording was misleading because the batch source
payload has already been delivered by the time the agent reads the NOTE.
Changed to "stop here — do not request batches 2..N" so the instruction
matches what's actually possible.

L1 — .gemini/extensions/rpg/CONTEXT.md got the same "use RPG first /
fall back to grep only for literal text" guidance. Three other surfaces
already had it; Gemini was the gap.

L3 — Capitalized "Read" in runtime-neutral surfaces (server_instructions.md,
README, tools.rs fetch_node) pattern-matched to Claude Code's Read tool.
Replaced with "file reads" / "reading a function" / "cat" so non-Claude
runtimes don't infer a tool name. Capital-R left intact in SKILL.md
(correctly Claude-scoped).

Skipped: M3 (test coverage for new dispatch branches — pre-existing
gap, follow-up), L2 (rpg_info ↔ tree mapping is a slight stretch but
defensible), L4 (reconstruct_plan/update_rpg correctly omitted), N1-N3
(stylistic).

651 tests still pass. fmt/clippy clean.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Codex round 2 review of PR #87 found 4 issues. All addressed:

HIGH — CLI fallback (`rpg-encoder lift --provider ...`) writes to disk,
but the MCP server keeps serving the in-memory graph until reload_rpg
is called. Documented across all 4 surfaces (server.rs lifting_status
block, server_instructions.md, README, .gemini/commands/rpg-lift.toml):
after the CLI finishes, call reload_rpg in this session.

MEDIUM — set_project_root swapped the root and reloaded the graph but
left self.config pointing at the previous project's config. After
switching projects, the server would serve the prior project's batch
size and token budget. Now reloads RpgConfig atomically with the swap.
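The essence of that atomicity fix, as a sketch (types and fields are illustrative stand-ins for the server's real state):

```rust
use std::sync::RwLock;

struct ProjectState {
    root: String,
    batch_size: usize,
}

// Sketch of the fix: swap root and config under one write lock, so no
// reader can observe the new root paired with the old project's config.
fn set_project_root(state: &RwLock<ProjectState>, new_root: &str, new_batch_size: usize) {
    let mut guard = state.write().unwrap();
    guard.root = new_root.to_string();
    guard.batch_size = new_batch_size; // reloaded from the new project's config
}

fn main() {
    let state = RwLock::new(ProjectState { root: "/old".into(), batch_size: 5 });
    set_project_root(&state, "/new", 20);
    let g = state.read().unwrap();
    assert_eq!(g.root, "/new");
    assert_eq!(g.batch_size, 20);
}
```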

MEDIUM — lifting_status checked `remaining >= 100` against the raw
pre-auto-lift count. Auto-lift runs inside get_entities_for_lifting and
can shrink the LLM-needed set dramatically for repos with many trivial
entities. The dashboard could promise delegation for work the auto-lift
would finish in zero calls. Reworded the recommendation as conditional
("likely-large workload — call get_entities_for_lifting next; if its
batch-0 response includes the delegation NOTE, follow the dispatch
pattern"). The batch-0 NOTE sees the post-auto-lift queue and is
authoritative.

LOW — Server instructions said `rpg_info` returns "No RPG found", but
that's a tool error from ensure_graph, not a friendly status string.
Changed to "any RPG tool returns 'No RPG found'".

651 tests pass. fmt/clippy clean.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…eshes config

Codex round 3 review of PR #87 found 3 more issues. All addressed:

HIGH — The dispatch happy path assumes the worker's tool calls update
the same in-memory graph the caller reads. That's true when the runtime
shares the MCP session (e.g., Claude Code Task), but not guaranteed —
some runtimes give sub-agents isolated MCP sessions, in which case the
worker writes to disk via submit_lift_results but the parent keeps
serving a stale in-memory graph. Documented across server.rs and
server_instructions.md: "after the worker returns, call reload_rpg".
No-op when sessions are shared, required when isolated.

MEDIUM — reload_rpg only reloaded the graph, not the config. Parallel
to the set_project_root fix from the prior round. If .rpg/config.toml
was edited externally (or rewritten by the lifter), batch_size and
max_batch_tokens stayed cached at the old values. reload_rpg now also
reloads RpgConfig from disk. Tool description updated to reflect the
broader scope.

LOW — "any RPG tool returns 'No RPG found'" overstated the consistency
of the error string. Different code paths return slightly different
messages. Reworded to "RPG tools error with messages like 'No RPG
found' or 'graph: not built'" — describes the pattern rather than
promising a specific string.

651 tests pass. fmt/clippy clean.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…-lift

Real-world session showed an agent committing new code repeatedly without
re-lifting, even though the auto-sync notice reported "N need lifting"
after each commit. The agent treated the count as informational. The
user had to manually intervene with "did you lift?" to trigger a Haiku
re-lift.

Three changes to make drift actionable:

1. Auto-sync notice is now active, not informational. Examples:
     [auto-synced: +5 -0 ~3 entities; 5 new entities unlifted —
      semantic search is now incomplete; call lifting_status to refresh]
     [auto-synced: +120 -0 ~0 entities; 120 new entities unlifted —
      semantic search is now incomplete; call lifting_status for re-lift dispatch]
     [auto-synced: +2 -0 ~10 entities; 2 new + 10 stale features —
      semantic search is now incomplete; call lifting_status to refresh]
   New entities and stale features are reported separately so the agent
   sees the difference between "you added code" and "you modified code".

2. New "DRIFT MAINTENANCE" section in server_instructions.md explains the
   notice variants and frames re-lift as part of "definition of done" for
   any task that wrote code — the same way tests are.

3. submit_lift_results NEXT action is now scale-aware. When remaining
   count after a batch ≥ LARGE_SCOPE_ENTITIES, it points back at
   lifting_status for the dispatch pattern rather than encouraging
   another foreground batch.

651 tests pass. fmt/clippy clean.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… threshold docs

HIGH — Auto-sync notice mislabelled the unlifted count. I used
(total - lifted), which is the global backlog, but rendered it as
"N new entities unlifted". A small edit on a partially-lifted repo
could claim 50 new entities when only 1 was actually new. Now reports
per-update delta separately ("+N added unlifted, ~M stale") and notes
any pre-existing backlog as "(+P pre-existing)".

HIGH — finalize_lifting guidance in the no-dispatch FALLBACK was wrong.
It said to call finalize_lifting after each scoped subtree, but
finalize_lifting auto-routes pending entities and locks the hierarchy
in. Calling it mid-flow against incomplete signals would bake in bad
routing. Fixed in both server.rs and server_instructions.md to "call
finalize_lifting ONCE at the very end".

MEDIUM — LARGE_SCOPE_ENTITIES (lifting_status heuristic) and
LARGE_SCOPE_BATCHES (get_entities_for_lifting batch-0 authority) can
diverge under tuned config. Documented explicitly: dashboard is a
heuristic gate, batch-0 NOTE is authoritative; they defer to each
other in messaging when they disagree.

651 tests pass. fmt/clippy clean.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…tatus

Five findings, all threading off the same omission: the server *knew* which
entities had been modified since last lift (via `summary.modified_entity_ids`
on each `auto_sync_if_stale`) but never persisted that knowledge anywhere
visible to the lifting state machine. So `lifting_status` reported "100%
coverage" the moment the last ever-unlifted entity got features, even if
half the repo had been modified since; and `get_entities_for_lifting(*)`
would return zero entities to a caller trying to re-lift the staleness it
had just been told about.

Changes:

- New persistent `stale_entity_ids: Arc<RwLock<HashSet<String>>>` on
  RpgServer. Populated from `summary.modified_entity_ids` after each
  auto-sync, drained by `submit_lift_results` as entities get re-lifted,
  reset on `reload_rpg` and `set_project_root`.
- `lifting_status` header now shows `stale_features: N entities modified
  since last lift` when the set is non-empty. NEXT STEP state machine
  compares `remaining + stale_features` against `LARGE_SCOPE_ENTITIES` and
  has dedicated branches for "unlifted + stale mixed" and "stale only".
- `get_entities_for_lifting(scope="*")` augments its resolved scope with
  tracked stale entities (resolve_scope filters to `features.is_empty()`
  which would otherwise skip them). The auto-lift skip check now treats
  stale entities as if they had no features so they flow into `needs_llm`
  and come back through the normal LLM loop.
- Large-scope NEXT STEP no longer bounces the caller through
  `get_entities_for_lifting` before seeing the dispatch recommendation.
  LOOP / DISPATCH / FALLBACK blocks are emitted directly by `lifting_status`,
  so the caller delegates without first loading a batch payload into its
  own context.
- `set_project_root` and `reload_rpg` now use a `reload_config_with_warning`
  helper that distinguishes "no .rpg/config.toml" (silent default) from
  "present but malformed" (stderr warning, keeps previous in-memory config).

Verified with `cargo fmt --all`, `cargo clippy --workspace --all-targets
-- -D warnings`, `cargo test --workspace --lib`, and the rpg-mcp
integration tests — all green.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Follow-up to round 5 stale-tracking work. The auto-lift path inside
get_entities_for_lifting writes features directly to the graph and
persists them there — it never round-trips through submit_lift_results,
so the drain logic added in round 5 never fired for auto-lifted stale
entities. Effect: a stale entity that matched a high-confidence auto-lift
pattern would get correctly re-lifted (fresh features on disk) but stay
pinned in stale_entity_ids forever, inflating the "stale_features" count
in lifting_status.

Fix: track auto-relifted stale IDs in a local vec as each raw entity is
processed, and drain them from stale_entity_ids right after the graph
save. Mirrors the drain that submit_lift_results already does for the
LLM path.

Also canonicalizes a small pre-existing inconsistency: `raw.id()` was
being called twice (once for the skip check, once inside match arms) —
hoisted to `raw_id` to match the new stale lookup and save a clone.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Code fixes:

- update_rpg now feeds summary.modified_entity_ids into stale_entity_ids
  so its "needs_relift: N" reply aligns with what lifting_status and
  get_entities_for_lifting(scope="*") report. Previously those values
  were reported but the set was never populated from the manual update
  path — only auto_sync_if_stale touched it.

- submit_lift_results NEXT/DONE logic now counts unlifted + stale
  remaining. A stale-only re-lift loop (coverage already 100%) would
  otherwise see "DONE" after batch 1 while later batches were still
  queued.

- reload_rpg now clears stale_entity_ids only on the success path after
  storage::load returns Ok. A transient read error no longer erases the
  drift backlog while leaving the previous graph in memory.

- CLI fallback guidance ("rpg-encoder lift --provider ...") is gated to
  cases with actual unlifted entities. The CLI's resolve_scope filters
  to entities with no features, so a stale-only backlog is a no-op for
  the CLI — surfacing it there was a dead-end recipe. Added a note
  explaining the limitation in server.rs, server_instructions.md, and
  README.

CHANGELOG:

- Collapsed v0.8.3 into the three standard Keep a Changelog buckets
  (Added, Changed, Fixed). Dropped subjective/process-revealing
  subheadings like "Fixed (Codex round 5 review)" and "Tool-preference
  guidance (RPG first, grep second)" — those expose the internal
  review cycle rather than describing what users see.

Verified with cargo fmt --all, cargo clippy --workspace --all-targets
-- -D warnings, cargo test --workspace --lib, cargo test -p rpg-mcp.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…eamble

build_rpg: The tail of the response is now a directive NEXT STEP — "lift
now, don't wait for user to ask", with scale-aware branching: small
scope lifts inline, large scope dispatches a sub-agent with the LOOP
pattern inline. Previously the response ended with a passive "Tip: use
get_entities_for_lifting" line that agents routinely skipped, which is
why the common flow in practice was "user builds RPG, asks a semantic
question, gets poor results, manually asks to lift".

Lock order: Added a struct-level doc comment on `RpgServer` declaring
the canonical lock order (graph → session → stale → pending →
auto-sync → config → embedding → project_root). Every nested-lock call
site already respects this order — documenting it here so future
reviewers (and Codex) don't have to re-derive it across a dozen files.
Paths that acquire inner locks one at a time with release-between (the
statement-per-lock pattern in set_project_root and reload_rpg) don't
hold two locks simultaneously and so cannot form a cycle regardless of
order.
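One way to make such an invariant mechanically checkable is a rank table (a sketch only; the real server documents the order in a comment rather than enforcing it):

```rust
// Sketch: encode the declared lock order as ranks so a debug assertion
// can validate nested acquisitions. Lock names mirror the doc comment
// described above; the enforcement itself is hypothetical.
const LOCK_RANKS: [&str; 8] = [
    "graph", "session", "stale", "pending",
    "auto_sync", "config", "embedding", "project_root",
];

fn rank(lock: &str) -> usize {
    LOCK_RANKS.iter().position(|&l| l == lock).expect("unknown lock")
}

// Returns true when `inner` may be acquired while `outer` is held.
// Statement-per-lock paths never hold two locks and need no check.
fn acquisition_ok(outer: &str, inner: &str) -> bool {
    rank(outer) < rank(inner)
}

fn main() {
    assert!(acquisition_ok("graph", "session")); // declared order
    assert!(!acquisition_ok("session", "graph")); // would risk a cycle
}
```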

Cache-friendly preamble: `get_routing_candidates` no longer prints the
graph revision hash in its header. Putting a content-hash on line 1 of
the response means the cache fingerprint changes on every graph update,
invalidating the stable instructions + entity table that follow. The
revision moved to the NEXT_ACTION block at the bottom where it's read
back for the submit_routing_decisions call — same data, stable prefix.

Verified with cargo fmt --all, cargo clippy --workspace --all-targets
-- -D warnings, cargo test -p rpg-mcp.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ch in build_rpg

Two issues caught by Codex:

Deadlock cycle (HIGH): build_semantic_hierarchy's sharded init path took
hierarchy_session.write() first, then acquired graph.read() to compute
clusters. update_rpg takes graph.write() first, then
hierarchy_session.write(). Concurrent schedule: A holds session, waits
on graph; B holds graph, waits on session. Real cycle, contradicted the
lock-order invariant I had just declared on RpgServer.

Fix: compute clusters under graph.read() first (no session held), then
acquire hierarchy_session.write() and install — with a
racing-initializer check so a second concurrent caller keeps the first
caller's clusters instead of clobbering them.
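The compute-then-install pattern from this fix, sketched with stand-in types (a later round tightens it further; this shows the ordering idea):

```rust
use std::sync::RwLock;

// Sketch: derive clusters while holding only the graph lock, then take
// the session lock to install — keeping the first racer's result if a
// concurrent caller initialized in between. Types are illustrative.
fn build_semantic_hierarchy(
    graph: &RwLock<Vec<String>>,
    session: &RwLock<Option<Vec<String>>>,
) -> usize {
    // Step 1: graph.read() only — no session lock held, so no cycle
    // against callers that take graph.write() then session.write().
    let clusters: Vec<String> = {
        let g = graph.read().unwrap();
        g.iter().map(|e| format!("cluster:{e}")).collect()
    };

    // Step 2: session.write() with a racing-initializer check.
    let mut s = session.write().unwrap();
    if s.is_none() {
        *s = Some(clusters);
    }
    s.as_ref().unwrap().len()
}

fn main() {
    let graph = RwLock::new(vec!["a".into(), "b".into()]);
    let session = RwLock::new(None);
    assert_eq!(build_semantic_hierarchy(&graph, &session), 2);
    // A second caller keeps the first caller's clusters.
    assert_eq!(build_semantic_hierarchy(&graph, &session), 2);
}
```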

Module-count mismatch (MEDIUM): build_rpg's NEXT STEP drove its
"unlifted of total" count and LARGE_SCOPE delegation threshold from
meta.total_entities / meta.lifted_entities, both of which include Module
entities. But lifting_coverage() and get_entities_for_lifting exclude
modules (they get features via file-level synthesis, not direct
lifting). On a codebase with many Module entities, build_rpg would
report ~100 more "unlifted" than lifting_status, and could recommend
sub-agent dispatch when lifting_status would still recommend foreground
lifting for the same graph.

Fix: capture lifting_coverage() before the graph moves into self.graph,
use it for both the "lifted: X/Y" header line and the NEXT STEP count.
Added a note "(excludes modules, ...)" so the number difference between
the entities count and lifted/total is explicit.

Verified with cargo fmt, cargo clippy --workspace --all-targets -- -D
warnings, cargo test --workspace --lib, cargo test -p rpg-mcp.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…d" header

TOCTOU: build_semantic_hierarchy's sharded init computed clusters under
graph.read(), dropped that lock, then acquired hierarchy_session.write().
In the gap, build_rpg/update_rpg could swap the graph and clear the
session; this path would then install clusters derived from the
pre-update graph. Fix: hold graph.read() through the hierarchy_session
install. Graph-before-session matches the declared lock order, so no
new cycle.

Header rename: "lifted: X/Y (excludes modules...)" reads as though "X/Y"
undercounts, since "entities: N" above uses the module-inclusive total.
Renamed to "liftable_entities: X/Y (modules are aggregated from files,
not lifted directly)" per Codex's suggestion — keeps the non-module
semantics explicit and removes the mental diff against the entities
line.

Verified with cargo fmt, cargo clippy -p rpg-mcp --all-targets -- -D
warnings, cargo test -p rpg-mcp.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ation

Two follow-ups from Codex's round 8:

Session-clear race (MEDIUM): build_batch_0_domain_discovery re-read
self.hierarchy_session after the caller had already decided to emit
batch 0. In the gap between install/drop and the re-read, a concurrent
build_rpg or update_rpg can clear the session, and the re-read panics
on unwrap(). Refactored the helper to take clusters as a parameter —
callers snapshot the cluster list while holding the session lock and
hand it off, so the helper never re-reads session state it doesn't
own.

Project-switch config contamination (LOW): set_project_root was calling
reload_config_with_warning, whose "keep previous config on parse
failure" semantics are correct for same-project reload but wrong for
project switch. A project whose .rpg/config.toml was malformed would
silently inherit the previous project's encoding/batch settings. Added
a dedicated project-switch load path that falls back to
RpgConfig::default() on parse failure instead of preserving the old
project's config.

Verified with cargo fmt, cargo clippy -p rpg-mcp --all-targets -- -D
warnings, cargo test -p rpg-mcp.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…quisition

Final session-clear race closure. The previous fix sampled
hierarchy_session with a read lock, branched on "is it initialized?",
and then trusted that answer after a later write-lock acquisition.
Between the two, a concurrent build_rpg/update_rpg/reload_rpg/
set_project_root could clear the session, and the subsequent
session_guard.as_mut().unwrap() on the "already initialized" branch
would panic.

Fix: collapse both branches into a single session.write() acquisition
under graph.read() (still graph-before-session per the declared
invariant). Decide init-vs-continue while holding the write lock; pack
the work into an Action enum; drop the locks; then render. The session
cannot change between the decision and the render because the
decision-time snapshot carries everything the render needs.
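The decide-then-render shape described here, sketched (the enum variants and session representation are illustrative; the real `Action` carries the full batch payload):

```rust
use std::sync::RwLock;

// Sketch: decide init-vs-continue while holding the write lock, pack a
// snapshot of everything rendering needs into an enum, drop the lock,
// then render. The session cannot change the outcome after the decision.
enum Action {
    Init { clusters: usize },
    Continue { batch_index: usize },
}

fn next_action(session: &RwLock<Option<usize>>, clusters: usize) -> Action {
    let mut guard = session.write().unwrap();
    match *guard {
        None => {
            *guard = Some(0);
            Action::Init { clusters }
        }
        Some(batch) => {
            *guard = Some(batch + 1);
            Action::Continue { batch_index: batch + 1 }
        }
    }
    // lock drops here; render uses only the snapshot inside Action
}

fn render(action: &Action) -> String {
    match action {
        Action::Init { clusters } => format!("batch 0: {clusters} clusters"),
        Action::Continue { batch_index } => format!("batch {batch_index}"),
    }
}

fn main() {
    let session = RwLock::new(None);
    assert_eq!(render(&next_action(&session, 4)), "batch 0: 4 clusters");
    assert_eq!(render(&next_action(&session, 4)), "batch 1");
}
```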

Verified with cargo fmt, cargo clippy -p rpg-mcp --all-targets -- -D
warnings, cargo test -p rpg-mcp.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Two more findings from Codex:

Graph-replace race in hierarchy helpers (HIGH): build_batch_0_domain_discovery
and build_cluster_batch dropped graph_guard before rendering and then
re-read self.graph, unwrapping the option. A concurrent set_project_root
to a root with no graph could panic; any graph replacement could render
against the wrong snapshot. Refactored both helpers to take `&RPGraph`
as a parameter; the caller keeps graph_guard alive across the render so
the helper's reference stays valid and there's no second read.

reload_rpg wiped stale-feature drift (MEDIUM): the documented CLI /
isolated-subagent flow only re-lifts entities with no features — stale
entities (features present but outdated) survive. Clearing
stale_entity_ids on every successful reload erased that backlog and let
lifting_status falsely report 100% coverage. Now we prune by entity
existence in the newly-loaded graph instead of wholesale clearing.
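The prune-instead-of-clear fix reduces to a `retain` over the new graph's IDs (a sketch; the real code operates on the loaded `RPGraph`):

```rust
use std::collections::HashSet;

// Sketch of the reload fix: keep stale IDs that still exist in the
// newly loaded graph instead of clearing the whole set, so the drift
// backlog survives a CLI / isolated-subagent round trip.
fn prune_stale(stale: &mut HashSet<String>, graph_ids: &HashSet<String>) {
    stale.retain(|id| graph_ids.contains(id));
}

fn main() {
    let mut stale: HashSet<String> =
        ["kept".to_string(), "deleted".to_string()].into();
    let graph: HashSet<String> = ["kept".to_string(), "other".to_string()].into();
    prune_stale(&mut stale, &graph);
    assert!(stale.contains("kept"));
    assert!(!stale.contains("deleted"));
}
```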

Verified with cargo fmt, cargo clippy -p rpg-mcp --all-targets -- -D
warnings, cargo test -p rpg-mcp.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… + 4 polish)

Two MEDIUM bugs found by an independent reviewer that survived all prior
review rounds:

Startup stale-tracking loss (MEDIUM, main.rs): the startup auto-update
discarded summary.modified_entity_ids. Modifications between the last
lift and a session restart silently dropped off lifting_status, even
though the same modifications would have been tracked correctly if they
happened mid-session via auto_sync_if_stale or update_rpg. Now the
startup path feeds the IDs into stale_entity_ids and prunes against the
current graph. Also seeds last_auto_sync_changeset with the real empty-
workdir hash so the first tool call short-circuits cleanly.

auto_lift stale leak (MEDIUM, tools.rs): when invoked with a non-`*`
scope, the lift pipeline freshens features for every in-scope entity
including ones that were previously stale. The default `*` scope is
safe because resolve_scope filters to feature-empty entities, but
explicit scopes bypass that filter. Without a drain, lifting_status
would keep reporting those entities as stale forever. Now we snapshot
features before the pipeline, diff after, and drain stale entries for
any ID whose features changed.

Plus four LOW polish items from the same review:

- set_project_root tool description was Claude-Code-specific in its
  example. Reads runtime-neutral now.
- get_entities_for_lifting batch-0 NOTE referenced "batches 2..N",
  off-by-one against the 0-based batch_index parameter it accepts.
  Reads "do not request further batches in this context".
- auto_sync_if_stale reordered inner writes to match the declared lock
  rank (stale=3 before auto-sync markers=5). Functionally equivalent
  (each write is statement-per-lock) but matches the invariant doc as
  a clean exemplar.
- build_rpg now prunes stale_entity_ids against the new graph so dead
  IDs don't accumulate across rebuilds.

Verified with cargo fmt, cargo clippy -p rpg-mcp --all-targets -- -D
warnings, cargo test -p rpg-mcp.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- auto_lift drain now pulls the in-scope entity IDs from resolve_scope
  and drains them unconditionally, instead of diffing pre/post features.
  The diff-gated approach missed the identical-features edge case (a
  cosmetic edit re-lifts to the same output and stale would persist).
- Misleading comment in auto_lift claiming "MCP stdio is serial — no
  concurrent requests" was factually wrong about rmcp 0.14.0 (which
  does tokio::spawn per request). Corrected to cite the actual
  synchronization primitives: the `lift_in_progress` atomic and the
  graph write lock held across the pipeline.
- Dropped the "clean exemplar" rationale from the auto_sync_if_stale
  lock-order comment. The ordering change still holds, but the claim
  that it exemplifies canonical order was inconsistent — `update_rpg`
  and `set_project_root` use different inner-lock orders and the
  doc's explicit escape hatch (statement-per-lock) covers all three.
- Rewrote CHANGELOG 0.8.3 buckets. The Added bucket had swallowed
  many items that are Fixed (prior bugs) or Changed (behavioral
  modifications to existing features). Re-sorted: new capabilities in
  Added, behavioral changes in Changed, bug fixes in Fixed. The
  Keep-a-Changelog semantics were violated by treating every new
  bullet as Added.
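The unconditional drain from the first bullet, sketched (names are illustrative; the real code takes the ID list from `resolve_scope`):

```rust
use std::collections::HashSet;

// Sketch: everything in scope was just re-lifted, so drain it from the
// stale set without diffing features — a cosmetic edit can re-lift to
// byte-identical output, which the old diff-gated drain missed.
fn drain_in_scope(stale: &mut HashSet<String>, in_scope: &[String]) {
    for id in in_scope {
        stale.remove(id);
    }
}

fn main() {
    let mut stale: HashSet<String> = ["unchanged_output".to_string()].into();
    drain_in_scope(&mut stale, &["unchanged_output".to_string()]);
    assert!(stale.is_empty());
}
```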

Verified with cargo fmt, cargo clippy -p rpg-mcp --all-targets -- -D
warnings, cargo test -p rpg-mcp.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@userFRM userFRM merged commit 26b24ee into main Apr 14, 2026
6 checks passed
@userFRM userFRM deleted the fix/subagent-guidance branch April 14, 2026 19:29