Fix: async SessionBank commit for tool-call responses #17
Merged
youssofal merged 1 commit on May 7, 2026
Fill in _schedule_idle_postcommit_snapshot, which was scaffolded as a no-op that just logged 'abandoned_foreground_busy' and returned without calling bank.put. As a result, every response routed through the async postcommit path (in practice, every tool-using OpenAI-compatible request) never made it into the SessionBank, forcing full cold prefill on every turn even with a stable session id.

The fix dispatches the existing _store_retokenized_history_snapshot inside the background executor once the foreground goes idle, bounded by a 30s deadline that preserves the 'do not extend stream latency' contract.

Validated end-to-end against a multi-turn opencode subagent session: cached_tokens climbs from 0 to 22479 within a few turns and TTFT drops roughly in half. Bank entries appear with bytes > 0 in /admin/sessions.

Tests: updated the stale 'reports_pending_without_foreground_work' test to the new contract, added regression tests for the tool-call commit path and the deadline-abandon path.

Files: mtplx/server/openai.py, tests/test_openai_bridge.py, CHANGELOG.md
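For readers who want the shape of the fix without opening the diff, the bounded wait described above can be sketched roughly as follows. This is an illustration, not the code in mtplx/server/openai.py: the function names `commit_when_idle` and `schedule_idle_postcommit`, the `foreground_busy` and `store_snapshot` callables, the polling interval, and the result dicts for the abandon and error cases are all assumptions. Only the 30s deadline, the `async_pending` return shape, and the `abandoned_foreground_busy` / `async_error` labels come from this PR.

```python
import time
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Tuple

_IDLE_POSTCOMMIT_MAX_WAIT_S = 30.0  # deadline named in the PR
_POLL_INTERVAL_S = 0.05  # hypothetical polling period, not from the PR


def commit_when_idle(
    foreground_busy: Callable[[], bool],
    store_snapshot: Callable[[], dict],
    max_wait_s: float = _IDLE_POSTCOMMIT_MAX_WAIT_S,
    poll_interval_s: float = _POLL_INTERVAL_S,
) -> dict:
    """Wait (bounded) for the foreground to go idle, then commit once."""
    deadline = time.monotonic() + max_wait_s
    while foreground_busy():
        if time.monotonic() >= deadline:
            # Give up rather than wait indefinitely for the foreground.
            return {"stored": False, "reason": "abandoned_foreground_busy"}
        time.sleep(poll_interval_s)
    try:
        # Stand-in for the real retokenized-history commit.
        return store_snapshot()
    except Exception as exc:
        # Report the fault in the result instead of raising out of the worker.
        return {"stored": False, "mode": "async_error", "reason": repr(exc)}


def schedule_idle_postcommit(
    executor: ThreadPoolExecutor,
    foreground_busy: Callable[[], bool],
    store_snapshot: Callable[[], dict],
    unsafe_reason: str,
) -> Tuple[dict, Future]:
    """Return the pending result immediately; the commit runs in the background."""
    future = executor.submit(commit_when_idle, foreground_busy, store_snapshot)
    pending = {"stored": False, "mode": "async_pending", "reason": unsafe_reason}
    return pending, future


if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending, fut = schedule_idle_postcommit(
            pool, lambda: False, lambda: {"stored": True}, "tool_calls"
        )
        print(pending["mode"], fut.result(timeout=5))
```

The caller gets the pending dict back before any commit work starts, which is the property the 'do not extend stream latency' contract depends on; the future is returned here only so the sketch can be exercised in a test.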
Owner: Good work
Fix: async SessionBank commit for tool-call responses
Summary
`_schedule_idle_postcommit_snapshot` was scaffolded but never implemented. When the generation-final compatibility check rejects a response as unsafe to commit (typically because it contained `tool_calls`), control falls through to this function. The unpatched implementation just logs `"abandoned_foreground_busy"` and returns without ever calling `bank.put`. Result: any session whose responses include tool calls (every modern coding agent: opencode, Claude Code, Codex, Aider) never gets committed to the SessionBank, even when its session id is stable. Every turn pays full cold prefill.

This patch fills in the function: poll for the foreground to clear (bounded by a 30s deadline), then run the existing `_store_retokenized_history_snapshot` synchronously inside the background executor. That function already canonicalises the conversation (including `assistant_tool_calls`) into the exact prefix the next request will send, so the commit is byte-for-byte safe; it just had no caller.

Reproduction (before this patch)
Run any tool-using OpenAI-compatible client against `mtplx --port 8088` with a stable `x-mtplx-session-id` header:

    # opencode example
    opencode run --model mtplx/qwen36-27b ...

Watch `GET /admin/sessions` and `GET /metrics`:

- `bytes` stays at `0`
- `last_cache_miss_reason` stays `"new_session"`
- `boundaries[].bank_token_hash` is `null`, `nbytes` is `0` for every snapshot
- `cached_tokens` is `0` even after dozens of turns

Repro after this patch
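The `/admin/sessions` fields listed in the reproduction steps can be checked by script instead of by eye, for both the before and after runs. The helper below is a sketch under assumptions: it takes one already-parsed session entry, and the dict layout (top-level `bytes` and `last_cache_miss_reason`, a `boundaries` list whose items carry `bank_token_hash` and `nbytes`) is inferred from the field names in this PR, not from the actual response schema. The function name and the sample values are made up for illustration.

```python
from typing import Any, Mapping


def session_commit_report(session: Mapping[str, Any]) -> dict:
    """Summarise whether one session entry shows evidence of a bank commit.

    The input layout is an assumption based on the field names in this PR.
    """
    total_bytes = session.get("bytes") or 0
    boundaries = session.get("boundaries") or []
    # A boundary counts as committed if it has a token hash and a non-zero size.
    committed = [
        b for b in boundaries
        if b.get("bank_token_hash") is not None and (b.get("nbytes") or 0) > 0
    ]
    return {
        "bytes": total_bytes,
        "last_cache_miss_reason": session.get("last_cache_miss_reason"),
        "committed_boundaries": len(committed),
        "looks_committed": total_bytes > 0,
    }


if __name__ == "__main__":
    # Made-up sample entries, shaped like the symptoms described in this PR.
    before = {
        "bytes": 0,
        "last_cache_miss_reason": "new_session",
        "boundaries": [{"bank_token_hash": None, "nbytes": 0}],
    }
    after = {
        "bytes": 1024,
        "boundaries": [{"bank_token_hash": "abc", "nbytes": 1024}],
    }
    print(session_commit_report(before))
    print(session_commit_report(after))
```

Calling it once per turn on the entry for the session under test gives a quick pass/fail signal for the expectations that follow.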
Same workload, same monitoring:

- `bytes > 0` after the first response stream completes
- `cached_tokens` grows monotonically with `context_len` from turn 2 onward

Empirical numbers from a live opencode session
Seven-turn `researcher` subagent sequence on a 27B Qwen model:

Compare to a same-length session under unpatched 0.1.6 (same hardware, same workload), which never showed `cached > 0` and held TTFT at 80-100 s for context > 30K. The remaining `cached=0` rows after the fix (turns 2 and 4) are postcommit-lag artifacts: the next turn arrived before the async commit acquired the model lock. They self-heal once a quiet moment lands. The lag is a candidate for a follow-up optimisation (start the commit during stream tail rather than after lock release) but is not a regression vs. today.

Behavioural contract (preserved)
The original "never extend stream latency for a postcommit" contract is preserved:

- The immediate `pending` return (`{"stored": False, "mode": "async_pending", "reason": <unsafe_reason>}`) is unchanged.
- The work runs on `state.postcommit_executor` (or `state.generation_executor` as fallback), exactly as before.
- If the foreground stays busy past `_IDLE_POSTCOMMIT_MAX_WAIT_S = 30s`, the background commit logs `abandoned_foreground_busy` and returns. The model lock is never blocked indefinitely.
- Errors are reported as `async_error` so the executor never propagates faults.

Risk
The change reuses an existing function (`_store_retokenized_history_snapshot`) that already had a synchronous caller (the `inline` postcommit path). The only behavioural change is that the async path now invokes it instead of being a stub. Bank `put` semantics, eviction, and per-session caps are unchanged.

Tests
- Updated `tests/test_openai_bridge.py::test_idle_async_postcommit_returns_pending_and_dispatches_retokenized_commit` to assert the new contract (commit is attempted, not silently abandoned).
- Added `test_idle_async_postcommit_attempts_commit_for_tool_call_responses`: regression test for the tool-call case specifically. Verifies `assistant_tool_calls` propagates into `_store_retokenized_history_snapshot`.
- Added `test_idle_async_postcommit_abandons_when_foreground_stays_busy`: covers the deadline-abandon path so the bounded-wait guarantee is enforced.
- Existing `tests/test_session_bank.py` and `tests/test_server_openai.py` tests still pass.

Files changed
- `mtplx/server/openai.py`: fill in `_schedule_idle_postcommit_snapshot` body (~67 lines added, 16 removed).
- `tests/test_openai_bridge.py`: replace one stale test, add two regression tests.
- `CHANGELOG.md`: add an "Unreleased" entry describing the fix.