fix(session): prevent concurrent commit re-committing old messages #783
deepakdevp wants to merge 3 commits into volcengine:main from
commit_async() now acquires an asyncio.Lock during Phase 1 (copy + clear + file write), which prevents concurrent commits from re-committing the same messages. The lock is released before the slow LLM summary and memory extraction, so it does not block other operations.

The phase order is also changed: live messages are now cleared BEFORE the archive summary is generated, closing the race window in which a second commit could see stale data. If clearing the file fails, the messages are rolled back.

Fixes volcengine#580.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
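A minimal sketch of the Phase 1 pattern this commit describes, assuming a simplified session. SessionSketch, _clear_live_file and _summarize are hypothetical stand-ins, not the OpenViking API, and the rollback strategy (re-inserting the copied messages ahead of any that arrived during the failed clear) is an assumption about what "rolled back" means. The lock covers only copy + clear + file write; the slow summary runs after the lock is released.

```python
import asyncio
from typing import List, Optional


class SessionSketch:
    """Hypothetical, simplified stand-in for Session; not the OpenViking class."""

    def __init__(self) -> None:
        self._messages: List[str] = []
        self._commit_lock = asyncio.Lock()
        self.archives: List[List[str]] = []
        self.fail_clear = False  # toggle to simulate a failed file clear

    def add_message(self, msg: str) -> None:
        self._messages.append(msg)

    async def _clear_live_file(self) -> None:
        # Hypothetical placeholder for truncating messages.jsonl on disk.
        await asyncio.sleep(0)
        if self.fail_clear:
            raise OSError("simulated write failure")

    async def _summarize(self, msgs: List[str]) -> str:
        # Hypothetical placeholder for the slow LLM summary call.
        await asyncio.sleep(0.05)
        return f"summary of {len(msgs)} messages"

    async def commit_async(self) -> Optional[str]:
        # Phase 1 (under the lock): copy, clear in memory, clear on disk.
        async with self._commit_lock:
            if not self._messages:
                return None  # a concurrent commit already took these messages
            to_archive = list(self._messages)
            self._messages.clear()
            try:
                await self._clear_live_file()
            except Exception:
                # Roll back: restore the copied messages in front of any
                # messages added while the clear was in progress.
                self._messages[:0] = to_archive
                raise
        # Phase 2 (lock released): slow work no longer blocks other commits.
        summary = await self._summarize(to_archive)
        self.archives.append(to_archive)
        return summary
```

With two concurrent commit_async() calls, the second acquires the lock only after the first has emptied _messages, so it returns None and exactly one archive is produced.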
Tests verify that:
- Two concurrent commit_async() calls on the same session produce exactly one archive (the other returns early).
- Messages added while a commit is running are preserved in the session, neither lost nor re-committed.

Part of fix for volcengine#580.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
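For contrast, a hypothetical model of the pre-fix behaviour (no lock, live messages cleared only after the summary) shows what the first test guards against: both commits see the same messages and each one archives them. UnlockedSession is illustrative only, and asyncio.sleep stands in for the LLM call.

```python
import asyncio
from typing import List


class UnlockedSession:
    """Hypothetical model of the pre-fix commit: no lock, late clear."""

    def __init__(self) -> None:
        self._messages: List[str] = []
        self.archives: List[List[str]] = []

    async def commit_async(self) -> None:
        if not self._messages:
            return
        to_archive = list(self._messages)  # both callers copy the same list
        await asyncio.sleep(0.05)          # stand-in for the slow LLM summary
        self.archives.append(to_archive)
        self._messages.clear()             # live messages cleared only now


async def demo() -> UnlockedSession:
    s = UnlockedSession()
    s._messages.extend(["m0", "m1"])
    await asyncio.gather(s.commit_async(), s.commit_async())
    return s


session = asyncio.run(demo())
print(len(session.archives))  # prints 2: the same messages archived twice
```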
Thanks for the contribution! The race condition analysis is spot-on: there is indeed a window during …
Existing protections
The session commit path already has several layers of protection:
What's missing is exactly what you identified: Phase 1 atomicity, i.e. the gap between copying messages and clearing them, with a slow LLM summary call in between.
Why
Replaces the in-process asyncio.Lock with the existing PathLock (distributed filesystem lock via LockContext) for Phase 1 of commit_async(). This ensures commit serialization works across multiple HTTP workers and service instances, not just within a single Python process. Addresses review feedback from qin-ctx on PR volcengine#783.
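The PathLock/LockContext API is not shown in this thread, so the sketch below does not use it. As an assumption-labelled illustration of why a filesystem lock can serialize commits where an in-process asyncio.Lock cannot, lock_file is a hypothetical lock-file mutex built on os.open with O_CREAT | O_EXCL: only one caller can create the file, regardless of which process it runs in. It polls with asyncio.sleep so the event loop keeps running while it waits; commit_phase1 is likewise a hypothetical usage helper.

```python
import asyncio
import contextlib
import os
from typing import AsyncIterator, List


@contextlib.asynccontextmanager
async def lock_file(path: str, poll: float = 0.01,
                    timeout: float = 10.0) -> AsyncIterator[None]:
    """Hypothetical cross-process mutex; NOT OpenViking's PathLock API."""
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout
    while True:
        try:
            # O_CREAT | O_EXCL creates the file atomically: on a local
            # filesystem exactly one caller succeeds, across processes too.
            fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            break
        except FileExistsError:
            if loop.time() >= deadline:
                raise TimeoutError(f"could not acquire {path}")
            await asyncio.sleep(poll)  # wait without blocking the event loop
    os.close(fd)
    try:
        yield  # critical section: Phase 1 of a commit would run here
    finally:
        os.remove(path)  # release so the next waiter can create the file


async def commit_phase1(lock_path: str, messages: List[str]) -> List[str]:
    # Hypothetical usage: copy and clear under the cross-process lock.
    async with lock_file(lock_path):
        batch = list(messages)
        messages.clear()
    return batch
```

Unlike asyncio.Lock, this simple version leaves a stale lock file behind if the holder crashes mid-section; a production lock needs an expiry or owner check to recover from that.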
Thanks @qin-ctx for the thorough review and the suggestion! You're absolutely right that … I've replaced it with …
Changes in the latest push:
Please take another look when you get a chance!
Summary
- Add asyncio.Lock to Session to serialize concurrent commit_async() calls
- Reorder commit_async() to clear live messages before the slow LLM summary generation, closing the race window where a second commit could see stale data

Fixes #580
Root Cause
commit_async() had no synchronization. When called concurrently, both calls would copy the same self._messages, generate separate archives, and trigger duplicate memory extraction. The race window spanned the entire LLM summary generation (seconds), during which the live messages.jsonl still contained the old messages.
Changes Made
openviking/session/session.py:
- Add self._commit_lock = asyncio.Lock() to Session.__init__
- Guard Phase 1 of commit_async() with async with self._commit_lock
tests/session/test_session_commit_race.py (new): 2 tests
Type of Change
Testing
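A minimal sketch of the two race properties, written against a hypothetical FakeSession rather than the real Session class and not taken from tests/session/test_session_commit_race.py. The asyncio.Event named summarizing is an assumed test hook that lets the second test add a message exactly while a commit is in its slow phase.

```python
import asyncio
from typing import List, Optional


class FakeSession:
    """Hypothetical minimal session for the tests; not the OpenViking Session."""

    def __init__(self) -> None:
        self._messages: List[str] = []
        self._commit_lock = asyncio.Lock()
        self.archives: List[List[str]] = []
        self.summarizing = asyncio.Event()  # set when the slow phase starts

    async def commit_async(self) -> Optional[int]:
        async with self._commit_lock:       # Phase 1: copy + clear
            if not self._messages:
                return None
            batch = list(self._messages)
            self._messages.clear()
        self.summarizing.set()
        await asyncio.sleep(0.05)           # stand-in for the LLM summary
        self.archives.append(batch)
        return len(batch)


async def test_concurrent_commits_produce_one_archive() -> None:
    s = FakeSession()
    s._messages += ["m0", "m1"]
    results = await asyncio.gather(s.commit_async(), s.commit_async())
    assert results.count(None) == 1 and 2 in results  # one returned early
    assert s.archives == [["m0", "m1"]]


async def test_messages_added_mid_commit_are_kept() -> None:
    s = FakeSession()
    s._messages.append("old")
    task = asyncio.create_task(s.commit_async())
    await s.summarizing.wait()              # commit is in its slow phase
    s._messages.append("new")               # arrives while committing
    await task
    assert s.archives == [["old"]]          # "new" was not committed...
    assert s._messages == ["new"]           # ...and was not lost


if __name__ == "__main__":
    asyncio.run(test_concurrent_commits_produce_one_archive())
    asyncio.run(test_messages_added_mid_commit_are_kept())
```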
🤖 Generated with Claude Code