Fix race condition causing permanent loss of active rule files #10238
Conversation
I'm starting a first review of this pull request. You can view the conversation on Warp. I completed the review, and no human review was requested for this pull request.
Overview
This PR attempts to fix a rule-file update race by keeping path_to_rules populated while repository update processing runs asynchronously.
Concerns
- The new clone-based flow still allows overlapping update tasks to process stale independent snapshots and overwrite each other when they reinsert results, so rule additions, deletions, or content updates can still be lost.
Verdict
Found: 0 critical, 1 important, 0 suggestions
Request changes
```diff
  }
- let existing_rules = me.path_to_rules.remove(&path_clone);
+ let existing_rules = me.path_to_rules.get(&path_clone).cloned();
```
When a repository update arrives, the stream handler removes rules from the `path_to_rules` HashMap and re-inserts them asynchronously after processing. If a second update arrives during this async gap (e.g. from a git checkout or atomic file save), the `remove()` returns `None` and the update is silently dropped. If the first update was a deletion and the dropped second update was the corresponding addition, the rules are permanently lost until client restart.

Fix: clone the rules instead of removing them so they remain available in the HashMap during async processing, and serialize update processing per path by tracking in-flight tasks and queuing updates that arrive during the async gap. When a task completes, queued updates are drained and processed sequentially against the latest state, preventing both the permanent-loss bug and stale-snapshot overwrites.

Co-Authored-By: Oz <oz-agent@warp.dev>
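A minimal std-only sketch of the queuing pattern described above. All names here (`Task`, `Model`, `apply`, `in_flight`) are illustrative stand-ins, not the PR's actual code; the real handler spawns async tasks, which this models as an explicit start/complete split:

```rust
use std::collections::{HashMap, HashSet};
use std::path::PathBuf;

// Illustrative stand-in for the PR's update type; not the actual Warp code.
#[derive(Clone, Debug)]
struct RepositoryUpdate(&'static str);

// A "spawned task": the snapshot it will process, decoupled from the map.
struct Task {
    path: PathBuf,
    rules: Vec<String>,
    update: RepositoryUpdate,
}

#[derive(Default)]
struct Model {
    path_to_rules: HashMap<PathBuf, Vec<String>>,
    pending_updates: HashMap<PathBuf, Vec<RepositoryUpdate>>,
    in_flight: HashSet<PathBuf>,
}

impl Model {
    // Stream handler: start a task, or queue the update if one is in flight.
    fn on_update(&mut self, path: PathBuf, update: RepositoryUpdate) -> Option<Task> {
        if !self.in_flight.insert(path.clone()) {
            self.pending_updates.entry(path).or_default().push(update);
            return None; // queued, not dropped
        }
        // Clone instead of remove: rules stay visible during processing.
        let rules = self.path_to_rules.get(&path).cloned().unwrap_or_default();
        Some(Task { path, rules, update })
    }

    // Completion callback: write back, then drain queued updates in order.
    fn complete(&mut self, task: Task) {
        let mut rules = apply(task.rules, &task.update);
        for queued in self.pending_updates.remove(&task.path).unwrap_or_default() {
            rules = apply(rules, &queued); // sequential, against freshest state
        }
        self.path_to_rules.insert(task.path.clone(), rules);
        self.in_flight.remove(&task.path);
    }
}

// Placeholder for the real rule processing.
fn apply(mut rules: Vec<String>, update: &RepositoryUpdate) -> Vec<String> {
    rules.push(update.0.to_string());
    rules
}

fn main() {
    let mut m = Model::default();
    let p = PathBuf::from("WARP.md");
    let t1 = m.on_update(p.clone(), RepositoryUpdate("U1")).unwrap();
    // U2 arrives while T1 is "in flight": it is queued, not dropped.
    assert!(m.on_update(p.clone(), RepositoryUpdate("U2")).is_none());
    m.complete(t1);
    assert_eq!(m.path_to_rules[&p], ["U1", "U2"]);
}
```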
Force-pushed from 9dc7fb2 to 3520824.
```rust
Self::process_repository_updates(update, current_rules, repo_path.clone())
    .await;
current_rules = updated_rules;
combined_delta
```
Looks good overall, nice find! I wonder if there's a little potential for trouble in the fact that we break updates into discovered and deleted rules? This loses the ordering of those updates, so an (add, delete) of the same rule ends up looking just like a (delete, add) in the emitted event. Might it be better to dedupe before adding to either list?
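The concern in miniature: once updates are bucketed into separate discovered/deleted lists, opposite orderings produce identical events (hypothetical data, std-only sketch):

```rust
fn main() {
    // (discovered, deleted) lists built from two different histories:
    let add_then_delete = (vec!["WARP.md"], vec!["WARP.md"]);
    let delete_then_add = (vec!["WARP.md"], vec!["WARP.md"]);
    // Both emit the same event even though their net effects differ,
    // which is what the order-preserving merge() below addresses.
    assert_eq!(add_then_delete, delete_then_add);
}
```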
Force-pushed from 81c961a to b9223a5.
…otdev#10238)

…otdev#10238) (cherry picked from commit 5146a5b)
Description
Fix a race condition in `ProjectContextModel`'s file watcher stream handler that can permanently lose active rule files until client restart.
Root cause
The stream handler processes each `RepositoryUpdate` by spawning an async task. The original code called `path_to_rules.remove()` before spawning, then re-inserted the updated rules in the task's completion callback. This created two distinct race windows:

Race 1 — concurrent task overwrites (original bug)
Consider two updates, U1 and U2, arriving in quick succession for the same path:
1. U1 arrives → `remove()` extracts the current rules → async task T1 spawned with a snapshot of those rules.
2. U2 arrives while T1 is in flight → `remove()` returns `None` (the key is gone) → the `if let Some(rules) = existing_rules` guard fires → U2 is silently dropped.
3. T1 completes and re-inserts its result.

If U1 was a deletion and U2 was the corresponding re-addition (common during an editor's atomic save: delete-then-write), the file is now permanently absent from `path_to_rules` until restart.
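The bug shape can be reproduced with a plain `HashMap` (the path and rule values below are made up for illustration):

```rust
use std::collections::HashMap;
use std::path::PathBuf;

fn main() {
    let mut path_to_rules: HashMap<PathBuf, Vec<String>> = HashMap::new();
    let path = PathBuf::from("WARP.md");
    path_to_rules.insert(path.clone(), vec!["rule".into()]);

    // U1's handler runs: the entry is extracted for async processing.
    let u1_snapshot = path_to_rules.remove(&path); // Some([...])

    // U2 arrives before U1's task re-inserts: the key is gone.
    if let Some(_rules) = path_to_rules.remove(&path) {
        unreachable!("never reached: U2 is silently dropped here");
    }

    // With the fix, the entry stays visible to U2's handler instead:
    path_to_rules.insert(path.clone(), u1_snapshot.unwrap());
    assert!(path_to_rules.get(&path).cloned().is_some());
}
```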
Race 2 — stale-snapshot overwrites (Oz review finding)
Even after fixing Race 1 by keeping rules in the map during async processing, a second race remained: if T1 and T2 both read the rules before either writes back, T2's write silently discards T1's changes.
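A compact illustration of this lost-update pattern, again with placeholder data:

```rust
use std::collections::HashMap;
use std::path::PathBuf;

fn main() {
    let mut path_to_rules: HashMap<PathBuf, Vec<String>> = HashMap::new();
    let path = PathBuf::from("repo");
    path_to_rules.insert(path.clone(), vec!["a.md".into()]);

    // T1 and T2 both clone the rules before either writes back.
    let mut t1 = path_to_rules.get(&path).cloned().unwrap();
    let mut t2 = path_to_rules.get(&path).cloned().unwrap();

    t1.push("b.md".into()); // T1 discovers b.md
    t2.retain(|r| r != "a.md"); // T2 deletes a.md

    path_to_rules.insert(path.clone(), t1); // T1 writes back first ...
    path_to_rules.insert(path.clone(), t2); // ... then T2's stale snapshot wins.
    assert!(path_to_rules[&path].is_empty()); // T1's discovery of b.md is lost
}
```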
Fix
For Race 1: Replace `remove()` with `get().cloned()` so the rules stay in `path_to_rules` throughout async processing. A `pending_updates` queue (keyed by path) is introduced: if a task is already in flight for a path, incoming updates are queued instead of spawned concurrently. When a task completes, it drains the queue — processing all accumulated updates sequentially against the freshest rules — before clearing the in-flight marker.
For Race 2: The drain loop combines multiple sequential deltas using a new `RulesDelta::merge()` method that is order-preserving: for each path, the last operation wins. This correctly handles (add → delete) → net deletion and (delete → add) → net addition. The previous `deduplicate()` was symmetric — it cancelled both sides regardless of order — which would silently drop real state changes observed by persistence consumers.
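A toy model of the merge semantics, assuming a simplified `RulesDelta` that records one operation per path (the real type presumably carries more state, e.g. rule contents):

```rust
use std::collections::HashMap;
use std::path::PathBuf;

// Simplified model: a delta records, per path, the operation observed.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Op {
    Added,
    Deleted,
}

struct RulesDelta {
    ops: HashMap<PathBuf, Op>,
}

impl RulesDelta {
    fn one(path: &str, op: Op) -> Self {
        Self { ops: HashMap::from([(PathBuf::from(path), op)]) }
    }

    // Order-preserving combination: `later` happened after `self`, so for
    // any path present in both, the later operation wins.
    fn merge(mut self, later: RulesDelta) -> RulesDelta {
        for (path, op) in later.ops {
            self.ops.insert(path, op); // last writer wins per path
        }
        self
    }
}

fn main() {
    let p = PathBuf::from("WARP.md");

    // add -> delete: net deletion, not "nothing happened".
    let d = RulesDelta::one("WARP.md", Op::Added)
        .merge(RulesDelta::one("WARP.md", Op::Deleted));
    assert_eq!(d.ops[&p], Op::Deleted);

    // delete -> add: net addition, which a symmetric dedupe would cancel.
    let d = RulesDelta::one("WARP.md", Op::Deleted)
        .merge(RulesDelta::one("WARP.md", Op::Added));
    assert_eq!(d.ops[&p], Op::Added);
}
```

Because the later delta simply overwrites per path, longer chains such as add → delete → add also collapse to the last operation, matching the orderings listed in the test matrix under Changes.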
Changes
- `ProjectContextModel`: adds `pending_updates: HashMap<PathBuf, Vec<RepositoryUpdate>>` to serialize per-path processing.
- `RulesDelta::merge()`: replaces `deduplicate()` with an order-preserving combinator.
- `apply_update_result()`: extracted helper shared by the stream handler and the drain loop.
- Unit tests for all `merge()` orderings (add→delete, delete→add, add→delete→add, etc.).

Linked Issue
N/A - discovered via debugging
Testing
- Manual verification of the race condition scenario.
- Unit tests covering all `RulesDelta::merge()` orderings in `model_tests.rs`.

Agent Mode
- [x] Warp Agent Mode - This PR was created via Warp's AI Agent Mode
Conversation link: https://staging.warp.dev/conversation/6cf1e2ca-9c21-4a00-9751-5fc9d7136473
Co-Authored-By: Oz <oz-agent@warp.dev>