Skip to content

Fix race condition causing permanent loss of active rule files#10238

Merged
szgupta merged 3 commits into
masterfrom
suraj/fix-rule-files-race-condition
May 6, 2026
Merged

Fix race condition causing permanent loss of active rule files#10238
szgupta merged 3 commits into
masterfrom
suraj/fix-rule-files-race-condition

Conversation

@szgupta
Copy link
Copy Markdown
Member

@szgupta szgupta commented May 6, 2026

Description

Fix a race condition in `ProjectContextModel's file watcher stream handler that can permanently lose active rule files until client restart.

Root cause

The stream handler processes each RepositoryUpdate by spawning an async task. The original code called path_to_rules.remove() before spawning, then re-inserted the updated rules in the task's completion callback. This created two distinct race windows:

Race 1 — concurrent task overwrites (original bug)

Consider two updates, U1 and U2, arriving in quick succession for the same path:

  1. U1 arrives → remove() extracts the current rules → async task T1 spawned with a snapshot of those rules.
  2. U2 arrives while T1 is in flight → remove() returns None (the key is gone) → the if let Some(rules) = existing_rules guard fires → U2 is silently dropped.
  3. T1 completes and re-inserts its result. If U1 was a deletion and U2 was the corresponding re-addition (common during an editor's atomic save: delete-then-write), the file is now permanently absent from path_to_rules until restart.

Race 2 — stale-snapshot overwrites (Oz review finding)

Even after fixing Race 1 by keeping rules in the map during async processing, a second race remained: if T1 and T2 both read the rules before either writes back, T2's write silently discards T1's changes.

Fix

For Race 1: Replace remove() with get().cloned() so the rules stay in path_to_rules throughout async processing. A pending_updates queue (keyed by path) is introduced: if a task is already in flight for a path, incoming updates are queued instead of spawned concurrently. When a task completes, it drains the queue — processing all accumulated updates sequentially against the freshest rules — before clearing the in-flight marker.

For Race 2: The drain loop combines multiple sequential deltas using a new RulesDelta::merge() method that is order-preserving: for each path, the last operation wins. This correctly handles (add → delete) → net deletion and (delete → add) → net addition. The previous deduplicate() was symmetric — it cancelled both sides regardless of order — which would silently drop real state changes observed by persistence consumers.

Changes

  • ProjectContextModel: adds pending_updates: HashMap<PathBuf, Vec<RepositoryUpdate>> to serialize per-path processing.
  • RulesDelta::merge(): replaces deduplicate() with an order-preserving combinator.
  • apply_update_result(): extracted helper shared by the stream handler and the drain loop.
  • Unit tests for all merge() orderings (add→delete, delete→add, add→delete→add, etc.).

Linked Issue

N/A - discovered via debugging

Testing

  • Manual verification of the race condition scenario.
  • Unit tests covering all RulesDelta::merge() orderings in model_tests.rs.

Agent Mode

  • Warp Agent Mode - This PR was created via Warp's AI Agent Mode

Conversation link

Co-Authored-By: Oz oz-agent@warp.dev

@cla-bot cla-bot Bot added the cla-signed label May 6, 2026
@oz-for-oss
Copy link
Copy Markdown
Contributor

oz-for-oss Bot commented May 6, 2026

@szgupta

I'm starting a first review of this pull request.

You can view the conversation on Warp.

I completed the review and no human review was requested for this pull request.

Comment /oz-review on this pull request to retrigger a review (up to 3 times on the same pull request).

Powered by Oz

@szgupta szgupta requested review from cephalonaut and jefflloyd May 6, 2026 05:44
Copy link
Copy Markdown
Contributor

@oz-for-oss oz-for-oss Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overview

This PR attempts to fix a rule-file update race by keeping path_to_rules populated while repository update processing runs asynchronously.

Concerns

  • The new clone-based flow still allows overlapping update tasks to process stale independent snapshots and overwrite each other when they reinsert results, so rule additions, deletions, or content updates can still be lost.

Verdict

Found: 0 critical, 1 important, 0 suggestions

Request changes

Comment /oz-review on this pull request to retrigger a review (up to 3 times on the same pull request).

Powered by Oz

Comment thread crates/ai/src/project_context/model.rs Outdated
}

let existing_rules = me.path_to_rules.remove(&path_clone);
let existing_rules = me.path_to_rules.get(&path_clone).cloned();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ [IMPORTANT] Cloning the current rules lets overlapping update tasks process independent stale snapshots, and whichever task finishes last overwrites the others when it reinserts the map entry. This still loses rule additions, deletions, or content updates; serialize update processing for this path or merge each update into the latest state before reinserting.

When a repository update arrives, the stream handler removes rules from
the path_to_rules HashMap and re-inserts them asynchronously after
processing. If a second update arrives during this async gap (e.g. from
a git checkout or atomic file save), the remove returns None and the
update is silently dropped. If the first update was a deletion and the
dropped second update was the corresponding addition, the rules are
permanently lost until client restart.

Fix: clone the rules instead of removing them so they remain available
in the HashMap during async processing, and serialize update processing
per path by tracking in-flight tasks and queuing updates that arrive
during the async gap. When a task completes, queued updates are drained
and processed sequentially against the latest state, preventing both the
permanent-loss bug and stale-snapshot overwrites.

Co-Authored-By: Oz <oz-agent@warp.dev>
@szgupta szgupta force-pushed the suraj/fix-rule-files-race-condition branch from 9dc7fb2 to 3520824 Compare May 6, 2026 07:01
Comment thread crates/ai/src/project_context/model.rs Outdated
Self::process_repository_updates(update, current_rules, repo_path.clone())
.await;
current_rules = updated_rules;
combined_delta
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good overall, nice find! I wonder if there's a little potential for trouble in the fact that we break updates into discovered and deleted rules? This loses the ordering of those updates so an (add, delete) of the same rule ends up looking just like a (delete, add) in the emitted event. Might be better before adding to either list to dedupe?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

@szgupta szgupta force-pushed the suraj/fix-rule-files-race-condition branch from 81c961a to b9223a5 Compare May 6, 2026 19:52
@szgupta szgupta enabled auto-merge (squash) May 6, 2026 20:09
@szgupta szgupta merged commit 5146a5b into master May 6, 2026
38 of 40 checks passed
@szgupta szgupta deleted the suraj/fix-rule-files-race-condition branch May 6, 2026 20:40
trungtai1805 pushed a commit to trungtai1805/warp that referenced this pull request May 9, 2026
…otdev#10238)

## Description

Fix a race condition in `ProjectContextModel's file watcher stream
handler that can permanently lose active rule files until client
restart.

### Root cause

The stream handler processes each `RepositoryUpdate` by spawning an
async task. The original code called `path_to_rules.remove()` before
spawning, then re-inserted the updated rules in the task's completion
callback. This created two distinct race windows:

**Race 1 — concurrent task overwrites (original bug)**

Consider two updates, U1 and U2, arriving in quick succession for the
same path:

1. U1 arrives → `remove()` extracts the current rules → async task T1
spawned with a snapshot of those rules.
2. U2 arrives while T1 is in flight → `remove()` returns `None` (the key
is gone) → the `if let Some(rules) = existing_rules` guard fires → **U2
is silently dropped**.
3. T1 completes and re-inserts its result. If U1 was a deletion and U2
was the corresponding re-addition (common during an editor's atomic
save: delete-then-write), the file is now permanently absent from
`path_to_rules` until restart.

**Race 2 — stale-snapshot overwrites (Oz review finding)**

Even after fixing Race 1 by keeping rules in the map during async
processing, a second race remained: if T1 and T2 both read the rules
_before_ either writes back, T2's write silently discards T1's changes.

### Fix

**For Race 1:** Replace `remove()` with `get().cloned()` so the rules
stay in `path_to_rules` throughout async processing. A `pending_updates`
queue (keyed by path) is introduced: if a task is already in flight for
a path, incoming updates are queued instead of spawned concurrently.
When a task completes, it drains the queue — processing all accumulated
updates sequentially against the freshest rules — before clearing the
in-flight marker.

**For Race 2:** The drain loop combines multiple sequential deltas using
a new `RulesDelta::merge()` method that is order-preserving: for each
path, the _last_ operation wins. This correctly handles (add → delete) →
net deletion and (delete → add) → net addition. The previous
`deduplicate()` was symmetric — it cancelled both sides regardless of
order — which would silently drop real state changes observed by
persistence consumers.

### Changes

- `ProjectContextModel`: adds `pending_updates: HashMap<PathBuf,
Vec<RepositoryUpdate>>` to serialize per-path processing.
- `RulesDelta::merge()`: replaces `deduplicate()` with an
order-preserving combinator.
- `apply_update_result()`: extracted helper shared by the stream handler
and the drain loop.
- Unit tests for all `merge()` orderings (add→delete, delete→add,
add→delete→add, etc.).

## Linked Issue

N/A - discovered via debugging

## Testing

- Manual verification of the race condition scenario.
- Unit tests covering all `RulesDelta::merge()` orderings in
`model_tests.rs`.

## Agent Mode
- [x] Warp Agent Mode - This PR was created via Warp's AI Agent Mode

[Conversation
link](https://staging.warp.dev/conversation/6cf1e2ca-9c21-4a00-9751-5fc9d7136473)

Co-Authored-By: Oz <oz-agent@warp.dev>

<!--
CHANGELOG-BUG-FIX: Fixed an issue where rule files (WARP.md / AGENTS.md)
could permanently stop being included in AI conversations until client
restart.
-->

---------

Co-authored-by: Oz <oz-agent@warp.dev>
zhangyu1818-bot pushed a commit to zhangyu1818/warply that referenced this pull request May 10, 2026
…otdev#10238)

## Description

Fix a race condition in `ProjectContextModel's file watcher stream
handler that can permanently lose active rule files until client
restart.

### Root cause

The stream handler processes each `RepositoryUpdate` by spawning an
async task. The original code called `path_to_rules.remove()` before
spawning, then re-inserted the updated rules in the task's completion
callback. This created two distinct race windows:

**Race 1 — concurrent task overwrites (original bug)**

Consider two updates, U1 and U2, arriving in quick succession for the
same path:

1. U1 arrives → `remove()` extracts the current rules → async task T1
spawned with a snapshot of those rules.
2. U2 arrives while T1 is in flight → `remove()` returns `None` (the key
is gone) → the `if let Some(rules) = existing_rules` guard fires → **U2
is silently dropped**.
3. T1 completes and re-inserts its result. If U1 was a deletion and U2
was the corresponding re-addition (common during an editor's atomic
save: delete-then-write), the file is now permanently absent from
`path_to_rules` until restart.

**Race 2 — stale-snapshot overwrites (Oz review finding)**

Even after fixing Race 1 by keeping rules in the map during async
processing, a second race remained: if T1 and T2 both read the rules
_before_ either writes back, T2's write silently discards T1's changes.

### Fix

**For Race 1:** Replace `remove()` with `get().cloned()` so the rules
stay in `path_to_rules` throughout async processing. A `pending_updates`
queue (keyed by path) is introduced: if a task is already in flight for
a path, incoming updates are queued instead of spawned concurrently.
When a task completes, it drains the queue — processing all accumulated
updates sequentially against the freshest rules — before clearing the
in-flight marker.

**For Race 2:** The drain loop combines multiple sequential deltas using
a new `RulesDelta::merge()` method that is order-preserving: for each
path, the _last_ operation wins. This correctly handles (add → delete) →
net deletion and (delete → add) → net addition. The previous
`deduplicate()` was symmetric — it cancelled both sides regardless of
order — which would silently drop real state changes observed by
persistence consumers.

### Changes

- `ProjectContextModel`: adds `pending_updates: HashMap<PathBuf,
Vec<RepositoryUpdate>>` to serialize per-path processing.
- `RulesDelta::merge()`: replaces `deduplicate()` with an
order-preserving combinator.
- `apply_update_result()`: extracted helper shared by the stream handler
and the drain loop.
- Unit tests for all `merge()` orderings (add→delete, delete→add,
add→delete→add, etc.).

## Linked Issue

N/A - discovered via debugging

## Testing

- Manual verification of the race condition scenario.
- Unit tests covering all `RulesDelta::merge()` orderings in
`model_tests.rs`.

## Agent Mode
- [x] Warp Agent Mode - This PR was created via Warp's AI Agent Mode

[Conversation
link](https://staging.warp.dev/conversation/6cf1e2ca-9c21-4a00-9751-5fc9d7136473)

Co-Authored-By: Oz <oz-agent@warp.dev>

<!--
CHANGELOG-BUG-FIX: Fixed an issue where rule files (WARP.md / AGENTS.md)
could permanently stop being included in AI conversations until client
restart.
-->

---------

Co-authored-by: Oz <oz-agent@warp.dev>
(cherry picked from commit 5146a5b)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants