-
Notifications
You must be signed in to change notification settings - Fork 8
fix: Resolve deadlock in directory operations using generation-based invalidation #39
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…invalidation This fixes issue #37 where tigrisfs would deadlock during concurrent directory operations, particularly when listing directories. The deadlock occurred because: 1. listObjectsFlat() held dh.mu while calling sealDir() 2. sealDir() -> removeExpired() -> removeChildUnlocked() tried to lock ALL directory handles' mutexes 3. Other threads could hold different directory handle mutexes while waiting for dh.mu, creating a circular lock dependency The solution uses a generation counter approach: - Each directory maintains an atomic generation counter - The counter increments on structural changes (add/remove children) - Directory handles check the generation before operations - If generation changed, handles reset their position without locking
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR fixes a deadlock issue in concurrent directory operations by implementing a generation-based invalidation system. The deadlock occurred when listObjectsFlat() held a directory handle mutex while calling sealDir(), which attempted to lock all directory handle mutexes, creating circular dependencies with other threads.
- Replace direct mutex locking of directory handles with atomic generation counters
- Add generation tracking to both
DirInodeDataandDirHandlestructures - Implement lock-free invalidation mechanism for directory handle position tracking
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Greptile Overview
Summary
This PR implements a generation-based invalidation mechanism to resolve deadlocks in concurrent directory operations, specifically fixing issue #37 where ls operations would hang indefinitely.
Key Changes:
- Added generation counters to
DirInodeDataandDirHandlestructures to track directory structure changes - Replaced direct handle locking with atomic generation counter increments during structural changes
- Modified
listObjectsFlat()to release locks before callingsealDir(), preventing circular lock dependencies - Directory handles now check generation counters to detect stale state and reset positions without cross-handle locking
Problem Solved:
The original deadlock occurred when listObjectsFlat() held dh.mu while calling sealDir(), which tried to lock all directory handle mutexes. If other threads held different directory handle mutexes while waiting for dh.mu, this created a circular dependency. The generation-based approach eliminates this by making handle invalidation lockless.
Confidence Score: 4/5
- This PR is safe to merge with minimal risk
- The fix correctly addresses the root cause of the deadlock using a well-designed generation-based approach. One minor race condition was identified but doesn't affect correctness, only efficiency. The solution is elegant and maintains POSIX compliance for concurrent readdir operations.
- No files require special attention beyond the minor race condition fix suggested
Important Files Changed
File Analysis
| Filename | Score | Overview |
|---|---|---|
| core/dir.go | 4/5 | Implements generation-based invalidation to fix deadlock in directory operations. Clean solution with one minor race condition. |
Now we read the actual generation value after sealDir() returns to ensure we have the correct current generation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Greptile Overview
Summary
This PR successfully addresses the critical deadlock issue (#37) in directory operations by implementing a generation counter approach instead of explicit mutex coordination between directory handles.
Key Changes:
- Added atomic generation counters to
DirInodeDataandDirHandlestructures to track structural changes - Replaced the problematic lock coordination pattern in
listObjectsFlat()with generation-based invalidation - Modified all structural change operations (
sealDir,insertChildUnlocked,removeChildUnlocked,removeAllChildrenUnlocked) to increment the generation counter - Implemented self-validating directory handles that reset their position when generation changes are detected
Issues Found:
- Race conditions exist in generation synchronization logic, particularly in
listObjectsFlat()andcheckDirPosition() - Missing initialization of generation field in
NewDirHandle()constructor - Potential for missed generation updates due to timing issues in unlock/lock patterns
Recommendation:
The approach is sound and addresses the deadlock correctly, but the implementation needs refinement to handle race conditions properly before merging.
Confidence Score: 3/5
- This PR resolves a critical deadlock but introduces race conditions that need fixing
- The generation counter approach correctly addresses the deadlock by avoiding circular locking, but the implementation has race conditions in generation synchronization and missing initialization that could lead to inconsistent state
- Pay close attention to core/dir.go for race conditions in generation handling
Important Files Changed
File Analysis
| Filename | Score | Overview |
|---|---|---|
| core/dir.go | 3/5 | Implements generation-based deadlock resolution but has race conditions in generation synchronization and missing initialization |
…validations The tests were failing because new DirHandles started with generation=0, but the directory might already have a non-zero generation from prior modifications. This caused the handle to immediately think it was out of date and reset its position, leading to missing entries in readdir. Changes: - Initialize generation when creating new DirHandles - Reset generation to current when seeking to offset 0 - Ensure consistent generation tracking across all handle creation paths This fixes the test failures where directory listings were returning incomplete results. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Greptile Overview
Summary
This PR implements a generation-based invalidation system to resolve deadlocks in concurrent directory operations. The approach replaces cross-handle locking with atomic generation counters that track structural changes, allowing directory handles to self-invalidate without creating lock dependencies.
Key changes:
- Add atomic
generationcounter toDirInodeDataandDirHandlestructs - Increment generation on structural changes (
sealDir,insertChildUnlocked,removeChildUnlocked) - Directory handles check generation in
checkDirPosition()and reset position if changed - Critical deadlock fix in
listObjectsFlat(): releasedh.mubefore callingsealDir()
Issues found:
- Generation not incremented in all
insertChildUnlockedcode paths (empty directory and append cases) - Race condition in
listObjectsFlat()where generation could be outdated after reload - Missing generation increments could cause directory handles to miss structural changes
Confidence Score: 2/5
- This PR has critical logic bugs that need resolution before merge
- While the generation-based approach is sound for resolving the deadlock, there are critical bugs in the generation increment logic that could cause directory handles to miss structural changes. The missing generation increments in insertChildUnlocked and the race condition in listObjectsFlat need to be fixed to ensure correct behavior.
- core/dir.go requires careful attention due to generation increment bugs and race conditions
Important Files Changed
File Analysis
| Filename | Score | Overview |
|---|---|---|
| core/dir.go | 2/5 | Implements generation-based invalidation for directory operations, but has critical bugs in generation increment logic for insertChildUnlocked and race conditions in listObjectsFlat |
| core/cluster_fs.go | 5/5 | Simple addition to initialize directory handle generation when offset is reset to 0 |
| core/cluster_fs_fuse.go | 5/5 | Correctly initializes DirHandle with current generation when opening directory remotely via FUSE |
Move the generation load to immediately after sealDir() returns but before reacquiring dh.mu. This ensures we capture the generation at the point when sealDir completes, avoiding a race where another thread could call sealDir() between our unlock and the generation load. The sequence is now: 1. Unlock dh.mu 2. Call sealDir() 3. Load generation (while dh.mu is still unlocked) 4. Reacquire dh.mu 5. Update dh.generation This ensures we get the most accurate generation value while still avoiding the deadlock. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Greptile Overview
Summary
This PR successfully addresses the deadlock issue in directory operations by introducing a generation-based invalidation system. The solution replaces the problematic cross-handle mutex locking with atomic generation counters that allow directory handles to self-invalidate when structural changes occur.
Key Changes:
- Added
generationfield toDirInodeDataandDirHandlestructures - Implemented atomic generation increments on structural changes (
sealDir,insertChildUnlocked,removeChildUnlocked,removeAllChildrenUnlocked) - Replaced direct handle invalidation loops with generation-based self-checking in
checkDirPosition() - Modified
listObjectsFlat()to unlockdh.mubefore callingsealDir()to prevent deadlock cycles
Deadlock Resolution:
The original deadlock occurred when listObjectsFlat() held dh.mu while calling sealDir(), which in turn tried to lock all directory handle mutexes through removeExpired(). This created circular lock dependencies when other threads held different handle mutexes while waiting for dh.mu. The new approach eliminates this cross-handle locking entirely.
Race Condition:
There's a minor race condition in listObjectsFlat() where the generation could change between unlocking dh.mu and loading the current generation. While this is acknowledged as "benign" in comments, it could cause a handle to miss an invalidation signal in rare timing scenarios.
Confidence Score: 4/5
- This PR is safe to merge with minimal risk - it fixes a critical deadlock while introducing only minor race conditions
- Score reflects successful deadlock resolution with well-designed generation-based approach, offset by a manageable race condition that needs attention
- Pay attention to the race condition in core/dir.go listObjectsFlat() unlock/lock pattern
Important Files Changed
File Analysis
| Filename | Score | Overview |
|---|---|---|
| core/dir.go | 4/5 | Implements generation-based directory invalidation to fix deadlock; contains race condition in listObjectsFlat unlock/lock pattern |
This fixes potential deadlocks in the loadListing() function where unlock/relock patterns could cause circular lock dependencies. The deadlock could occur in two places: 1. When calling slurpOnce() at line 718-725 2. When calling listObjectsFlat() at line 737-742 Both cases released parent.mu (and dh.mu in case 1) while other goroutines could be holding different locks and waiting, creating circular dependencies. The solution uses the generation counter approach from PR #39: - Check generation before unlocking - After relocking, detect if generation changed - If changed, reset directory handle position This ensures proper synchronization without risking deadlocks. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
* fix: Prevent deadlock in loadListing unlock/relock pattern This fixes potential deadlocks in the loadListing() function where unlock/relock patterns could cause circular lock dependencies. The deadlock could occur in two places: 1. When calling slurpOnce() at line 718-725 2. When calling listObjectsFlat() at line 737-742 Both cases released parent.mu (and dh.mu in case 1) while other goroutines could be holding different locks and waiting, creating circular dependencies. The solution uses the generation counter approach from PR #39: - Check generation before unlocking - After relocking, detect if generation changed - If changed, reset directory handle position This ensures proper synchronization without risking deadlocks. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
This fixes issue #37 where tigrisfs would deadlock during concurrent directory operations, particularly when listing directories.
The deadlock occurred because:
The solution uses a generation counter approach:
Note
Introduce generation counters for directories and handles to invalidate positions on structural changes, removing cross-handle locking and preventing deadlocks during listing.
DirInodeData.generationandDirHandle.generation; initialize fromatomic.LoadUint64(&inode.dir.generation)inNewDirHandle, FUSEOpenDir, and on seek/reset paths.generationon structural updates:sealDir(),insertChildUnlocked(),removeChildUnlocked(),removeAllChildrenUnlocked().DirHandle.checkDirPosition()resets position whengenerationchanges;readDir/seek paths reload generation at offset 0.generation.dh.mubeforesealDir()and refresh handlegenerationafter sealing to avoid lock cycles.DirHandles, store initialgenerationalongsideinodeindirHandles.Written by Cursor Bugbot for commit 2d6df29. This will update automatically on new commits. Configure here.