🔒 fix(soft): harden stale-lock breaking and self-heal malformed locks #551
Merged
Break stale lock files by verified inode rather than blind path: rename, re-lstat, and unlink only when the mtime is unchanged from the stale decision. A newer mtime means a peer recreated the lock between read and rename, so the file is left in place instead of unlinking a live holder's file. Factor this into a shared break_lock_file helper used by both the soft lock and the lifetime-expiry path, removing the duplicated rename dance.

Treat a non-integer PID or creation time as malformed so two- and three-line garbage lock files self-heal when old instead of staying stuck forever; "well-formed" is no longer line-count only.

Give AsyncReadWriteLock a __del__ that shuts down the owned single-thread executor if close() was never called; caller-supplied executors are left untouched. Fix the executor property type and docstring accordingly.

Derive opposite/direction inside _validate_reentrant to drop the duplicated computation at both call sites, and reassign timeout symmetrically with blocking in the soft read/write acquire path.
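The rename, re-lstat, unlink sequence can be sketched as follows. This is a minimal illustration, not the code from this PR: the function name `break_lock_file_sketch`, its signature, the private-name scheme, and the rename-back step used to leave a recreated lock in place are all assumptions made for the example.

```python
import os
import uuid


def break_lock_file_sketch(path, stale_mtime_ns):
    """Try to remove a lock file judged stale at mtime `stale_mtime_ns`.

    Returns True if the file was removed, False if it was left alone.
    Hypothetical helper for illustration; the real break_lock_file may differ.
    """
    # Claim the file under a name no other process will pick (assumed scheme).
    private = f"{path}.break.{os.getpid()}.{uuid.uuid4().hex}"
    try:
        os.rename(path, private)
    except FileNotFoundError:
        return False  # another process already removed it

    # Re-check the file we actually claimed, without following symlinks.
    claimed = os.lstat(private)
    if claimed.st_mtime_ns == stale_mtime_ns:
        os.unlink(private)  # same file as at the stale decision
        return True

    # A newer mtime means a peer recreated the lock in the gap.
    # Assumption of this sketch: restore it by renaming it back.
    os.rename(private, path)
    return False
```

A caller would record `os.lstat(path).st_mtime_ns` when it decides the lock is stale and pass that value in, so the unlink only happens if the claimed file is the one that was inspected.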
ce51506 to 2c42ba3
A process sharing the lock directory on the same host could swap a held soft lock file for a symlink pointing at an old file, so the lifetime check saw the target's stale mtime, a waiter broke the still-live lock, and two processes held it at once. 🔒 This builds on @dxbjavid's #550, which switched the lifetime check to `os.lstat` so it reads the symlink's own mtime, and closes the remaining gaps in the same stale-breaking path.

Breaking a stale lock now claims the file by inode before removing it. A shared `break_lock_file` helper renames the lock to a process-private name, re-checks the modification time, and unlinks only when it still matches the value seen at the stale decision. A newer time means a peer recreated the lock in the gap, so the helper leaves the file in place rather than deleting it out from under a live holder. Both the soft lock and the lifetime path use this helper, replacing the duplicated rename logic. Malformed lock files self-heal more reliably too: a non-integer PID or creation time now counts as unparseable, so a two- or three-line garbage file gets evicted once it ages past the safety window instead of wedging acquirers forever.

`AsyncReadWriteLock` gained a destructor that shuts down the single-thread executor it owns, so forgetting to call `close()` no longer leaks the worker thread, while a caller-supplied executor stays untouched. Reviewers should know that releasing a soft lock may now remove the lock file as part of self-healing, and that filelock evicts malformed files automatically after a brief safety window. Supersedes #550. 🙏
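To make the malformed-file rule concrete, here is a rough sketch of a parser that rejects non-integer fields and an age check built on it. The file layout (PID, hostname, optional creation time, one per line), the function names, and the `safety_window` parameter are assumptions for illustration; the PR text does not spell out the exact format or the window length.

```python
import time


def parse_lock_text(text):
    """Return (pid, hostname, created) or None if the content is malformed.

    Assumed layout: line 1 PID, line 2 hostname, optional line 3 creation time.
    """
    lines = text.splitlines()
    if len(lines) not in (2, 3):
        return None  # wrong line count
    try:
        pid = int(lines[0])
        created = int(lines[2]) if len(lines) == 3 else None
    except ValueError:
        return None  # right line count, but a field is not an integer
    return pid, lines[1], created


def should_evict(text, mtime, safety_window, now=None):
    """True when the content is malformed and older than `safety_window` seconds."""
    if parse_lock_text(text) is not None:
        return False  # well-formed files are not handled by this check
    now = time.time() if now is None else now
    return now - mtime > safety_window
```

Waiting for the window before evicting gives a writer that is partway through creating its lock file time to finish, so a file that merely looks malformed for a moment is not removed.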