✨ feat(rw): add SoftReadWriteLock for NFS and HPC clusters#528
Draft
gaborbernat wants to merge 1 commit intotox-dev:mainfrom
Draft
✨ feat(rw): add SoftReadWriteLock for NFS and HPC clusters#528gaborbernat wants to merge 1 commit intotox-dev:mainfrom
gaborbernat wants to merge 1 commit intotox-dev:mainfrom
Conversation
acb0094 to
0afd22f
Compare
Slurm and other HPC deployments run across multiple compute nodes sharing an NFS mount, and need reader/writer locking semantics there. The existing ReadWriteLock is SQLite-backed and cannot run on NFS because SQLite's own docs warn against POSIX fcntl locking on network filesystems. SoftFileLock works on NFS but is exclusive-only and its pid+hostname stale check only evicts crashes on the same host, so a dead writer on one node would wedge readers on every other node until manual cleanup. SoftReadWriteLock fills both gaps. It stores state as sidecar files next to the lock path: a short-held .state SoftFileLock mutex, a .write marker for the exclusive writer, and a .readers/ directory with one file per active reader. Each marker carries a random 128-bit token plus the holder's pid and hostname; a daemon heartbeat thread refreshes mtime every 30 seconds, and any process on any host may evict a marker whose mtime has not advanced in 90 seconds. This is the etcd LeaseKeepAlive TTL/3 ratio adapted to filesystem primitives. Eviction uses an atomic read-then-rename-then-verify pattern borrowed from SoftFileLock so a heartbeat firing mid-break cannot double-own the lock. Writer acquisition is two-phase and writer-preferring: phase one claims .write, which immediately blocks new readers, phase two drains existing readers. A 99/1 reader-heavy workload cannot starve a writer. Every marker open uses O_CREAT | O_EXCL | O_NOFOLLOW with mode 0o600 to block the symlink-overwrite attack where an attacker pre-creates the marker as a symlink to a victim file. The readers directory gets 0o700 plus an lstat check and a dirfd-relative open, because mkdir has no O_NOFOLLOW equivalent. The marker parser is bounded (1 KiB cap, regex- validated token, bounded pid, ASCII hostname) so attacker-controlled content cannot crash or mis-authenticate. Fork safety is the last hazard. Python threads do not survive fork(), so the child would inherit the marker files and lock-level state but no heartbeat thread; the parent would keep refreshing while the child would not, and both would believe they hold the lock. An os.register_ at_fork(after_in_child=...) hook replaces the inherited threading.Lock objects with fresh ones and marks the instance fork-invalidated, making release() a no-op so inherited `with` blocks unwind cleanly in the child. PyMongo's connection-pool handling is the reference. Tests cover behavior, reentrancy, stale detection, symlink refusal, permissions, transaction-lock thread contention, and fork semantics entirely through the public API. Coverage on the new modules is 100% on async and 99% on sync, with pragmas on race-only branches. The full suite runs stably across multiple back-to-back iterations. Closes tox-dev#526
0afd22f to
5ed7f6d
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Slurm and other HPC deployments run across multiple compute nodes sharing an NFS mount and need reader/writer locking semantics there. The existing
ReadWriteLockis SQLite-backed and cannot run on NFS becausesqlite.orgitself warns against POSIXfcntllocking on network filesystems.SoftFileLockworks on NFS but is exclusive-only, and its pid+hostname stale check only evicts crashes on the same host, so a dead writer onnode-42would wedge readers on every other node until manual cleanup. Closes #526.SoftReadWriteLockfills both gaps. It stores state as sidecar files next to the lock path: a short-held.stateSoftFileLockmutex, a.writemarker for the exclusive writer, and a.readers/directory with one file per active reader. Each marker carries a random 128-bit token plus the holder's pid and hostname; a daemon heartbeat thread refreshesmtimeeveryheartbeat_intervalseconds (default 30), and any process on any host may evict a marker whosemtimehas not advanced instale_thresholdseconds (default 90, the etcdLeaseKeepAliveratio of TTL/3). 🩺 Eviction uses an atomic read → rename-to-unique-break-name → re-verify → unlink pattern borrowed fromSoftFileLock._try_break_stale_lock, so a heartbeat firing mid-break cannot double-own the lock. Writer acquisition is two-phase and writer-preferring: phase one atomically claims.write(which immediately blocks new readers), phase two waits for existing readers to drain. A 99/1 reader-heavy workload cannot starve a writer.Every marker open uses
O_CREAT | O_EXCL | O_NOFOLLOWwith mode0o600to block the symlink-overwrite attack where an attacker pre-creates the marker as a symlink to a victim file. 🔒 The readers directory gets0o700plus anlstatcheck and a dirfd-relative open, becausemkdirhas noO_NOFOLLOWequivalent; every subsequent reader-file operation flows through the dirfd. The marker parser is bounded (1 KiB cap, regex-validated token, bounded pid, ASCII hostname) so attacker-controlled content cannot crash the library or mis-authenticate as a live holder. The class runs the same three trust-boundary classesSoftFileLockalready handles (same-UID non-cooperating processes, cross-host same-UID, same-host different-UID via permissions) and does not claim protection against root compromise, NTP tampering, or multi-tenant mounts where hostile co-tenants share the UID.Fork safety was the last hazard worth calling out. Python threads do not survive
fork(), so a child would inherit the marker files and the lock-level state without the heartbeat thread; the parent would keep refreshing while the child would not, and both processes would believe they hold the lock. 🍴os.register_at_fork(after_in_child=...)replaces the inheritedthreading.Lockobjects with fresh ones and marks the instance fork-invalidated, sorelease()on an inheritedwithblock is a silent no-op and the child must callSoftReadWriteLock(path)again before acquiring. This matches PyMongo's connection-pool semantics and is documented in the concepts page alongside the heartbeat/TTL explanation.For reviewers: the docs touch all four Diátaxis dimensions (tutorials, how-to, concepts, autodoc-driven API reference) plus a new card in the lock-types grid on
docs/index.rst. The changelog file is intentionally not edited — the release workflow regenerates it from PR titles at tag time.Context and design discussion lives in the issue thread: #526