96 changes: 88 additions & 8 deletions docs/concepts.rst
@@ -192,14 +192,23 @@ necessary in testing or if you need platform-specific behavior.
- Your workload is read-heavy with occasional writes.
- Multiple processes need to read simultaneously.
- You want a single writer while no readers are active.
- The lock file lives on a **local** filesystem. ``ReadWriteLock`` is SQLite-backed and is unsafe on NFS.

**Use SoftReadWriteLock** when:

- You want reader/writer semantics on a **network filesystem** (NFS, Lustre with ``-o flock``, HPC shared storage).
- You need cross-host stale detection so a crash on one compute node does not wedge readers on other nodes.
- You are running on a multi-node Slurm/HPC cluster.

Lock selection flowchart:

.. mermaid::

flowchart TD
start["Choose a lock type"] --> question1{"Read-heavy workload?"}
question1 -->|Yes| questionAsync{"Async code?"}
question1 -->|Yes| questionRwNet{"Network<br/>filesystem?"}
questionRwNet -->|Yes| srw["Use SoftReadWriteLock"]
questionRwNet -->|No| questionAsync{"Async code?"}
questionAsync -->|Yes| arw["Use AsyncReadWriteLock"]
questionAsync -->|No| rw["Use ReadWriteLock"]
question1 -->|No| question2{"Need network<br/>filesystem support?"}
@@ -212,7 +221,7 @@ Lock selection flowchart:
classDef alternative fill:#fef3c7,stroke:#f59e0b,stroke-width:2px,color:#78350f
classDef special fill:#dbeafe,stroke:#3b82f6,stroke-width:2px,color:#1e3a5f
class default recommended
class soft,platform alternative
class soft,platform,srw alternative
class rw,arw special

Lock types compared
@@ -225,50 +234,62 @@ Lock types compared
- FileLock
- SoftFileLock
- ReadWriteLock
- SoftReadWriteLock
- - Exclusive/shared
- Exclusive only
- Exclusive only
- Both (separate context managers)
- Both (separate context managers)
- - Platform enforcement
- OS-level (Windows/Unix)
- No (file-based)
- File-based (SQLite)
- No (file-based, ``O_CREAT|O_EXCL|O_NOFOLLOW``)
- - Network filesystem
- Not reliable
- Works (if you accept the limitations)
- Not reliable
- Works, including cross-host on multi-node clusters
- - Stale lock detection
- N/A (OS-enforced)
- Yes (Unix/macOS only)
- Yes (Unix/macOS only, same-host)
- N/A
- - PID inspection (``pid``, ``is_lock_held_by_us``)
- Yes, TTL-based heartbeat (cross-host)
- - PID inspection
- No
- Yes
- Yes (``pid``, ``is_lock_held_by_us``)
- No
- No (content is not a public API)
- - Lifetime expiration
- Yes
- Yes
- No
- Yes (``heartbeat_interval`` / ``stale_threshold``)
- - Cancel acquisition
- Yes (``cancel_check``)
- Yes (``cancel_check``)
- No
- No
- - Force release
- Yes (``force=True``)
- Yes (``force=True``)
- Yes (``force=True``)
- Yes (``force=True``)
- - Async support
- AsyncFileLock
- AsyncSoftFileLock
- AsyncReadWriteLock
- AsyncSoftReadWriteLock
- - Singleton default
- No
- No
- Yes
- Yes
- - Overhead
- Low
- High
- Medium (SQLite)
- Medium (daemon heartbeat thread + dirfd scans)

**********************
TOCTOU vulnerability
@@ -321,12 +342,15 @@ On older platforms without ``O_NOFOLLOW``, prefer :class:`UnixFileLock <filelock.UnixFileLock>`.
OS-level locks (FileLock on Windows/Unix) are unreliable on network filesystems (NFS, SMB). This is a fundamental
limitation of how network filesystems work—they don't reliably support locking semantics.

If you need locking on network filesystems, consider:

- Using SoftFileLock (less efficient but more portable)
- Switching to a centralized lock service (Redis, Consul, etc.)
- Using a database with transactions
For **exclusive locking** on NFS, use :class:`SoftFileLock <filelock.SoftFileLock>`. For **reader/writer
locking** (shared readers + exclusive writers) on NFS, use :class:`SoftReadWriteLock <filelock.SoftReadWriteLock>`,
which is the only variant in filelock that also handles cross-host stale detection. ``ReadWriteLock`` is SQLite-backed
and is unsafe on NFS because SQLite itself warns against running on network filesystems.

**Locks across different machines**
A lock on one machine doesn't stop another machine from accessing the resource unless they use a centralized locking
service. Filelock is for inter-process coordination on the same machine (or at least the same shared filesystem).
service — or a shared filesystem plus filelock's soft locks. On a multi-node Slurm/HPC cluster,
:class:`SoftReadWriteLock <filelock.SoftReadWriteLock>` works across compute nodes sharing an NFS mount.

**Read-write semantics**
FileLock is exclusive only—readers block writers and vice versa. If you need multiple readers with occasional
@@ -385,6 +409,62 @@ implementation. Because Python's :mod:`sqlite3` module has no async API, all blocking calls are offloaded to a
thread pool via ``loop.run_in_executor``. This is the same approach used by :class:`BaseAsyncFileLock
<filelock.BaseAsyncFileLock>`.
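
The executor-offloading pattern described above can be sketched in a few lines. This is an illustrative model only, not the filelock implementation; ``BlockingLock`` and ``AsyncLockWrapper`` are hypothetical names standing in for the real classes.

.. code-block:: python

    import asyncio
    import threading

    class BlockingLock:
        """Stand-in for a synchronous lock with blocking acquire()/release()."""

        def __init__(self) -> None:
            self._lock = threading.Lock()

        def acquire(self) -> None:
            self._lock.acquire()

        def release(self) -> None:
            self._lock.release()

    class AsyncLockWrapper:
        """Run the blocking calls in the default executor, so the event
        loop is never blocked while waiting for the lock."""

        def __init__(self, inner: BlockingLock) -> None:
            self._inner = inner

        async def __aenter__(self) -> "AsyncLockWrapper":
            loop = asyncio.get_running_loop()
            await loop.run_in_executor(None, self._inner.acquire)
            return self

        async def __aexit__(self, *exc) -> None:
            loop = asyncio.get_running_loop()
            await loop.run_in_executor(None, self._inner.release)

    async def main() -> str:
        lock = AsyncLockWrapper(BlockingLock())
        async with lock:
            return "held"

    asyncio.run(main())  # -> "held"

The key property is that ``acquire`` runs on a worker thread, so other coroutines keep making progress while one coroutine waits for the lock.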

How does SoftReadWriteLock work on NFS?
=======================================

:class:`SoftReadWriteLock <filelock.SoftReadWriteLock>` fills the gap left by ``ReadWriteLock`` on network filesystems.
It stores state as a small directory tree next to the lock file:

- ``<path>.state`` — a short-held :class:`SoftFileLock <filelock.SoftFileLock>` used as the state mutex during transitions.
- ``<path>.write`` — the writer marker; its presence blocks readers and other writers.
- ``<path>.readers/<host>.<pid>.<uuid>`` — one marker file per active reader.

Each marker stores a random 128-bit token, the holder's pid, and the holder's hostname. Every acquire uses
``O_CREAT | O_EXCL | O_NOFOLLOW`` with mode ``0o600``; the readers directory uses mode ``0o700`` and an ``lstat``
check plus a dirfd-relative open to close symlink races (which ``mkdir`` alone cannot).
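
The lstat-plus-dirfd hardening can be sketched as follows. This is an illustrative, Unix-only sketch, not the filelock implementation; ``create_reader_marker`` is a hypothetical helper name.

.. code-block:: python

    import os
    import stat

    def create_reader_marker(readers_dir: str, name: str) -> None:
        """Create one reader marker, hardened against symlink swaps."""
        # Refuse to proceed if the readers path itself is a symlink.
        if stat.S_ISLNK(os.lstat(readers_dir).st_mode):
            raise RuntimeError("readers directory replaced by a symlink")
        # Pin the directory with a dirfd; O_NOFOLLOW + O_DIRECTORY reject
        # a symlink here too, closing the window after the lstat.
        dfd = os.open(readers_dir, os.O_RDONLY | os.O_NOFOLLOW | os.O_DIRECTORY)
        try:
            # Atomic, exclusive, symlink-refusing create relative to the dirfd.
            fd = os.open(
                name,
                os.O_CREAT | os.O_EXCL | os.O_NOFOLLOW | os.O_WRONLY,
                0o600,
                dir_fd=dfd,
            )
            os.close(fd)
        finally:
            os.close(dfd)

Because the marker is opened relative to the pinned ``dfd``, an attacker who replaces ``readers_dir`` with a symlink between the check and the open cannot redirect the create.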

Writer acquisition is two-phase and writer-preferring: phase one atomically claims ``<path>.write`` (which blocks
any new reader as soon as it exists), phase two polls the ``readers/`` directory until every reader has exited.
Writer starvation cannot occur, which matters under read-heavy workloads such as the 99/1 reader-to-writer mix
typical of Slurm job queues.
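
The two-phase claim-then-drain sequence can be modeled in plain Python. This is an illustrative sketch only, not the filelock implementation; ``acquire_writer`` and ``release_writer`` are hypothetical names, and error handling is reduced to a simple timeout.

.. code-block:: python

    import os
    import time

    def acquire_writer(lock_path: str, timeout: float = 10.0) -> None:
        """Two-phase, writer-preferring acquire (illustrative sketch)."""
        write_marker = lock_path + ".write"
        readers_dir = lock_path + ".readers"
        deadline = time.monotonic() + timeout
        # Phase 1: atomically claim the writer marker. From this moment
        # on, no new reader may start.
        while True:
            try:
                fd = os.open(write_marker, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o600)
                os.close(fd)
                break
            except FileExistsError:
                if time.monotonic() > deadline:
                    raise TimeoutError("another writer holds the lock")
                time.sleep(0.25)
        # Phase 2: poll until every existing reader has exited.
        while os.path.isdir(readers_dir) and os.listdir(readers_dir):
            if time.monotonic() > deadline:
                os.unlink(write_marker)  # roll back the phase-1 claim
                raise TimeoutError("readers did not drain in time")
            time.sleep(0.25)

    def release_writer(lock_path: str) -> None:
        os.unlink(lock_path + ".write")

Writer preference falls out of the ordering: the marker exists (blocking new readers) for the entire drain phase, so a steady stream of readers cannot postpone the writer indefinitely.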

Cross-host stale detection
==========================

On a multi-node cluster, a process on ``node-42`` that crashes while holding a lock cannot be detected via
``kill(pid, 0)`` from ``node-17`` — the pid means nothing to a different kernel. ``SoftReadWriteLock`` therefore
uses a **TTL with a heartbeat** rather than ``SoftFileLock``'s PID-alive check:

- Each lock instance starts a daemon thread on acquire. The thread refreshes the marker's ``mtime`` every
``heartbeat_interval`` seconds (default 30 s).
- Any process on any host may evict a marker whose ``mtime`` has not advanced in ``stale_threshold`` seconds
(default 90 s; the 3:1 threshold-to-heartbeat ratio follows etcd's ``LeaseKeepAlive``).
- Eviction is atomic: read → rename to a unique ``.break.<pid>.<nonce>`` file → re-verify token and mtime →
unlink. On verification failure the ``.break.*`` file stays for TTL or atexit cleanup; rollback-rename is itself
racy and is not attempted.
- The heartbeat thread stops itself on token mismatch or a vanished marker, so a replaced or evicted marker
never gets accidentally refreshed.
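
The rename-then-verify eviction protocol can be sketched as below. This is an illustrative model only, not the filelock implementation; ``try_evict`` is a hypothetical name, and token verification is omitted for brevity.

.. code-block:: python

    import os
    import time
    import uuid

    def try_evict(marker: str, stale_threshold: float) -> bool:
        """Evict a stale marker via rename-then-verify (illustrative sketch)."""
        try:
            st = os.stat(marker)
        except FileNotFoundError:
            return False  # already gone
        if time.time() - st.st_mtime < stale_threshold:
            return False  # still being refreshed by a heartbeat
        # Renaming first makes the eviction atomic: only one evictor can
        # win the rename, and the holder's heartbeat can no longer
        # refresh the marker once it has moved.
        broken = f"{marker}.break.{os.getpid()}.{uuid.uuid4().hex}"
        try:
            os.rename(marker, broken)
        except FileNotFoundError:
            return False  # lost the race to another evictor
        # Re-verify after the rename: if the mtime advanced between the
        # stat and the rename, the holder is alive. Leave the .break
        # file for TTL cleanup rather than attempt a racy rollback rename.
        st = os.stat(broken)
        if time.time() - st.st_mtime < stale_threshold:
            return False
        os.unlink(broken)
        return True

Only the winner of the ``os.rename`` proceeds to the unlink, which is what prevents two evictors from each deleting a half-recreated marker.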

The trade-off: ``stale_threshold`` must be larger than any realistic pause a holder might hit (GC, syscall delay,
NFS hiccup). Pick it generously. Clock synchronization across compute nodes is assumed; well-managed HPC clusters
run NTP or chrony, so this is rarely an additional constraint in the target environment.

Fork semantics
==============

Python threads do not survive ``fork()``. A process that forks while holding a ``SoftReadWriteLock`` would leave
the child with the marker files, the lock-level state, and no heartbeat thread; the parent would keep
refreshing while the child would not, and both would believe they hold the lock. ``SoftReadWriteLock`` registers
an ``os.register_at_fork(after_in_child=...)`` hook that replaces the inherited ``threading.Lock`` objects with
fresh ones and marks the instance fork-invalidated. ``release()`` on an invalidated instance is a no-op, so an
inherited ``with lock.read_lock():`` block can unwind in the child without raising. The child must construct a
fresh ``SoftReadWriteLock(path)`` before it can acquire again. This matches PyMongo's connection-pool semantics.
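
The fork-invalidation mechanism can be sketched as below. This is an illustrative model only, not the filelock implementation; ``ForkAwareLock`` is a hypothetical name, and a real class would register the hook once rather than per instance.

.. code-block:: python

    import os
    import threading

    class ForkAwareLock:
        """Sketch of fork invalidation via os.register_at_fork."""

        def __init__(self) -> None:
            self._mutex = threading.Lock()
            self._fork_invalidated = False
            self._held = False
            os.register_at_fork(after_in_child=self._after_fork_in_child)

        def _after_fork_in_child(self) -> None:
            # Replace the inherited mutex (its state is undefined in the
            # child) and mark the instance unusable for new acquires.
            self._mutex = threading.Lock()
            self._fork_invalidated = True

        def acquire(self) -> None:
            if self._fork_invalidated:
                raise RuntimeError("construct a fresh lock after fork()")
            with self._mutex:
                self._held = True

        def release(self) -> None:
            if self._fork_invalidated:
                return  # no-op: let inherited `with` blocks unwind in the child
            with self._mutex:
                self._held = False

The no-op ``release`` is the piece that lets an inherited context manager exit cleanly in the child without pretending the child still holds the lock.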

**Trust boundary.** The class protects against same-UID non-cooperating processes (one host or cross-host) and
same-host different-UID users via the ``0o600`` / ``0o700`` permissions on markers and the readers directory.
It does not protect against root compromise, NTP tampering on same-UID cross-host nodes, or multi-tenant mounts
where hostile co-tenants share the UID.

****************************
File permissions and mode
****************************
66 changes: 66 additions & 0 deletions docs/how-to.rst
@@ -308,6 +308,60 @@ When you're done with a ``ReadWriteLock``, close it to release the underlying SQLite connection:
finally:
rw.close() # releases any held lock and closes the SQLite connection

***************************************************
Use read/write locks on network filesystems (NFS)
***************************************************

:class:`ReadWriteLock <filelock.ReadWriteLock>` is SQLite-backed and requires a local filesystem: SQLite's own
docs warn against running on NFS because POSIX ``fcntl`` locks are unreliable there. For HPC clusters, Slurm
deployments, or any multi-host shared storage, use :class:`SoftReadWriteLock <filelock.SoftReadWriteLock>`
instead. It is built on :class:`SoftFileLock <filelock.SoftFileLock>` primitives (atomic
``O_CREAT | O_EXCL | O_NOFOLLOW``) and runs a daemon heartbeat thread that refreshes each held marker's
``mtime`` so any process on any node can evict a stale marker when its holder crashes.

.. code-block:: python

from filelock import SoftReadWriteLock

rw = SoftReadWriteLock("/shared/nfs/data.lock")

with rw.read_lock():
data = get_shared_data()

with rw.write_lock():
update_shared_data()

The defaults (``heartbeat_interval=30`` s, ``stale_threshold=90`` s, ``poll_interval=0.25`` s) fit workloads
that hold locks for seconds-to-minutes. Tune them for your deployment:

.. code-block:: python

rw = SoftReadWriteLock(
"/shared/nfs/data.lock",
heartbeat_interval=30, # how often to refresh the marker's mtime
stale_threshold=90, # declare a marker stale after this many seconds of no refresh
poll_interval=0.25, # how long to sleep between acquire retries
)

Pick ``stale_threshold`` larger than any realistic pause a holder could experience (GC, disk flush, kernel
preemption). ``heartbeat_interval`` should be roughly ``stale_threshold / 3``; that is the ratio etcd uses for
its ``LeaseKeepAlive``. Lower ``poll_interval`` reduces acquire latency under contention at the cost of more
NFS ``stat`` calls per waiting client.
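
The tuning rule above can be captured in a small helper. This is an illustrative sketch, not a filelock API; ``derive_intervals`` is a hypothetical name, and the floor of 90 s simply mirrors the library default discussed in the text.

.. code-block:: python

    def derive_intervals(expected_max_pause: float) -> dict:
        """Derive lock timing from the longest pause a holder might hit
        (GC, NFS hiccup): stale_threshold comfortably above the pause,
        heartbeat at a third of the threshold."""
        stale_threshold = max(3.0 * expected_max_pause, 90.0)
        heartbeat_interval = stale_threshold / 3.0
        return {
            "stale_threshold": stale_threshold,
            "heartbeat_interval": heartbeat_interval,
        }

    derive_intervals(30.0)
    # -> {'stale_threshold': 90.0, 'heartbeat_interval': 30.0}

The returned values can be passed straight to the ``SoftReadWriteLock`` constructor arguments of the same names.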

Writer acquisition is two-phase and writer-preferring: phase one claims the writer marker (which immediately
blocks any new reader), phase two waits for existing readers to drain. This rules out writer starvation even
under a read-heavy workload like the 99/1 reader-to-writer mix typical of Slurm job queues.

**Fork caveat.** A process that forks while holding a ``SoftReadWriteLock`` loses the lock in the child. The
inherited instance is marked fork-invalidated; ``release()`` on it becomes a no-op, and the child must call
``SoftReadWriteLock(path)`` again to get a fresh instance before acquiring. This matches PyMongo's
connection-pool semantics after ``fork()``.

**Trust boundary.** The class protects against same-UID non-cooperating processes on one host, cross-host
same-UID processes, and same-host different-UID users (via ``0o600`` / ``0o700`` permissions). It does not
protect against root compromise, NTP tampering on same-UID cross-host nodes, or multi-tenant mounts where
hostile co-tenants share the UID.

***********************************
Use async read / write locks
***********************************
@@ -350,6 +404,18 @@ Low-level ``acquire_read``/``acquire_write``/``release`` methods are also available.
The same reentrancy and upgrade/downgrade rules as the synchronous :class:`ReadWriteLock <filelock.ReadWriteLock>`
apply — see :ref:`how-to:Use shared read / exclusive write locks` for details.

For network filesystems, use :class:`AsyncSoftReadWriteLock <filelock.AsyncSoftReadWriteLock>`, which wraps
:class:`SoftReadWriteLock <filelock.SoftReadWriteLock>` the same way:

.. code-block:: python

from filelock import AsyncSoftReadWriteLock

rw = AsyncSoftReadWriteLock("/shared/nfs/data.lock")

async with rw.read_lock():
data = await get_shared_data()

**************************************
Detect stale locks (soft locks only)
**************************************
10 changes: 10 additions & 0 deletions docs/index.rst
@@ -89,6 +89,16 @@ Choose the right lock for your use case:
- ✓ Reentrant per mode
- ✓ Async via AsyncReadWriteLock

.. grid-item-card::
**SoftReadWriteLock**

NFS and HPC-cluster reader/writer lock with TTL-based cross-host stale detection.

- ✓ Works on NFS / Lustre / shared storage
- ✓ Cross-host stale detection via heartbeat
- ✓ Writer-preferring, starvation-free
- ✓ Async via AsyncSoftReadWriteLock

.. grid-item-card::
**AsyncFileLock**

47 changes: 47 additions & 0 deletions docs/tutorials.rst
@@ -214,6 +214,53 @@ Key differences from ``PIDLockFile``:
- Stale lock detection happens automatically on acquire (Unix/macOS only)
- Supports context managers, reentrant locking, timeouts, and all other filelock features

*************************************
Reader/writer locks on a shared NFS
*************************************

If your lock file lives on a network filesystem — a home directory shared across Slurm nodes, a Lustre scratch
space, or any NFS share — use :class:`SoftReadWriteLock <filelock.SoftReadWriteLock>` rather than
:class:`ReadWriteLock <filelock.ReadWriteLock>`. ``ReadWriteLock`` is SQLite-backed and unsafe on NFS.
``SoftReadWriteLock`` is built on :class:`SoftFileLock <filelock.SoftFileLock>` primitives and handles cross-host
stale detection via a background heartbeat thread.

.. code-block:: python

from filelock import SoftReadWriteLock

rw = SoftReadWriteLock("/shared/nfs/work.lock")

    with rw.read_lock():
        # Any number of processes on any host can read at the same time.
        with open("/shared/nfs/data.json") as f:
            data = f.read()

    with rw.write_lock():
        # Exactly one process anywhere can write. New readers wait behind a pending writer.
        with open("/shared/nfs/data.json", "w") as f:
            f.write(new_data)

While the lock is held, you will see a few sidecar files on disk next to ``work.lock``:

.. code-block:: text

work.lock.state # short-lived state mutex, exists only during transitions
work.lock.write # writer marker, exists while a writer is claiming or holding
work.lock.readers/ # directory with one file per active reader

A daemon heartbeat thread refreshes each marker's ``mtime`` every ``heartbeat_interval`` seconds (default 30).
If a compute node crashes while holding a lock, any other node will evict the stale marker after
``stale_threshold`` seconds of no refresh (default 90; the heartbeat runs at a third of the threshold, the
keep-alive ratio etcd uses). Both values are constructor arguments, so HPC deployments that hold locks for hours
can raise them:

.. code-block:: python

rw = SoftReadWriteLock(
"/shared/nfs/work.lock",
heartbeat_interval=120,
stale_threshold=360,
)

See :doc:`concepts` for the full explanation of the heartbeat + TTL model.

************
Next steps
************
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -66,7 +66,7 @@ test = [
"virtualenv>=21.2",
]
type = [
"ty>=0.0.29",
"ty>=0.0.30",
{ include-group = "docs" },
{ include-group = "release" },
{ include-group = "test" },
4 changes: 4 additions & 0 deletions src/filelock/__init__.py
Expand Up @@ -31,6 +31,7 @@
ReadWriteLock = None

from ._soft import SoftFileLock
from ._soft_rw import AsyncAcquireSoftReadWriteReturnProxy, AsyncSoftReadWriteLock, SoftReadWriteLock
from ._unix import UnixFileLock, has_fcntl
from ._windows import WindowsFileLock
from .asyncio import (
@@ -72,16 +73,19 @@
"AcquireReturnProxy",
"AsyncAcquireReadWriteReturnProxy",
"AsyncAcquireReturnProxy",
"AsyncAcquireSoftReadWriteReturnProxy",
"AsyncFileLock",
"AsyncReadWriteLock",
"AsyncSoftFileLock",
"AsyncSoftReadWriteLock",
"AsyncUnixFileLock",
"AsyncWindowsFileLock",
"BaseAsyncFileLock",
"BaseFileLock",
"FileLock",
"ReadWriteLock",
"SoftFileLock",
"SoftReadWriteLock",
"Timeout",
"UnixFileLock",
"WindowsFileLock",