
feat(runtime): file-lock the TRT-RTX runtime cache for cross-runtime safety #1

Draft

tp5uiuc wants to merge 1 commit into feat/trtrtx-cpp-runtime from feat/runtime-cache-file-lock

Conversation


tp5uiuc commented Apr 28, 2026

Description

Adds a cross-platform RAII file-lock primitive (core/util/file_lock.{h,cpp}) so the Python and C++ TRT-RTX runtimes sharing a runtime_cache_path cannot race the rename and silently drop compiled kernels. Wires it into load_runtime_cache (shared lock) and save_runtime_cache_impl (exclusive lock) in core/runtime/TRTRuntimeConfig.cpp.

This is the follow-up to a reviewer comment on the parent C++-runtime port PR, which intentionally landed without locking and asked for a separate "platform-independent file-locking" pass.

Backend choices

The Python library is filelock (tox-dev/filelock; imported as from filelock import FileLock). Source-verified behavior:

  • Linux/macOS: fcntl.flock(fd, LOCK_EX | LOCK_NB) — BSD flock(2), NOT POSIX fcntl(F_SETLK) record locks.
  • Windows: msvcrt.locking(fd, LK_NBLCK, 1) — wraps Win32 LockFile, locks 1 byte at offset 0.

For the C++ side to interoperate with filelock on the same <cache>.lock file, the lock-file convention has to match exactly (see the sketch after this list):

  • Unix backend MUST use flock(2), not fcntl(F_SETLK) — on Linux the two primitives live in independent namespaces, so fcntl would silently fail to interop.
  • Windows backend uses LockFileEx on byte range (0, 1) — locking the whole file would not conflict with the Python side's 1-byte lock.
  • Lock file: <cache_path>.lock next to the cached artifact (matches filelock's default).
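A sketch of the Unix side of this convention, as a hypothetical free-standing helper for illustration (the actual implementation is the FileLock class in core/util/file_lock.cpp):

```cpp
// Hypothetical helper illustrating the lock-file convention above; the
// actual implementation is the FileLock class in core/util/file_lock.cpp.
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

#include <stdexcept>
#include <string>

// Open (creating if needed) <cache_path>.lock and take a non-blocking
// flock(2) lock on it. Returns the fd on success, -1 on contention.
int try_lock_cache(const std::string& cache_path, bool exclusive) {
  const std::string lock_path = cache_path + ".lock";  // filelock's default
  int fd = ::open(lock_path.c_str(), O_RDWR | O_CREAT, 0644);
  if (fd < 0) {
    throw std::runtime_error("cannot open lock file: " + lock_path);
  }
  // flock(2), not fcntl(F_SETLK): only flock shares a lock namespace with
  // Python's fcntl.flock, so only flock actually contends with filelock.
  const int op = (exclusive ? LOCK_EX : LOCK_SH) | LOCK_NB;
  if (::flock(fd, op) != 0) {
    ::close(fd);
    return -1;  // held elsewhere, possibly by the Python runtime
  }
  return fd;  // closing the fd releases the lock; the .lock file is kept
}
```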

API shape

Single class with explicit lock / try_lock / try_lock_for(timeout). Move-only, RAII destructor. Mode { Shared, Exclusive } — load takes shared, save takes exclusive. flock(2) has no native timeout, so try_lock_for is a 50ms-cadence poll loop. 10s default timeout matches the Python side's acquire(timeout=10).
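A header-style sketch of that shape (signatures approximated from this description, not copied from core/util/file_lock.h):

```cpp
// Approximate API shape; signatures are reconstructed from the PR
// description, not copied from core/util/file_lock.h.
#include <chrono>
#include <string>
#include <thread>

class FileLock {
 public:
  enum class Mode { Shared, Exclusive };

  // `lock_path` is the <cache_path>.lock file, already suffixed.
  FileLock(std::string lock_path, Mode mode);
  ~FileLock();  // releases the lock if held (RAII); never unlinks the file

  FileLock(FileLock&&) noexcept;  // move-only
  FileLock& operator=(FileLock&&) noexcept;
  FileLock(const FileLock&) = delete;
  FileLock& operator=(const FileLock&) = delete;

  void lock();      // block until acquired
  bool try_lock();  // one non-blocking attempt

  // flock(2) has no native timeout, so poll try_lock on a 50ms cadence
  // until the deadline. Default matches filelock's acquire(timeout=10).
  bool try_lock_for(
      std::chrono::milliseconds timeout = std::chrono::seconds(10)) {
    const auto deadline = std::chrono::steady_clock::now() + timeout;
    while (!try_lock()) {
      if (std::chrono::steady_clock::now() >= deadline) return false;
      std::this_thread::sleep_for(std::chrono::milliseconds(50));
    }
    return true;
  }
};
```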

Errors propagate via TORCHTRT_CHECK; the existing try/catch in ensure_initialized and the noexcept save_runtime_cache wrapper catch and log them, so external behavior on contention is unchanged.
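Roughly how the save path is wired, as a simplified paraphrase (write_file is a hypothetical stand-in for the actual serialization, and the real code in core/runtime/TRTRuntimeConfig.cpp reports errors via TORCHTRT_CHECK rather than a bare throw):

```cpp
// Simplified paraphrase of the save-path wiring. `write_file` is a
// hypothetical stand-in; the real code uses TORCHTRT_CHECK, not throw.
#include <chrono>
#include <stdexcept>
#include <string>
#include <vector>

void write_file(const std::string& path, const std::vector<char>& blob);

void save_runtime_cache_impl(const std::string& cache_path,
                             const std::vector<char>& blob) {
  FileLock lock(cache_path + ".lock", FileLock::Mode::Exclusive);
  if (!lock.try_lock_for(std::chrono::seconds(10))) {
    throw std::runtime_error("timed out waiting for runtime cache lock");
  }
  // Write in place while holding the exclusive lock; readers take the
  // shared lock, so they cannot observe a partial write.
  write_file(cache_path, blob);
}  // ~FileLock releases; the noexcept save_runtime_cache wrapper catches
   // and logs any throw, so behavior on contention is unchanged.
```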

Type of change

  • New feature (non-breaking change which adds functionality)

Tests

C++ unit tests (tests/cpp/test_file_lock.cpp): 12 cases covering ctor, exclusive/shared contention, mixed-mode contention, timeout edges, RAII release, move semantics, no-unlink-on-release, open-failure throw, and a same-namespace flock(2) interop check that verifies the C++ primitive conflicts with raw flock locks (the operation filelock uses).
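A condensed sketch of that interop check, assuming the FileLock API sketched earlier (the real test body in tests/cpp/test_file_lock.cpp may assert differently; POSIX-only):

```cpp
// Condensed sketch of the flock(2) interop check; assumes the FileLock
// API sketched above. The real test body may differ.
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

#include <gtest/gtest.h>

TEST(FileLockInterop, ConflictsWithRawFlock) {
  const char* lock_path = "/tmp/trt_cache.lock";
  int fd = ::open(lock_path, O_RDWR | O_CREAT, 0644);
  ASSERT_GE(fd, 0);
  // Hold a raw exclusive flock -- the same operation Python's filelock
  // performs -- and check that the C++ primitive sees the conflict.
  ASSERT_EQ(::flock(fd, LOCK_EX | LOCK_NB), 0);

  FileLock lock(lock_path, FileLock::Mode::Exclusive);
  EXPECT_FALSE(lock.try_lock());  // same namespace, so contention is visible

  ASSERT_EQ(::flock(fd, LOCK_UN), 0);
  EXPECT_TRUE(lock.try_lock());  // acquirable once the raw lock is released
  ::close(fd);
}
```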

Python E2E (tests/py/dynamo/runtime/test_000_runtime_cache.py):

  • Parameterizes test_filelock_works and test_sequential_save_load over both runtimes.
  • New test_python_lock_blocks_cpp_save: an externally held filelock causes the C++ save to time out silently (the noexcept save_runtime_cache wrapper catches and logs), the cache is unmodified while the lock is held, and a fresh save after release succeeds.
  • New test_filelock_cross_runtime_parallel: two subprocesses (one Python-runtime, one C++-runtime) compile against a shared runtime_cache_path and both succeed without deadlock or corruption. Subprocesses rather than threads because torch.export has thread-unsafe TLS, but cross-process is the real-world locking scenario anyway.

Local runs (RTX, A100):

  • bazel test //tests/cpp:test_file_lock: 12 passed, 353ms total
  • pytest tests/py/dynamo/runtime/test_000_runtime_cache.py: 21 passed, 2 skipped (non-RTX), 0 failed

Note on base branch

This PR is opened against feat/trtrtx-cpp-runtime rather than main because the parent C++-runtime port has not landed yet. Once that merges, this branch will be rebased onto main before going to upstream review.

Checklist

  • My code follows the style guidelines of this project (pre-commit clean: clang-format, black, isort, ruff, buildifier, typos, NVIDIA-INTERNAL all pass)
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas and hacks
  • I have made corresponding changes to the documentation (no public-facing API change)
  • I have added tests to verify my fix or my feature
  • New and existing unit tests pass locally with my changes
  • I have added the relevant labels to my PR so that relevant reviewers are notified

Commits:

  1. feat(runtime): file-lock the TRT-RTX runtime cache for cross-runtime safety — the feature.
  2. style(bazel): buildifier sweep on touched BUILD files — drive-by buildifier reformatting on pre-existing BUILD code, kept separate so the functional diff is easy to review.
  3. docs(file_lock): correct package name and rename "wire protocol" to "lock-file convention" — comment-only fix: the Python library is filelock (tox-dev/filelock), not py-filelock.

Review comment threads (now outdated) on core/util/file_lock.h, core/util/file_lock.cpp, and core/runtime/TRTRuntimeConfig.cpp.
tp5uiuc force-pushed the feat/trtrtx-cpp-runtime branch from e852123 to 2705f49 on May 6, 2026.

Commit message:

feat(runtime): file-lock the TRT-RTX runtime cache for cross-runtime safety

Adds a cross-platform RAII file-lock primitive (core/util/file_lock.{h,cpp})
matching filelock's lock-file convention so the Python and C++ runtimes
sharing a runtime_cache_path do not race the rename and silently drop
compiled kernels.

- Unix backend uses BSD flock(2) -- the primitive filelock uses, not
  POSIX fcntl record locks (which live in an independent namespace and
  would silently fail to interop on Linux).
- Windows backend uses LockFileEx on byte (0,1) -- matches the byte range
  msvcrt.locking(..., 1) locks on the Python side.
- Platform branch is hidden behind a LockHandle struct with move-and-swap
  semantics, so callers only see a single FileLock RAII type.
- Shared/exclusive modes: load takes shared (multiple readers OK), save
  takes exclusive. Python's FileLock is exclusive-only but conflicts
  correctly against C++ shared holders since both use the flock namespace.
- 10s acquire timeout via 50ms-cadence poll loop, matching the Python
  side's timeout=10. Lock-file path is <cache_path>.lock.

Wired into load_runtime_cache and save_runtime_cache_impl, with the
FileLock scoped to just the I/O block (save writes in-place under the
lock, no tmp+rename). Errors propagate via TORCHTRT_CHECK; the existing
try/catch in ensure_initialized and the noexcept save_runtime_cache
wrapper catch and log, so external behavior on contention is unchanged.

Tests:
- tests/cpp/test_file_lock.cpp: 12 unit tests covering exclusive/shared
  contention, timeout edges, RAII release, move semantics, no-unlink-on-
  release, and a same-namespace flock(2) interop check that verifies the
  C++ primitive conflicts with raw flock locks (what filelock uses).
- tests/py/dynamo/runtime/test_000_runtime_cache.py:
  - parameterizes test_filelock_works and test_sequential_save_load over
    both runtimes
  - test_python_lock_blocks_cpp_save: an externally held filelock causes
    the C++ save to time out silently, leaving the cache file unmodified;
    a fresh save after release succeeds
  - test_filelock_cross_runtime_parallel: two subprocesses (one Python-
    runtime, one C++-runtime) compile against a shared cache_path and both
    succeed. Subprocesses rather than threads because torch.export has
    thread-unsafe TLS, but cross-process is the real-world locking
    scenario anyway.