fix(engine): enforce runtime exclusion for global [NotInParallel] (#5800)#5805
fix(engine): enforce runtime exclusion for global [NotInParallel] (#5800)#5805
Conversation
) Phase ordering in TestScheduler drained the global NotInParallel bucket last, but a Parallel-bucket test with [DependsOn(NIPTest)] could surface the NIP test mid–parallel-phase via dependency recursion in TestRunner — running it alongside its siblings and violating the documented "completely alone" contract. Add NotInParallelLock, an async writer-prefer reader/writer primitive. Tests acquire shared (or exclusive for global NIP, no keys, no ParallelGroup) after dependency recursion completes — avoiding the self-deadlock a NIP→Parallel dependency chain would otherwise hit.
Not up to standards ⛔🔴 Issues
|
| Category | Results |
|---|---|
| Performance | 1 medium |
🟢 Metrics 19 complexity
Metric Results Complexity 19
NEW Get contextual insights on your PRs based on Codacy's metrics, along with PR and Jira context, without leaving GitHub. Enable AI reviewer
TIP This summary will be updated as you push new changes.
There was a problem hiding this comment.
Code Review
This is a well-motivated fix for a genuinely tricky bug — dependency recursion bypassing phase-based scheduling is subtle, and the PR description clearly explains the failure path. The test design (rendezvous + polling) is clever and deterministic. Overall the approach is sound; a few things are worth flagging.
Bug: NotInParallelLock leaks SemaphoreSlim
NotInParallelLock holds _writerGate : SemaphoreSlim which implements IDisposable, but NotInParallelLock itself never disposes it. Register(new NotInParallelLock()) in TUnitServiceProvider doesn't call Dispose() unless the type is also registered with a cleanup contract.
// NotInParallelLock.cs — should implement IDisposable
internal sealed class NotInParallelLock : IDisposable
{
private readonly SemaphoreSlim _writerGate = new(1, 1);
...
public void Dispose() => _writerGate.Dispose();
}SemaphoreSlim holds a kernel WaitHandle on first contested wait — so in a large test run with any NIP tests the handle leaks until the process exits. Low severity in practice but easy to fix.
Correctness: EnterExclusiveAsync early-return path is correct but non-obvious
When _activeReaders == 0, the method returns while still holding _writerGate. This is intentional (the "hold" IS the gate), but the silent early return inside the lock with no comment is surprising. Future readers may assume this is a missing _writerGate.Release(). A brief comment would help:
if (_activeReaders == 0)
{
// Already drained; hold _writerGate to block new readers until ExitExclusive.
return;
}Performance concern: all tests pay the shared-lock cost
Every test that runs — including the vast majority of plain parallel tests — now does:
await _writerGate.WaitAsync(ct); // semaphore acquire
lock (_stateLock) { _activeReaders++; }
_writerGate.Release(); // semaphore release
// ... run test ...
lock (_stateLock) { _activeReaders--; ... }In the common steady-state case (no NIP test queued) these are uncontested and cheap (~nanoseconds each). But with thousands of parallel tests firing simultaneously, the _writerGate semaphore becomes a serialization point for the shared-entry path — all threads still funnel through it one-at-a-time to bump _activeReaders, even when no writer is waiting.
A common pattern to avoid this is a two-phase check: skip the gate when there's no writer:
// Fast path — no writer pending; just atomically bump the counter.
// Re-check under the gate if a writer races in.
private static int _writerWaiting;
private async Task EnterSharedAsync(CancellationToken ct)
{
if (Volatile.Read(ref _writerWaiting) == 0)
{
Interlocked.Increment(ref _activeReaders);
// Recheck after increment; if a writer slipped in, yield to it.
if (Volatile.Read(ref _writerWaiting) == 0) return;
Interlocked.Decrement(ref _activeReaders);
// Fall through to gated path.
}
await _writerGate.WaitAsync(ct).ConfigureAwait(false);
...
}Whether this matters in practice depends on the test-suite size, but TUnit's CLAUDE.md calls out "minimize allocations in hot paths / cache reflection" as a first-class concern, and ExecuteTestInternalAsync is exactly a hot path.
Minor: RequiresGlobalExclusion iterates constraints on every invocation
The method walks test.Context.ParallelConstraints at execution time for every test. For tests with many constraints this is cheap, but the result is purely a function of the test's metadata and could be computed once during test registration (e.g., as a bool stored on the AbstractExecutableTest). Not a correctness issue, just consistent with the project's performance posture.
Test design observation (not a bug)
Repro5800 uses static fields (_activeSiblings, _maxSiblingsObservedDuringLonely, SiblingLive) which are fine because the engine test runs the repro in a subprocess. Worth a short comment explaining why static is safe here, since the pattern is otherwise unusual for test code and could confuse future readers.
Summary
The core fix is correct and well-tested. The IDisposable issue is the only concrete defect; the rest are polish/performance points. The NotInParallelLock implementation's exclusion semantics (writer holds the gate while draining, blocking new readers — true writer-prefer) are correct, and the decision to acquire the lock after dependency recursion to avoid deadlock is exactly right.
Address Claude review on #5805: - NotInParallelLock implements IDisposable so TUnitServiceProvider's Dispose chain releases the SemaphoreSlim's WaitHandle. - Comment the EnterExclusiveAsync early-return: returning while still holding _writerGate is intentional (the gate IS the exclusion).
There was a problem hiding this comment.
Follow-up Review (commit 9aa8443)
The two issues flagged in the previous review have been addressed. Confirming resolution and noting what remains open.
✅ Resolved: NotInParallelLock IDisposable leak
NotInParallelLock now implements IDisposable and correctly disposes the SemaphoreSlim. Good fix.
✅ Resolved: Non-obvious early-return in EnterExclusiveAsync
The comment now makes the intent clear — returning while holding _writerGate is intentional. This is the right fix.
Still open: Shared-entry serialization point for all parallel tests
Every test execution — including the vast majority of plain parallel tests — still goes through:
await _writerGate.WaitAsync(ct); // acquire semaphore
lock (_stateLock) { _activeReaders++; }
_writerGate.Release(); // release semaphoreIn the common steady-state case (no NIP test running), this semaphore is uncontested so it's cheap — but all concurrent test threads still funnel through it one-at-a-time to bump _activeReaders. In a large suite with hundreds of parallel tests launching simultaneously, this is a hidden serialization point in ExecuteTestInternalAsync.
A standard approach is to use an atomic counter with a fast path that skips the gate entirely when no writer is waiting:
private int _writerPending;
private async Task EnterSharedAsync(CancellationToken ct)
{
// Fast path: no writer pending, just atomically bump the reader count.
// The double-check after increment handles the race where a writer
// arrives between the Volatile.Read and the Interlocked.Increment.
if (Volatile.Read(ref _writerPending) == 0)
{
Interlocked.Increment(ref _activeReaders);
if (Volatile.Read(ref _writerPending) == 0)
return; // fast path succeeded
// A writer raced in — undo and fall through to gated path.
ExitShared();
}
await _writerGate.WaitAsync(ct).ConfigureAwait(false);
try { lock (_stateLock) { _activeReaders++; } }
finally { _writerGate.Release(); }
}This makes NIP-free runs O(1) lock-free at the cost of an extra volatile read per test. Worth considering given CLAUDE.md's "minimize allocations in hot paths" mandate and the fact that ExecuteTestInternalAsync is the hot path.
Still open: RequiresGlobalExclusion recalculated per execution
The constraint scan in RequiresGlobalExclusion is a function purely of static test metadata, yet it runs on every execution. Caching the result as a bool in test metadata (computed once at registration) would be more consistent with the project's performance posture. Not a correctness issue.
Still open: Static fields in Repro5800 without a safety comment
_activeSiblings, _maxSiblingsObservedDuringLonely, and SiblingLive are static, which is safe only because the engine test runs the repro in a subprocess. That invariant isn't obvious from the test code itself, and someone copying the pattern into a non-subprocess context would get test-to-test pollution. A one-liner would prevent that:
// Static fields are safe: the engine test runs Repro5800 in a subprocess, isolating all state.
private static int _activeSiblings;Summary
Two of the five original issues are resolved. The core fix remains correct and the lock semantics are sound. The outstanding items are a performance concern (shared-entry serialization), a minor caching opportunity, and a missing test comment — none are correctness bugs. The PR is in good shape; whether to address the remaining points before merge is a call for the author.
Suites without any global [NotInParallel] tests now skip the runtime exclusion lock entirely — Enter/Exit short-circuit when _enabled is false. TestScheduler flips the flag via Enable() the moment grouping reports a global NIP test (initial set or dynamic batch); once on it stays on, so semantics are unchanged for any suite that ever sees one. Also: comment why static fields in Repro5800 are safe (subprocess).
There was a problem hiding this comment.
Follow-up Review (commit 9bb38a1)
Third pass. Tracking all five original issues and the new commit.
✅ Resolved: NotInParallelLock IDisposable leak (commit 9aa8443)
Confirmed in previous review — SemaphoreSlim is now disposed correctly.
✅ Resolved: Non-obvious early-return in EnterExclusiveAsync (commit 9aa8443)
The comment makes the intent clear. Confirmed in previous review.
✅ Resolved: Static fields in Repro5800 without a safety comment
The comment is now present:
// Static fields are safe here: the engine test invokes Repro5800 in a
// dedicated subprocess, so there is no cross-run state contamination.
private static int _activeSiblings;This was flagged as "still open" in the second review; it has since been addressed.
✅ Partially resolved: Shared-entry serialization for all parallel tests (commit 9bb38a1)
The new _enabled flag short-circuits EnterAsync entirely when no global NIP tests are present in the suite:
if (!_enabled)
return new ValueTask<Scope>(default(Scope));This makes the common case (NIP-free suites) truly zero-cost — no semaphore touched at all. That is the right call and addresses the most impactful scenario.
The residual concern is for suites that do contain at least one global NIP test. In that case, all concurrent parallel tests still funnel through _writerGate.WaitAsync + lock (_stateLock) to bump _activeReaders. With hundreds of parallel tests launching simultaneously this is a serialization point in ExecuteTestInternalAsync. That said, the impact is much narrower than before — it only affects runs where someone has opted into global NIP, which is the exception rather than the rule, and the semaphore is uncontested in steady state so cost is dominated by thread-scheduling overhead rather than actual contention. The atomic fast-path approach suggested in previous reviews would eliminate this entirely if it matters in practice.
Whether to address it is a judgement call; it is not a correctness bug.
Still open: RequiresGlobalExclusion recalculated per execution
The constraint scan still happens on every call to ExecuteTestInternalAsync. For a suite with 5 000 tests it executes 5 000 times on data that is fixed at registration time. Caching a bool on AbstractExecutableTest (computed once during registration) would be more consistent with the project's performance posture and remove the per-test allocation of the switch iteration. Minor, but easy to address if you are already touching the metadata path.
Summary
All correctness and readability issues from the previous two reviews are now resolved. The remaining items are:
| # | Issue | Severity |
|---|---|---|
| 1 | Shared-entry serialization in NIP-containing suites | Minor perf (narrowed by _enabled gate) |
| 2 | RequiresGlobalExclusion scans constraints on every invocation |
Micro-perf |
The core fix is correct, the lock semantics are sound, the test coverage is thorough, and the _enabled optimization handles the dominant common case at zero cost. This is ready to merge — the two remaining points are polish, not blockers.
TUnit0032 (DependsOnNotInParallelConflict) flagged [NotInParallel] + [DependsOn] on the same method as an error, with the rationale "tests with dependencies must support parallel execution". That rationale was a workaround for the underlying bug fixed by NotInParallelLock: dependency recursion now resolves the dep first, then the NIP test acquires exclusive — combining the two attributes is correct and the analyzer error is over-restrictive. Also: clarify the stale TestScheduler comment about NIP "drains last (must run alone)" — phase ordering preserves declared [NotInParallel(Order)]; the alone semantic is enforced by NotInParallelLock at runtime.
There was a problem hiding this comment.
Follow-up Review (commit 90825cb)
Fourth pass. The latest commit removes TUnit0032 and clarifies the TestScheduler comment. Tracking all prior issues.
✅ Resolved: NotInParallelLock IDisposable leak (commit 9aa8443)
✅ Resolved: Non-obvious early-return in EnterExclusiveAsync (commit 9aa8443)
✅ Resolved: Static fields in Repro5800 without a safety comment (commit 9bb38a1)
✅ Resolved: NIP-free suites pay zero cost (commit 9bb38a1)
All four issues from prior reviews confirmed resolved.
✅ New: TUnit0032 removal is correct
The latest commit removes DependsOnNotInParallelConflictAnalyzer (TUnit0032), which flagged [NotInParallel] + [DependsOn] on the same method as an error. That error was a workaround for the underlying scheduler bug — now that NotInParallelLock correctly enforces runtime exclusion, the combination is safe and the analyzer was over-restrictive.
The cleanup is thorough: the class, its test, Rules.cs, Resources.resx, Resources.Designer.cs, and the AnalyzerReleases.Unshipped.md entry are all removed consistently. The unshipped.md note correctly documents the removal rationale.
Still open: Shared-entry serialization in NIP-containing suites
When a suite contains at least one global [NotInParallel] test, all concurrent parallel tests still funnel through:
await _writerGate.WaitAsync(ct); // uncontested but serializing
lock (_stateLock) { _activeReaders++; }
_writerGate.Release();The _enabled flag (commit 9bb38a1) makes NIP-free suites zero-cost, which is the dominant case. For suites with NIP, the per-test serialization through an uncontested semaphore is low-overhead in practice (nanoseconds per acquire), but it does mean concurrent test threads serialize on entry even when no writer is active. An atomic fast-path (volatile reader-count with lock-free increment when no writer is pending) would eliminate this entirely — but this is a micro-optimization for an already-narrow case. Not a blocker.
Still open: RequiresGlobalExclusion recalculated per execution
The constraint scan in TestRunner.RequiresGlobalExclusion is still a function of fixed test metadata, called on every ExecuteTestInternalAsync. Caching a bool on AbstractExecutableTest at registration time would remove the per-test iteration. Minor, and consistent with the project's performance posture — but not a correctness issue.
Summary
This PR is in excellent shape. All correctness and readability issues have been addressed across four commits. The TUnit0032 removal is a clean, well-reasoned change that follows through on the fix's implications for the analyzer layer.
The two remaining open items are polish-level micro-optimizations, not blockers. The core fix is correct, the lock semantics are sound, the test coverage is thorough, and the _enabled short-circuit handles the common case at zero cost. Ready to merge.
- Repro5800: drop SiblingLive rendezvous. Under the new exclusion lock, if LonelyTest acquires the writer gate before any sibling acquires shared, siblings are blocked and never set SiblingLive — causing the 60s wait to time out on slow runners (Windows CI). Replace with a body-window sample of _activeSiblings: with the fix the count stays zero whichever side wins the dispatch race; without the fix, sibling activity overlaps and the assertion still fails as designed. - Cache RequiresGlobalExclusion as a bool on AbstractExecutableTest, set once at grouping time. TestRunner now reads the flag instead of scanning ParallelConstraints per-execution. Removes a hot-path scan. - NotInParallelLock: replace catch/throw cleanup with a try/finally drained-flag pattern. Same semantics, no general catch clause.
There was a problem hiding this comment.
Follow-up Review (commit d58ab27)
Fifth pass. The latest commit addresses both remaining items from the fourth review.
✅ Resolved: RequiresGlobalExclusion recalculated per execution (commit d58ab27)
RequiresGlobalExclusion has been removed from the hot path. RequiresGlobalNotInParallelLock is now a cached bool on AbstractExecutableTest, set once in MarkGlobalNotInParallelTests and read directly in TestRunner.ExecuteTestInternalAsync. This is the right design — fixed metadata belongs on the model, not in a per-invocation scan.
✅ Resolved: Repro5800 CI flakiness on slow runners (commit d58ab27)
The SiblingLive rendezvous (which caused 60s timeouts on Windows CI when LonelyTest acquired the writer gate before any sibling set the flag) has been replaced with a body-window polling approach. The assertion now samples _activeSiblings across 50 × 20ms windows — deterministically zero with the fix regardless of dispatch order, non-zero without it. This is a robust design choice.
✅ Resolved: NotInParallelLock catch/throw cleanup (commit d58ab27)
The try/finally drained-flag pattern is cleaner: drained only becomes true on a successful await, so cancellation and exceptions both fall through to the cleanup branch naturally. Good change.
Still open: Shared-entry serialization in NIP-containing suites
When the suite contains at least one global [NotInParallel] test (_enabled == true), all parallel tests still funnel through _writerGate.WaitAsync + lock (_stateLock) { _activeReaders++; } on entry. The uncontested semaphore path is fast in practice, and this case is rare (most suites have no global NIP tests and pay zero cost). An atomic fast-path (Interlocked counter with volatile writer-pending flag) would eliminate the serialization point entirely, but this is a micro-optimization for an already-narrow scenario. Not a blocker.
New observation: Dynamic-batch / enable-ordering edge case
If the initial test batch contains no global NIP tests, _enabled stays false and parallel tests take the zero-cost short-circuit. If a subsequent dynamic batch then introduces NIP tests, MarkGlobalNotInParallelTests calls Enable() — but parallel tests already in flight never acquired shared scopes, so _activeReaders == 0 when the NIP test's exclusive acquire runs, and it proceeds without waiting. Concurrent in-flight parallel tests are invisible to the lock.
This is a pre-existing edge case in how dynamic tests interact with phase ordering; the PR does not claim to fix it, and it requires a specific sequence (dynamic NIP tests discovered while initial parallel tests are still running). Worth noting for a follow-up, but not introduced by this PR and not a blocker.
Summary
All five issues from the prior four reviews are now resolved. The latest commit closes the two that were still open as of the fourth pass. The implementation is correct, the lock semantics are sound, the test coverage is thorough, and the _enabled short-circuit handles the dominant NIP-free case at zero cost. This is ready to merge.
Summary
[DependsOn(NIPTest)]was triggering a global[NotInParallel]test mid–parallel-phase via dependency recursion inTestRunner, where it ran alongside parallel siblings — violating the documented "completely alone" contract.NotInParallelLock, an async writer-prefer reader/writer primitive. Tests acquire shared (or exclusive for global NIP — no keys, noParallelGroup) after dependency recursion, so a global-NIP test that depends on a Parallel test does not deadlock against its own dependency.TestScheduleris unchanged; the lock backstops the cross-phase case.Why phase ordering wasn't enough
TestSchedulerdrained the globalNotInParallelbucket last, sequentially. ButTestRunner.ExecuteTestInternalAsyncresolves[DependsOn]via in-process recursion (await ExecuteTestAsync(dep, ct)) before acquiring any limiter. A dependent in the Parallel bucket therefore pulled the NIP test in early, alongside other Parallel siblings — counter races, "Expected 1 but found 3", etc. The existingDependsOnNotInParallelConflictAnalyzeronly flags when the same method has both attributes, so the cross-test relationship slipped through.Test plan
Repro5800(TUnit.TestProject/Bugs/5800/Repro.cs): deterministic rendezvous —LonelyTest [NotInParallel]waits for a sibling to be live, then polls_activeSiblingsand asserts it stays zero throughout its body.DependentOfLonely [DependsOn(LonelyTest)]exercises the dep-recursion path.GlobalNotInParallelDependsOnTests(TUnit.Engine.Tests/Scheduling/): runs the filter end-to-end — 7/7 pass.LonelyTestfails, dependent skipped — confirms the repro catches the bug.NotInParallelExecutionTests,NotInParallelOrderExecutionTests,NotInParallelClassGroupingTests,CombinedConstraintsSelfContainedTest,CombinedParallelConstraintsTests,Repro5700/CrossKeyOverlap5700,DependsOnAndNotInParallelTests,NotInParallelWithDependsOnTests: 31/31 pass.