Fix MaxExternalSpansPerTest cap bypass when Activity.Parent chain is broken#5352
Fix MaxExternalSpansPerTest cap bypass when Activity.Parent chain is broken#5352
Conversation
Up to standards ✅🟢 Issues
|
| Metric | Results |
|---|---|
| Complexity | 8 |
TIP This summary will be updated as you push new changes. Give us feedback
…broken When libraries like Npgsql use async connection pooling, Activity.Parent may be null/broken by the time OnActivityStopped fires, causing FindTestCaseAncestor to return null and external spans to bypass the cap. Three-layer fix: 1. Register test case span IDs on ActivityStarted (before children stop) 2. Fallback in FindTestCaseAncestor to check ParentSpanId against known test case span IDs when Activity.Parent walk fails 3. Per-trace cap as ultimate safety net for completely broken chains Agent-Logs-Url: https://github.com/thomhurst/TUnit/sessions/1c7f9e0c-eca2-4452-a1b9-b94f99ac238c Co-authored-by: thomhurst <30480171+thomhurst@users.noreply.github.com>
There was a problem hiding this comment.
Code Review
This is a well-reasoned fix for a real-world bug (Npgsql async pooling breaking Activity.Parent chains) with clear documentation. The three-layer defense strategy is sound. A few issues worth addressing:
Issue 1: Memory leak in _testCaseSpanIds (and _externalSpanCountsByTrace)
_testCaseSpanIds is populated in OnActivityStarted but never cleaned up. For large test runs (hundreds/thousands of tests), this dictionary accumulates span ID strings indefinitely for the lifetime of the collector.
The fix is straightforward — since ActivityStopped fires for parents after all children have stopped, it's safe to remove the entry when the test case span stops:
private void OnActivityStopped(Activity activity)
{
// ... existing logic ...
// Cleanup: remove test case span from tracking set once it stops.
// All child spans will have already stopped by this point.
if (isTUnit && activity.GetTagItem("tunit.test.node_uid") is not null)
{
_testCaseSpanIds.TryRemove(activity.SpanId.ToString(), out _);
// Also clean up the trace-level fallback counter
_externalSpanCountsByTrace.TryRemove(activity.TraceId.ToString(), out _);
}
}The same applies to _externalSpanCountsByTrace — it grows forever too. Note that the existing _externalSpanCountsByTest has the same issue, so this is a pre-existing pattern, but adding a third unbounded dictionary amplifies the concern.
Issue 2: MaxExternalSpansPerTest reused for a per-trace concept
The constant MaxExternalSpansPerTest is semantically "per test" but is reused as the per-trace fallback cap:
var count = _externalSpanCountsByTrace.AddOrUpdate(traceId, 1, (_, c) => c + 1);
if (count > MaxExternalSpansPerTest) // ← misleading name hereConsider a dedicated constant or renaming:
private const int MaxExternalSpansPerTrace = 100; // fallback cap when test association is unknownThis makes the intent clearer and allows tuning them independently if needed.
Issue 3: Fallback only resolves one level, not the full ancestor chain
The new ParentSpanId fallback checks only if the direct parent is a known test case span. If a span's chain is broken two levels up (e.g., span → middleware span → test case, where both parent references are null), it won't be associated with the test and falls into the trace-level cap.
This is likely acceptable for the Npgsql case (which breaks the immediate parent reference), but the comment should note this limitation explicitly:
// Fallback: check if the direct ParentSpanId is a known test case span.
// Note: only one level — deeper broken chains fall through to the per-trace cap.
// This handles Npgsql async pooling where the direct parent reference is broken
// but W3C ParentSpanId is still correct.Issue 4: Double-counting risk allows 2× the intended max per trace
Spans in a trace can be counted in both _externalSpanCountsByTest (those that resolve) AND _externalSpanCountsByTrace (those that don't). Both caps are independent with the same limit. So a single trace could accept up to 2 × MaxExternalSpansPerTest = 200 external spans — potentially still enough to cause report bloat for extreme cases.
This is minor for now, but if the goal is a hard cap per trace, consider a combined strategy.
Overall
The fix correctly addresses the reported crash and is well-documented. The three-layer approach is a pragmatic defensive strategy. The main concerns are the memory leak (Issue 1 — should be fixed) and naming clarity (Issue 2 — easy win). Issues 3 and 4 are informational.
|
@copilot read the latest review on the pr |
…t, clarify fallback comment - Clean up _testCaseSpanIds and _externalSpanCountsByTrace when test case span stops (children stop before parents, so cleanup is safe) - Add separate MaxExternalSpansPerTrace constant for the per-trace fallback cap instead of reusing MaxExternalSpansPerTest - Update FindTestCaseAncestor comment to note single-level limitation Agent-Logs-Url: https://github.com/thomhurst/TUnit/sessions/e846703a-5b5a-4f60-afc3-cffadcc724c5 Co-authored-by: thomhurst <30480171+thomhurst@users.noreply.github.com>
Addressed all three actionable issues from the review in 947392d:
|
There was a problem hiding this comment.
Follow-up Review (after 947392d)
The previous review identified four issues. The Copilot agent has addressed Issues 1, 2, and 3 in commit 947392d. Here is the updated assessment:
Issue 1 (Memory leak) — RESOLVED
_testCaseSpanIds and _externalSpanCountsByTrace are now cleaned up in OnActivityStopped when the test case span stops. The ordering guarantee (children stop before parents) makes this safe.
Issue 2 (Naming) — RESOLVED
MaxExternalSpansPerTrace is now a dedicated constant, separate from MaxExternalSpansPerTest. The intent is now clear and the two limits can be tuned independently.
Issue 3 (Comment clarity) — RESOLVED
The FindTestCaseAncestor fallback comment now explicitly notes the single-level limitation.
Issue 4 (Double-counting / combined trace cap) — Still informational
This remains unaddressed, but is acceptable for now. A single trace could theoretically accept up to MaxExternalSpansPerTest + MaxExternalSpansPerTrace = 200 external spans in the pathological case where some spans resolve correctly and some do not. For the Npgsql scenario described, this is unlikely to cause practical problems, but worth keeping in mind if report bloat resurfaces.
One new observation: _externalSpanCountsByTest has the same pre-existing leak
The cleanup added in 947392d removes entries from _externalSpanCountsByTrace when the test case span stops — but _externalSpanCountsByTest (which existed before this PR) receives the same treatment via the TryRemove call. Wait — actually it does NOT. Looking at the cleanup block:
if (isTUnit && activity.GetTagItem("tunit.test.node_uid") is not null)
{
_testCaseSpanIds.TryRemove(activity.SpanId.ToString(), out _);
_externalSpanCountsByTrace.TryRemove(traceId, out _);
}_externalSpanCountsByTest is not cleaned up here (and was not cleaned up before this PR either). For a long test run, this dictionary accumulates one entry per completed test and is never pruned. This is a pre-existing issue, but it would be a low-hanging improvement to also add _externalSpanCountsByTest.TryRemove(activity.SpanId.ToString(), out _) to the same cleanup block, since the test case span ID is available at that point.
Overall
The substantive concerns from the first review are resolved. The fix is well-reasoned, well-documented, and now includes proper cleanup. The remaining items (Issue 4 and the _externalSpanCountsByTest pre-existing leak) are minor and non-blocking. This is ready to merge.
|
@copilot read the newest review |
…re-existing leak Agent-Logs-Url: https://github.com/thomhurst/TUnit/sessions/99eadeb1-f26a-4f94-8341-dc05e1ad3fe9 Co-authored-by: thomhurst <30480171+thomhurst@users.noreply.github.com>
The follow-up review confirmed all substantive issues are resolved. It also noted a pre-existing leak in |
Updated [TUnit.Core](https://github.com/thomhurst/TUnit) from 1.23.7 to 1.28.7. <details> <summary>Release notes</summary> _Sourced from [TUnit.Core's releases](https://github.com/thomhurst/TUnit/releases)._ ## 1.28.7 <!-- Release notes generated using configuration in .github/release.yml at v1.28.7 --> ## What's Changed ### Other Changes * fix: prevent StringBuilder race in console interceptor during parallel tests by @thomhurst in thomhurst/TUnit#5414 ### Dependencies * chore(deps): update tunit to 1.28.5 by @thomhurst in thomhurst/TUnit#5415 **Full Changelog**: thomhurst/TUnit@v1.28.5...v1.28.7 ## 1.28.5 <!-- Release notes generated using configuration in .github/release.yml at v1.28.5 --> ## What's Changed ### Other Changes * perf: eliminate redundant builds in CI pipeline by @thomhurst in thomhurst/TUnit#5405 * perf: eliminate store.ToArray() allocation on mock behavior execution hot path by @thomhurst in thomhurst/TUnit#5409 * fix: omit non-class/struct constraints on explicit interface mock implementations by @thomhurst in thomhurst/TUnit#5413 ### Dependencies * chore(deps): update tunit to 1.28.0 by @thomhurst in thomhurst/TUnit#5406 **Full Changelog**: thomhurst/TUnit@v1.28.0...v1.28.5 ## 1.28.0 <!-- Release notes generated using configuration in .github/release.yml at v1.28.0 --> ## What's Changed ### Other Changes * fix: resolve build warnings in solution by @thomhurst in thomhurst/TUnit#5386 * Perf: Optimize MockEngine hot paths (~30-42% faster) by @thomhurst in thomhurst/TUnit#5391 * Move Playwright install into pipeline module by @thomhurst in thomhurst/TUnit#5390 * perf: optimize solution build performance by @thomhurst in thomhurst/TUnit#5393 * perf: defer per-class JIT via lazy test registration + parallel resolution by @thomhurst in thomhurst/TUnit#5395 * Perf: Generate typed HandleCall<T1,...> overloads to eliminate argument boxing by @thomhurst in thomhurst/TUnit#5399 * perf: filter generated attributes to TUnit-related types only by @thomhurst in thomhurst/TUnit#5402 * fix: generate valid mock class names for generic interfaces with non-built-in type args by @thomhurst in thomhurst/TUnit#5404 ### Dependencies * chore(deps): update tunit to 1.27.0 by @thomhurst in thomhurst/TUnit#5392 * chore(deps): update dependency path-to-regexp to v8 by @thomhurst in thomhurst/TUnit#5378 **Full Changelog**: thomhurst/TUnit@v1.27.0...v1.28.0 ## 1.27.0 <!-- Release notes generated using configuration in .github/release.yml at v1.27.0 --> ## What's Changed ### Other Changes * Fix Dependabot security vulnerabilities in docs site by @thomhurst in thomhurst/TUnit#5372 * fix: use 0.0.0-scrubbed sentinel version in snapshot scrubber to avoid false Dependabot alerts by @thomhurst in thomhurst/TUnit#5374 * Speed up Engine.Tests by removing ProcessorCount parallelism cap by @thomhurst in thomhurst/TUnit#5379 * ci: add concurrency groups to cancel redundant workflow runs by @thomhurst in thomhurst/TUnit#5373 * Add scope-aware initialization and disposal OpenTelemetry spans to trace timeline and HTML report by @Copilot in thomhurst/TUnit#5339 * Add WithInnerExceptions() for fluent AggregateException assertion chaining by @thomhurst in thomhurst/TUnit#5380 * Drop net6.0 and net7.0 TFMs, keep net8.0+ and netstandard2.x by @thomhurst in thomhurst/TUnit#5387 * Remove all [Obsolete] members and migrate callers by @thomhurst in thomhurst/TUnit#5384 * Add AssertionResult.Failed overload that accepts an Exception by @thomhurst in thomhurst/TUnit#5388 ### Dependencies * chore(deps): update dependency mockolate to 2.3.0 by @thomhurst in thomhurst/TUnit#5370 * chore(deps): update tunit to 1.25.0 by @thomhurst in thomhurst/TUnit#5371 * chore(deps): update dependency minimatch to v9.0.9 by @thomhurst in thomhurst/TUnit#5375 * chore(deps): update dependency path-to-regexp to v0.2.5 by @thomhurst in thomhurst/TUnit#5376 * chore(deps): update dependency minimatch to v10 by @thomhurst in thomhurst/TUnit#5377 * chore(deps): update dependency picomatch to v4 by @thomhurst in thomhurst/TUnit#5382 * chore(deps): update dependency svgo to v4 by @thomhurst in thomhurst/TUnit#5383 * chore(deps): update dependency path-to-regexp to v1 [security] by @thomhurst in thomhurst/TUnit#5385 **Full Changelog**: thomhurst/TUnit@v1.25.0...v1.27.0 ## 1.25.0 <!-- Release notes generated using configuration in .github/release.yml at v1.25.0 --> ## What's Changed ### Other Changes * Fix missing `default` constraint on explicit interface implementations with unconstrained generics by @thomhurst in thomhurst/TUnit#5363 * feat(mocks): add ReturnsAsync typed factory overload with method parameters by @thomhurst in thomhurst/TUnit#5367 * Fix Arg.IsNull<T> and Arg.IsNotNull<T> to support nullable value types by @thomhurst in thomhurst/TUnit#5366 * refactor(mocks): use file-scoped types for generated implementation details by @thomhurst in thomhurst/TUnit#5369 * Compress HTML report JSON data and minify CSS by @thomhurst in thomhurst/TUnit#5368 ### Dependencies * chore(deps): update tunit to 1.24.31 by @thomhurst in thomhurst/TUnit#5356 * chore(deps): update dependency mockolate to 2.2.0 by @thomhurst in thomhurst/TUnit#5357 * chore(deps): update dependency polyfill to 9.24.1 by @thomhurst in thomhurst/TUnit#5365 * chore(deps): update dependency polyfill to 9.24.1 by @thomhurst in thomhurst/TUnit#5364 **Full Changelog**: thomhurst/TUnit@v1.24.31...v1.25.0 ## 1.24.31 <!-- Release notes generated using configuration in .github/release.yml at v1.24.31 --> ## What's Changed ### Other Changes * Fix Aspire 13.2.0+ timeout caused by ProjectRebuilderResource being awaited by @Copilot in thomhurst/TUnit#5335 * chore(deps): update dependency polyfill to 9.24.0 by @thomhurst in thomhurst/TUnit#5349 * Fix nullable IParsable type recognition in source generator and analyzer by @Copilot in thomhurst/TUnit#5354 * fix: resolve race condition in HookExecutionOrderTests by @thomhurst in thomhurst/TUnit#5355 * Fix MaxExternalSpansPerTest cap bypass when Activity.Parent chain is broken by @Copilot in thomhurst/TUnit#5352 ### Dependencies * chore(deps): update tunit to 1.24.18 by @thomhurst in thomhurst/TUnit#5340 * chore(deps): update dependency stackexchange.redis to 2.12.14 by @thomhurst in thomhurst/TUnit#5343 * chore(deps): update verify to 31.15.0 by @thomhurst in thomhurst/TUnit#5346 * chore(deps): update dependency polyfill to 9.24.0 by @thomhurst in thomhurst/TUnit#5348 **Full Changelog**: thomhurst/TUnit@v1.24.18...v1.24.31 ## 1.24.18 <!-- Release notes generated using configuration in .github/release.yml at v1.24.18 --> ## What's Changed ### Other Changes * feat(mocks): shorter, more readable generated mock type names by @thomhurst in thomhurst/TUnit#5334 * Fix DisposeAsync() ordering for nested property injection by @Copilot in thomhurst/TUnit#5337 ### Dependencies * chore(deps): update tunit to 1.24.13 by @thomhurst in thomhurst/TUnit#5331 **Full Changelog**: thomhurst/TUnit@v1.24.13...v1.24.18 ## 1.24.13 <!-- Release notes generated using configuration in .github/release.yml at v1.24.13 --> ## What's Changed ### Other Changes * perf(mocks): optimize MockEngine for lower allocation and faster verification by @thomhurst in thomhurst/TUnit#5319 * Remove defunct `UseTestingPlatformProtocol` reference for vscode by @erwinkramer in thomhurst/TUnit#5328 * perf(aspnetcore): prevent thread pool starvation during parallel WebApplicationTest server init by @thomhurst in thomhurst/TUnit#5329 * fix TUnit0073 for when type from from another assembly by @SimonCropp in thomhurst/TUnit#5322 * Fix implicit conversion operators bypassed in property injection casts by @Copilot in thomhurst/TUnit#5317 * fix(mocks): skip non-virtual 'new' methods when discovering mockable members by @thomhurst in thomhurst/TUnit#5330 * feat(mocks): IFoo.Mock() discovery with generic fallback and ORP resolution by @thomhurst in thomhurst/TUnit#5327 ### Dependencies * chore(deps): update tunit to 1.24.0 by @thomhurst in thomhurst/TUnit#5315 * chore(deps): update aspire to 13.2.1 by @thomhurst in thomhurst/TUnit#5323 * chore(deps): update verify to 31.14.0 by @thomhurst in thomhurst/TUnit#5325 ## New Contributors * @erwinkramer made their first contribution in thomhurst/TUnit#5328 **Full Changelog**: thomhurst/TUnit@v1.24.0...v1.24.13 ## 1.24.0 <!-- Release notes generated using configuration in .github/release.yml at v1.24.0 --> ## What's Changed ### Other Changes * perf: optimize TUnit.Mocks hot paths by @thomhurst in thomhurst/TUnit#5304 * fix: resolve System.Memory version conflict on .NET Framework (net462) by @thomhurst in thomhurst/TUnit#5303 * fix: resolve CS0460/CS0122/CS0115 when mocking concrete classes from external assemblies by @thomhurst in thomhurst/TUnit#5310 * feat(mocks): parameterless Returns() and ReturnsAsync() for async methods by @thomhurst in thomhurst/TUnit#5309 * Fix typo in NUnit manual migration guide by @aa-ko in thomhurst/TUnit#5312 * refactor(mocks): unify Mock.Of<T>() and Mock.OfPartial<T>() into single API by @thomhurst in thomhurst/TUnit#5311 * refactor(mocks): clean up Mock API surface by @thomhurst in thomhurst/TUnit#5314 * refactor(mocks): remove generic/untyped overloads from public API by @thomhurst in thomhurst/TUnit#5313 ### Dependencies * chore(deps): update tunit to 1.23.7 by @thomhurst in thomhurst/TUnit#5305 * chore(deps): update dependency mockolate to 2.1.1 by @thomhurst in thomhurst/TUnit#5307 ## New Contributors * @aa-ko made their first contribution in thomhurst/TUnit#5312 **Full Changelog**: thomhurst/TUnit@v1.23.7...v1.24.0 Commits viewable in [compare view](thomhurst/TUnit@v1.23.7...v1.28.7). </details> [](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores) Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`. [//]: # (dependabot-automerge-start) [//]: # (dependabot-automerge-end) --- <details> <summary>Dependabot commands and options</summary> <br /> You can trigger Dependabot actions by commenting on this PR: - `@dependabot rebase` will rebase this PR - `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it - `@dependabot show <dependency name> ignore conditions` will show all of the ignore conditions of the specified dependency - `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself) </details> Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
FindTestCaseAncestorwalksactivity.Parentin-memory to find the owning test case span. Libraries like Npgsql with async connection pooling break this chain —Activity.Parentis null by the timeOnActivityStoppedfires, so spans fall into the uncapped infrastructure bucket. This grew HTML reports to 121+ MB, crashing withSystem.OverflowExceptioninConsole.WriteLine.Three-layer defense in
ActivityCollector:OnActivityStartedcallback — pre-registers test case span IDs in_testCaseSpanIdsso they're available before child spans stop (children stop before parents)FindTestCaseAncestorParentSpanId fallback — whenActivity.Parentwalk fails, checksactivity.ParentSpanIdagainst pre-registered test case span IDs (W3C trace context is intact even when the object reference is broken)traceIdvia_externalSpanCountsByTraceinstead of leaving them uncapped