perf: reduce async state machine overhead in test execution pipeline#5214
- Elide async/await in forwarding methods (TestExecutor discovery hooks, TestMethodInvoker, RetryHelper.ShouldRetry/ApplyBackoffDelay) to avoid unnecessary state machine allocations
- Replace async lambda wrappers with direct ValueTask construction in TestExecutor.ExecuteTestAsync and TestCoordinator retry path
- Change RetryHelper.ExecuteWithRetry to accept Func<ValueTask> instead of Func<Task> to avoid ValueTask-to-Task conversion on the retry path
- Cache DateTimeOffset.UtcNow in TestStateManager.MarkFailed, TestBuilder, and TestBuilderPipeline to avoid redundant clock reads
- Replace Stopwatch instance with Stopwatch.GetTimestamp() / Stopwatch.GetElapsedTime() in DiscoveryCircuitBreaker on .NET 8+ (falls back to Stopwatch on netstandard2.0)
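For readers unfamiliar with the term, "eliding async/await in a forwarding method" can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: `ForwardingSketch`, `RunHooksAsync`, and both wrapper names are hypothetical stand-ins.

```csharp
using System;
using System.Threading.Tasks;

public static class ForwardingSketch
{
    // Hypothetical inner operation standing in for a hook executor call.
    public static async ValueTask RunHooksAsync(int delayMs)
    {
        await Task.Delay(delayMs).ConfigureAwait(false);
    }

    // Before: the wrapper is itself async, so the compiler generates a second
    // state machine whose only job is to await the inner call.
    public static async ValueTask ForwardWithAwaitAsync(int delayMs)
    {
        await RunHooksAsync(delayMs).ConfigureAwait(false);
    }

    // After: the wrapper hands back the inner ValueTask unchanged, so no
    // state machine is generated for this method.
    public static ValueTask ForwardElidedAsync(int delayMs)
    {
        return RunHooksAsync(delayMs);
    }
}
```

Elision of this kind is generally only safe when the wrapper does nothing after the call. A `try`/`catch`, a `using` scope, or any work after the await would behave differently once the `await` is removed.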
Code Review
Well-structured performance PR. The changes are correct and the overall direction is good — reducing state machine allocations in hot paths is the right focus. A few observations worth discussing.
Solid wins
- **TestCoordinator.cs**: Cleanest change in the PR. Replacing the async lambda wrapper with a direct method group reference is a genuine allocation improvement (eliminates state machine + closure), and it works perfectly now that `ExecuteWithRetry` accepts `Func<ValueTask>`.
- **TestBuilder.cs / TestBuilderPipeline.cs**: Caching `DateTimeOffset.UtcNow` where two adjacent reads occurred is correct: it ensures consistent `Start`/`End` timestamps for failed tests and eliminates a redundant clock read.
- **DiscoveryCircuitBreaker.cs**: The `Stopwatch.GetTimestamp()`/`Stopwatch.GetElapsedTime()` approach on `#if NET` avoids the `Stopwatch` object allocation. The `#if NET` symbol is the right choice here (covers net8.0+, not netstandard2.0).
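The TestCoordinator change praised above can be pictured with a small sketch. Everything here is an assumption made for illustration: `RetrySketch.ExecuteWithRetry` is a simplified, hypothetical stand-in (the real helper's signature is not shown in this thread), and `CoordinatorSketch` is invented.

```csharp
using System;
using System.Threading.Tasks;

public static class RetrySketch
{
    // Hypothetical, simplified stand-in for a retry helper taking Func<ValueTask>.
    public static async ValueTask ExecuteWithRetry(Func<ValueTask> action, int maxAttempts)
    {
        for (var attempt = 1; ; attempt++)
        {
            try
            {
                await action().ConfigureAwait(false);
                return;
            }
            catch (Exception) when (attempt < maxAttempts)
            {
                // Failed attempt with retries remaining: loop and try again.
            }
        }
    }
}

public sealed class CoordinatorSketch
{
    public int Calls { get; private set; }

    // Fails twice, then succeeds, so the retry loop has something to do.
    private ValueTask ExecuteCoreAsync()
    {
        Calls++;
        return Calls < 3
            ? new ValueTask(Task.FromException(new InvalidOperationException("transient")))
            : default;
    }

    // Before: the async lambda compiles to its own state machine, which is
    // started again on every attempt.
    public ValueTask RunWithAsyncLambda()
        => RetrySketch.ExecuteWithRetry(async () => await ExecuteCoreAsync().ConfigureAwait(false), 3);

    // After: the method group matches Func<ValueTask> directly, so the extra
    // state machine disappears. A delegate object is still created here.
    public ValueTask RunWithMethodGroup()
        => RetrySketch.ExecuteWithRetry(ExecuteCoreAsync, 3);
}
```

The method-group form only type-checks because the helper accepts `Func<ValueTask>`. With a `Func<Task>` parameter, the `ValueTask`-returning method would need an adapter lambda again.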
Issues / Questions
1. `TestStateManager.MarkFailed`: no-op caching
The PR description says "where the same timestamp was read twice in adjacent lines", but there's only one `DateTimeOffset.UtcNow` call in `MarkFailed` (the `??=` line). The cached `now` variable is only used once, so this change has no effect on performance. The `test.Result` initializer reads `test.EndTime`, not `DateTimeOffset.UtcNow`. This could be removed to keep the diff clean, or the PR description should be corrected.
2. `TestExecutor.cs` discovery hooks: `.AsTask()` allocation concern
`HookExecutor.ExecuteBeforeTestDiscoveryHooksAsync` is `async ValueTask` and iterates hooks, so it's virtually never synchronously complete. Calling `.AsTask()` on an incomplete `ValueTask` allocates a `Task` object internally. The previous `async Task` wrapper allocated a state machine instead. These are roughly allocation-equivalent for the common (non-trivially-complete) case, so this is more of a style change than a perf win for discovery hooks. If these callers were `async ValueTask` themselves, the `.AsTask()` step could be avoided entirely. Not a blocker, but worth being clear about what's actually saved here.
3. `TestMethodInvoker.cs`: Task→ValueTask→Task round-trip
`test.InvokeTestAsync(...)` returns `Task`. Wrapping it in `new ValueTask(task)` is correct (no copy, just a wrapper). Then `.AsTask()` on an incomplete `ValueTask` allocates again. The original `async` wrapper path also allocated a state machine for this branch. So this is allocation-neutral in the common case: not a regression, but the PR description's claim of "avoiding a state machine + closure allocation" applies more to `TestCoordinator.cs` than here.
The non-executor path (`return test.InvokeTestAsync(...)` directly) is a genuine win, with no allocation at all.
4. `ShouldRetry`: `Task.FromResult` allocates on every retry
`Task.FromResult<T>` allocates a new `Task<bool>` each call. In the retry path this is invoked on each exception. Caching the two possible results in static readonly fields would avoid this.
This is a micro-optimization, but since the original was `async Task<bool>` (which also allocated a state machine), neither approach is particularly allocation-heavy. Low priority.
Minor
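- `TestMethodInvoker` and `TestExecutor.ExecuteTestAsync` both contain identical logic for the `testExecutor` branch. This pre-existing duplication is unrelated to this PR but could be a future consolidation target.
- The `DiscoveryCircuitBreaker.Dispose()` null-check removal (`_stopwatch?.Stop()` → `_stopwatch.Stop()`, guarded by `#if !NET`) is correct since `_stopwatch` is always initialized in that branch.
Overall this is a clean, well-motivated PR. The `TestCoordinator.cs` simplification and `DateTimeOffset` caching changes are the clearest wins. The async elision changes are correct, but the allocation benefit is more nuanced than described for the discovery hook and `TestMethodInvoker` paths.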
Code Review
Well-structured performance PR. The changes are correct and the overall direction is good — reducing state machine allocations in hot paths is the right focus. A few observations worth discussing.
Solid wins
- `TestCoordinator.cs`: Cleanest change in the PR. Replacing the async lambda wrapper with a direct method group reference is a genuine allocation improvement (eliminates state machine + closure), and works perfectly now that `ExecuteWithRetry` accepts `Func<ValueTask>`.
- `TestBuilder.cs` / `TestBuilderPipeline.cs`: Caching `DateTimeOffset.UtcNow` where two adjacent reads occurred is correct: it ensures consistent Start/End timestamps for failed tests and eliminates a redundant syscall.
- `DiscoveryCircuitBreaker.cs`: The `Stopwatch.GetTimestamp()`/`Stopwatch.GetElapsedTime()` approach on `#if NET` avoids the `Stopwatch` object allocation. The `#if NET` symbol is the right choice (covers net8.0+, not netstandard2.0).
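The `DiscoveryCircuitBreaker` pattern mentioned in the last bullet might look roughly like the sketch below. `ElapsedTracker` is a hypothetical class, not the PR's code. It assumes every `NET` target is .NET 7 or later (the thread says net8.0+), because `Stopwatch.GetElapsedTime` is not available before .NET 7.

```csharp
using System;
using System.Diagnostics;

public sealed class ElapsedTracker
{
#if NET
    // Modern targets: store a raw timestamp (a long) instead of a Stopwatch object.
    // Assumes the NET targets are .NET 7+, where Stopwatch.GetElapsedTime exists.
    private readonly long _startTimestamp = Stopwatch.GetTimestamp();

    public TimeSpan Elapsed => Stopwatch.GetElapsedTime(_startTimestamp);
#else
    // netstandard2.0 fallback: a Stopwatch instance, always assigned at construction.
    private readonly Stopwatch _stopwatch = Stopwatch.StartNew();

    public TimeSpan Elapsed => _stopwatch.Elapsed;
#endif
}
```

Because the fallback field is always initialized on its branch, a null-conditional call such as `_stopwatch?.Stop()` is not needed there.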
Issues / Questions
1. TestStateManager.MarkFailed — no-op caching
The PR description says "where the same timestamp was read twice in adjacent lines", but there is only one DateTimeOffset.UtcNow call in MarkFailed (the ??= line). The cached now variable is only used once, and test.Result uses test.EndTime (not another UtcNow call). This change has zero performance impact and could be removed to keep the diff clean, or the PR description should be corrected.
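The difference between caching that helps and the no-op case described above can be sketched as follows. `TestRecord` and the three methods are hypothetical, written only to illustrate the point.

```csharp
using System;

public sealed class TestRecord
{
    public DateTimeOffset? StartTime { get; set; }
    public DateTimeOffset? EndTime { get; set; }
}

public static class TimestampSketch
{
    // Two adjacent reads: the clock is queried twice and the values may differ.
    public static void MarkWithTwoReads(TestRecord test)
    {
        test.StartTime = DateTimeOffset.UtcNow;
        test.EndTime = DateTimeOffset.UtcNow;
    }

    // One cached read shared by both fields: a single clock query and
    // identical Start/End values.
    public static void MarkWithCachedRead(TestRecord test)
    {
        var now = DateTimeOffset.UtcNow;
        test.StartTime = now;
        test.EndTime = now;
    }

    // Single use: hoisting the read into a local saves nothing. It also reads
    // the clock even when EndTime is already set, which the inline form
    // `test.EndTime ??= DateTimeOffset.UtcNow` would skip.
    public static void MarkEndOnly(TestRecord test)
    {
        var now = DateTimeOffset.UtcNow;
        test.EndTime ??= now;
    }
}
```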
2. TestExecutor.cs discovery hooks — .AsTask() allocation concern
HookExecutor.ExecuteBeforeTestDiscoveryHooksAsync is async ValueTask and iterates hooks — it is virtually never synchronously complete. Calling .AsTask() on an incomplete ValueTask allocates a Task object internally. The previous async Task wrapper allocated a state machine instead. These are roughly allocation-equivalent in the common case, so this is more of a style change than a clear perf win for discovery hooks. If the callers were changed to async ValueTask, the .AsTask() step could be eliminated entirely. Not a blocker, but worth being clear about what is actually saved here.
3. TestMethodInvoker.cs — Task→ValueTask→Task round-trip
test.InvokeTestAsync(...) returns Task. Wrapping it in new ValueTask(task) is correct (just a struct wrapper, no copy). Then .AsTask() on an incomplete ValueTask allocates again. The original async wrapper path also allocated a state machine for this branch. This is allocation-neutral in the common case — the PR description's claim of "avoiding a state machine + closure allocation" applies more to TestCoordinator.cs than here.
The non-executor fast path (return test.InvokeTestAsync(...) directly) is a genuine win — zero overhead.
4. ShouldRetry — Task.FromResult allocates on every call
Task.FromResult<T> allocates a new Task<bool> on each invocation. Since the original async Task<bool> also allocated a state machine, neither is zero-cost. A simple improvement:
```csharp
private static readonly Task<bool> s_shouldRetryTrue = Task.FromResult(true);
private static readonly Task<bool> s_shouldRetryFalse = Task.FromResult(false);
```
Low priority micro-optimization, but it is cheap to add and eliminates the allocation on the retry hot path.
Minor
- `TestMethodInvoker` and `TestExecutor.ExecuteTestAsync` contain identical logic for the `testExecutor` branch. Pre-existing duplication, not introduced here, but a future consolidation target.
- The `DiscoveryCircuitBreaker.Dispose()` null-check removal (`_stopwatch?.Stop()` → `_stopwatch.Stop()` guarded by `#if !NET`) is correct since `_stopwatch` is always initialized in that branch.
Overall a clean, well-motivated PR. The TestCoordinator.cs simplification and DateTimeOffset caching changes in TestBuilder/TestBuilderPipeline are the clearest wins. The async-elision changes are correct but the allocation benefit is more nuanced than described for the discovery hook and TestMethodInvoker paths. Looks good to merge.
Code Review (Follow-up)
This review compares against the previous Claude review to check which concerns have been addressed.
What's been addressed ✅
Issue #4 — ShouldRetry caching: The suggestion to cache Task.FromResult(true/false) as static readonly fields has been implemented:
```csharp
private static readonly Task<bool> s_shouldRetryTrue = Task.FromResult(true);
private static readonly Task<bool> s_shouldRetryFalse = Task.FromResult(false);
```
This is a real improvement: it eliminates one Task<bool> allocation per retry on both early-exit paths. Good addition.
Still open from previous review
Issue #1 — PR description inaccuracy: The description still says "Cache DateTimeOffset.UtcNow in TestStateManager.MarkFailed", but TestStateManager.cs is not in the diff. Worth correcting the description to avoid confusing reviewers.
Issue #2 — .AsTask() allocation for discovery hooks: ExecuteBeforeTestDiscoveryHooksAsync / ExecuteAfterTestDiscoveryHooksAsync in TestExecutor.cs convert a (nearly always incomplete) ValueTask to Task via .AsTask(), which allocates a Task object. The original async Task wrappers also allocated a state machine. These are roughly allocation-equivalent — this is a style change, not a perf win. Not a blocker, but the PR description shouldn't count this as a reduction in state machine overhead.
Issue #3 — TestMethodInvoker Task→ValueTask→Task round-trip: The new ValueTask(task).AsTask() path is still present. Same concern as above — the fast path (no executor) is a real win; the executor path is allocation-neutral, not an improvement.
New observations
DiscoveryCircuitBreaker.Dispose() null-check removal correctness: The change from _stopwatch?.Stop() to _stopwatch.Stop() (under #if !NET) is safe because _stopwatch is assigned in the constructor on that branch. DiscoveryCircuitBreaker is also sealed, so there's no risk of a derived class calling Dispose() before the constructor runs. Correct as written.
GetResourceUsage — minor improvement: Caching elapsed to avoid calling GetElapsed() twice in the same method is a small but clean win, independent of the #if NET optimization.
Summary
The s_shouldRetryTrue/False caching addition is a concrete improvement over the previous version. The TestCoordinator.cs lambda-to-method-group change and TestBuilder/TestBuilderPipeline timestamp caching remain the clearest wins in the PR. The remaining open items (#1, #2, #3) from the prior review are non-blocking. This looks good to merge as-is, with a minor suggestion to correct the PR description.
Code Review (Follow-up on latest commit)
This reviews the latest commit 74e63ecc7b33 perf: eliminate unnecessary Task/ValueTask round-trips against the previous Claude review.
All prior issues addressed ✅
Issue #2 — .AsTask() allocation for discovery hooks: Now fixed. ExecuteBeforeTestDiscoveryHooksAsync and ExecuteAfterTestDiscoveryHooksAsync in TestExecutor.cs now return ValueTask directly (pass-through from _hookExecutor), eliminating the .AsTask() heap allocation. This is a genuine perf improvement over the previous version.
Issue #3 — TestMethodInvoker Task→ValueTask→Task round-trip: Now fixed. The return type is changed to ValueTask and the .AsTask() call in the executor branch is gone. The fast path wraps Task in new ValueTask(task) (struct, no heap allocation). Clean.
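The shape described for the invoker, a `ValueTask` return type with a struct-wrapped fast path, could look something like this sketch. `ITestExecutorSketch` and `InvokerSketch` are hypothetical names, and the parameter list is an assumption, since the real `TestMethodInvoker` signature is not shown in this thread.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical executor abstraction; the real interface is not shown in the thread.
public interface ITestExecutorSketch
{
    ValueTask ExecuteTest(Func<ValueTask> action);
}

public static class InvokerSketch
{
    // Returns ValueTask on both branches, so no caller needs .AsTask().
    // The executor parameter may be null, meaning "no custom executor".
    public static ValueTask InvokeAsync(Func<Task> testBody, ITestExecutorSketch executor)
    {
        if (executor is null)
        {
            // Fast path: ValueTask is a struct that simply holds the Task reference.
            return new ValueTask(testBody());
        }

        // Executor path: pass the executor's ValueTask straight through.
        // (This illustrative lambda captures testBody; the real code may differ.)
        return executor.ExecuteTest(() => new ValueTask(testBody()));
    }
}
```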
TestExecutor.cs — new ValueTask<List<Exception>>(task.AsTask()) wrappers: All four occurrences are gone. _hookExecutor.ExecuteAfterTestSessionHooksAsync and ExecuteAfterAssemblyHooksAsync now return ValueTask<List<Exception>> directly to their callers, skipping the unnecessary Task conversion. This is the most impactful change in the latest commit.
Remaining minor items (non-blocking)
- Missing newline at EOF in `TestMethodInvoker.cs`: The `No newline at end of file` marker is still present. Minor, but worth a one-character fix.
- Pre-existing duplication: `TestMethodInvoker.InvokeTestAsync` and `TestExecutor.ExecuteTestAsync` still both contain the `testExecutor.ExecuteTest(...)` pattern independently. Not introduced by this PR; a future consolidation opportunity.
Summary
The three-commit arc of this PR is well-structured: initial optimization → fix no-op + cache Task.FromResult → eliminate .AsTask() round-trips. Each commit responds directly to review feedback. The final state is meaningfully better than the original in all targeted paths. Ready to merge.
Referenced by a Dependabot update of TUnit.Core from 1.19.57 to 1.21.6. The 1.21.0 release notes list this change: perf: reduce async state machine overhead in test execution pipeline by @thomhurst in thomhurst/TUnit#5214.
Summary
- … `ExecuteWithRetry`, eliminating state machine + closure allocation
- Change `ExecuteWithRetry` to accept `Func<ValueTask>` instead of `Func<Task>` to avoid adapter allocation; cache `Task.FromResult` results as static readonly fields in `ShouldRetry`
- Cache `DateTimeOffset.UtcNow` where two adjacent reads occurred to ensure consistent Start/End timestamps and eliminate redundant syscalls
- Replace `Stopwatch` instance with `Stopwatch.GetTimestamp()`/`Stopwatch.GetElapsedTime()` on .NET 8+ to avoid object allocation
Rationale
Profiling shows ~3.8% exclusive CPU in `AsyncMethodBuilderCore.Start` (2.59%) and `AsyncLocalValueMap.Set` (1.21%) from deep async call chains, plus 1.16% in `DateTime.get_UtcNow`. These changes reduce async state machine overhead and unnecessary clock reads.
Test plan