perf: parallelize test metadata collection for source-generated tests #5221
Conversation
When source generation is enabled with many test classes (1,000+), the sequential GetTests() calls in CollectTests created a bottleneck: each call JIT-compiles a per-class method and creates TestMetadata objects, taking ~560ms sequentially for 10,000 tests.

This change parallelizes the GetTests() calls using Parallel.ForEach when the source count exceeds the parallel threshold. Each source's GetTests is independent and safe to call concurrently. Small source sets (<8) remain sequential to avoid task-scheduling overhead.

Benchmarked with 10,000 tests across 1,000 classes:
- Source gen (before): ~3.4s
- Source gen (after): ~3.1s
- Reflection mode (unchanged): ~2.7s
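The approach described above can be sketched roughly as follows. This is a minimal illustration, not the actual TUnit code: ITestSource, TestMetadata, and MinItemsForParallel are stand-ins for the real types and constants, and it shows the ConcurrentBag-based collection as originally submitted (the review below suggests replacing it).

```csharp
// Sketch of the threshold-guarded parallel collection described above.
// ITestSource, TestMetadata, and MinItemsForParallel are illustrative
// stand-ins, not the exact TUnit implementation.
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

public interface ITestSource
{
    IReadOnlyList<TestMetadata> GetTests(string testSessionId);
}

public sealed class TestMetadata { /* elided */ }

public static class TestCollector
{
    private const int MinItemsForParallel = 8;

    public static List<TestMetadata> CollectTests(
        IReadOnlyList<ITestSource> testSourcesList, string testSessionId)
    {
        var combined = new List<TestMetadata>();

        if (testSourcesList.Count < MinItemsForParallel)
        {
            // Small source sets stay sequential: no task-scheduling overhead.
            foreach (var source in testSourcesList)
            {
                combined.AddRange(source.GetTests(testSessionId));
            }
            return combined;
        }

        // Each source's GetTests is independent, so the calls are safe to
        // run concurrently.
        var allResults = new ConcurrentBag<IReadOnlyList<TestMetadata>>();
        Parallel.ForEach(testSourcesList,
            source => allResults.Add(source.GetTests(testSessionId)));

        foreach (var batch in allResults)
        {
            combined.AddRange(batch);
        }
        return combined;
    }
}
```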
Code Review
This is a clean, well-targeted performance improvement. The benchmark data, PR description, and implementation are all solid. A few things worth considering:
Non-deterministic test ordering
ConcurrentBag<T> does not preserve insertion order — its enumeration order is implementation-defined and non-deterministic under concurrent inserts (in practice, it follows thread-local LIFO). This means the order of TestMetadata objects in combined can differ between runs.
If anything downstream depends on collection order (e.g., for reproducible test IDs or stable display order before scheduling kicks in), this could cause subtle inconsistencies. A simpler approach that preserves original source ordering:
var resultArrays = new IReadOnlyList<TestMetadata>[testSourcesList.Count];
Parallel.For(0, testSourcesList.Count, i =>
{
resultArrays[i] = testSourcesList[i].GetTests(testSessionId);
});
var combined = new List<TestMetadata>();
foreach (var batch in resultArrays)
{
for (var i = 0; i < batch.Count; i++)
combined.Add(batch[i]);
}
return combined;
Parallel.For with index-based writes to a pre-allocated array is thread-safe (each index is written exactly once), eliminates ConcurrentBag overhead, and keeps the source ordering deterministic.
Missing ParallelOptions / cancellation
Parallel.ForEach without ParallelOptions uses all available threads, which might be aggressive on CI agents with constrained CPU. Consider threading through a cancellation token and capping MaxDegreeOfParallelism:
Parallel.ForEach(testSourcesList,
new ParallelOptions { MaxDegreeOfParallelism = Environment.ProcessorCount },
testSource => allResults.Add(testSource.GetTests(testSessionId)));
Two-phase discovery path not parallelized
CollectTestsWithTwoPhaseDiscovery (the filter-hints path) is still sequential. For projects using filters (e.g., CI jobs running a subset of tests), this path won't benefit. Whether that matters depends on how common filter usage is in the large-project scenario, but it's worth noting the optimization is asymmetric.
Threshold value
The threshold of 8 matches TestBuilderPipeline.MinItemsForParallel, which is good for consistency: fewer than MinItemsForParallel sources run sequentially, and 8 or more run in parallel, which aligns with the "8+ sources" language in the PR description.
Comment removal
The removed comments were substantive — they explained the two-path design rationale, single-pass vs. double-enumeration, etc. The remaining comment // Filtered: enumerate descriptors, apply filters, expand dependencies, materialize matches is much terser. The architecture is non-obvious enough that the original explanations added real value. Consider preserving at least the XML doc-level rationale on the method.
Overall: The change is correct and the parallelization approach is sound. The main actionable concern is the ConcurrentBag ordering non-determinism — switching to index-based Parallel.For with a pre-allocated array would be strictly better (deterministic, no concurrent collection overhead, same thread safety). Good work on the benchmark-driven approach and the small-project threshold guard.
Code Review (Follow-up)
Previous review raised two main concerns. Here is how they were addressed:
Previous concern: Non-deterministic ordering (ConcurrentBag) — ADDRESSED
The index-based Parallel.For with a pre-allocated IReadOnlyList<TestMetadata>[] array is exactly the right fix. Each index is written exactly once (thread-safe without locking), source order is fully deterministic, and there is no concurrent collection overhead. This is a better implementation than what was originally suggested — good simplification.
Previous concern: Missing ParallelOptions — Partially addressed
Parallel.For without ParallelOptions uses the default threadpool scheduler, which is fine in practice. Cancellation support is still absent but for a startup/discovery path this is low priority.
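If cancellation support were added later, it could be threaded through ParallelOptions. A minimal sketch, assuming a CancellationToken is available at the discovery call site and using the variable names from the discussion above (testSourcesList, resultArrays, testSessionId), which may differ from the real method:

```csharp
// Sketch only: threading a cancellation token through the parallel loop.
// cancellationToken, testSourcesList, resultArrays, and testSessionId are
// assumed names from the surrounding discussion.
var options = new ParallelOptions
{
    CancellationToken = cancellationToken,
    MaxDegreeOfParallelism = Environment.ProcessorCount
};

try
{
    Parallel.For(0, testSourcesList.Count, options, i =>
    {
        resultArrays[i] = testSourcesList[i].GetTests(testSessionId);
    });
}
catch (OperationCanceledException)
{
    // Discovery was cancelled; propagate so the host can shut down cleanly.
    throw;
}
```

Parallel.For checks the token between iterations and throws OperationCanceledException once cancellation is observed, so this composes naturally with a host-level shutdown token.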
New observation: Threshold check removed, but PR description is stale
The second commit removed the threshold guard (< MinItemsForParallel → sequential), but the PR description still says:
Small test sets (<8 sources) use sequential path — no overhead for small projects
This is no longer true. Parallel.For(0, 1, ...) for a single-source project will now spin up TPL machinery for a single item. In practice this overhead is negligible and the code is simpler, so removing the threshold is a valid call — but the PR description should be updated to match actual behavior.
CollectTestsWithTwoPhaseDiscovery still sequential
The filter-hints path is still not parallelized. Flagged as informational last time and unchanged — acceptable if large-project workloads predominantly go through the unfiltered path.
Overall
The critical fix (deterministic ordering via pre-sized array) has been cleanly applied. The implementation is correct and the approach is sound. Only actionable item: update the PR description to remove the stale threshold language. The code itself is good to merge.
Code Review (Follow-up #2)
All previously raised concerns have been addressed. Here is the final assessment:
Previous concern: Non-deterministic ordering (ConcurrentBag) — RESOLVED
The implementation now uses index-based Parallel.For with a pre-allocated IReadOnlyList<TestMetadata>[] array. Each index is written exactly once, preserving deterministic source ordering with no concurrent collection overhead. Additionally, the total count is pre-computed to size the final List<TestMetadata> exactly, avoiding any reallocations. This is a clean, allocation-efficient implementation.
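The pre-computed sizing described above could look roughly like this. This is a sketch, assuming resultArrays is the IReadOnlyList<TestMetadata>[] filled by the parallel loop; the actual TUnit code may differ:

```csharp
// Sketch: flatten per-source results into a list sized exactly once.
// resultArrays is assumed to be the pre-allocated array from the
// Parallel.For step discussed above.
var totalCount = 0;
foreach (var batch in resultArrays)
{
    totalCount += batch.Count;
}

// Pre-sizing avoids List<T> growth reallocations during the final copy.
var combined = new List<TestMetadata>(totalCount);
foreach (var batch in resultArrays)
{
    combined.AddRange(batch);
}
return combined;
```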
Previous concern: Stale PR description (threshold language) — RESOLVED
The PR description has been updated and no longer mentions the <8 sources sequential path. It now accurately describes the unconditional Parallel.For approach with the pre-sized array.
Previous concern: Missing ParallelOptions — Accepted as-is
Parallel.For without ParallelOptions uses the default threadpool scheduler, which is reasonable for a startup/discovery path. Cancellation is still absent but low priority here.
CollectTestsWithTwoPhaseDiscovery still sequential — Accepted as-is
The filter-hints path remains sequential. This asymmetry is acceptable if large-project workloads primarily go through the unfiltered branch. Not a blocker.
Overall
All critical issues have been resolved. The implementation is correct, deterministic, allocation-efficient, and well-structured. The code is ready to merge.
Summary
- Parallelize GetTests() calls in AotTestDataCollector.CollectTests using Parallel.For with a pre-sized array, reducing startup time for projects with many source-generated test classes
- CollectTestsTraditional → CollectTestsContext

Reported in #5043: with 10,000 tests across 1,000 classes, source-generated tests were significantly slower than reflection mode. The root cause was sequential GetTests() calls: each one JIT-compiles a per-class method and creates TestMetadata objects, taking ~560ms sequentially for 10,000 tests.

Parallel.For at the source level is the right granularity: each source batches ~10 tests, amortizing ClassMetadata lookups. Index-based writes to a pre-allocated array preserve deterministic ordering with no concurrent collection overhead.

Benchmark (10,000 tests, 1,000 classes)
Test plan