🤖 Update Benchmark Results by thomhurst · Pull Request #6009 · thomhurst/TUnit

thomhurst · 2026-05-25T00:58:59Z

Automated Benchmark Update

This PR updates the benchmark documentation with the latest results from the Speed Comparison workflow.

Benchmarks Produced

Individual benchmark artifacts are available for download:

benchmark-DataDrivenTests
benchmark-AsyncTests
benchmark-ScaleTests
benchmark-MatrixTests
benchmark-MassiveParallelTests
benchmark-SetupTeardownTests
benchmark-BuildTime
benchmark-summary (aggregated overview)

Runtime Benchmarks

AsyncTests
DataDrivenTests
MassiveParallelTests
MatrixTests
ScaleTests
SetupTeardownTests

Build Benchmarks

BuildTime
2026-05-25
Ubuntu Latest • .NET SDK 10.0.300

Changes

Updated benchmark data in docs/static/benchmarks/latest.json
Updated historical trends in docs/static/benchmarks/historical.json
Regenerated benchmark documentation in docs/docs/benchmarks/index.md
Updated benchmark summary in docs/static/benchmarks/summary.json

Workflow Run

Run ID: 26377659640
Triggered: schedule
Date:

🤖 This PR was automatically created and will be merged automatically once CI checks pass.

codacy-production · 2026-05-25T00:59:44Z

Up to standards ✅

🟢 Issues 0 issues

Results:
0 new issues


claude

Benchmark Data Update Review

This is a purely automated documentation/data PR — no source code changes, 18 docs files updated. The JSON and Markdown files are internally consistent with each other. ✅

⚠️ Notable Observation: DataDrivenTests Anomalous Regression

The DataDrivenTests benchmark shows a large across-the-board regression for all frameworks simultaneously:

Framework	Yesterday	Today	Change
TUnit	362.98 ms	497.35 ms	+37%
NUnit	447.80 ms	664.80 ms	+48%
MSTest	395.63 ms	673.15 ms	+70%
xUnit3	469.96 ms	695.56 ms	+48%
TUnit AOT	23.30 ms	28.08 ms	+21%

The xUnit3 StdDev also jumped from 3.49 ms to 31.06 ms, a classic sign of a noisy run. When every framework regresses at once within a single category, this almost always indicates CI environment noise (shared runner under load, thermal throttling, I/O contention) rather than a real performance change. All other benchmark categories (AsyncTests, MassiveParallelTests, etc.) show only minor fluctuations (~1-3%), which is normal variance.
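As a quick sanity check on the table, the Change column can be recomputed from the Yesterday and Today means. The sketch below uses only the numbers quoted above; the variable names are illustrative and not taken from the benchmark tooling.

```python
# DataDrivenTests mean times (ms), copied from the table above.
yesterday = {
    "TUnit": 362.98,
    "NUnit": 447.80,
    "MSTest": 395.63,
    "xUnit3": 469.96,
    "TUnit AOT": 23.30,
}
today = {
    "TUnit": 497.35,
    "NUnit": 664.80,
    "MSTest": 673.15,
    "xUnit3": 695.56,
    "TUnit AOT": 28.08,
}


def percent_change(old: float, new: float) -> float:
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0


changes = {name: percent_change(yesterday[name], today[name]) for name in yesterday}

for name, pct in changes.items():
    print(f"{name:<10} {pct:+.0f}%")

# Every framework moved in the same direction on the same day.
all_regressed = all(pct > 0 for pct in changes.values())
print("all frameworks regressed:", all_regressed)

# The xUnit3 StdDev grew roughly ninefold, another hint of a noisy run.
stddev_ratio = 31.06 / 3.49
print(f"xUnit3 StdDev ratio: {stddev_ratio:.1f}x")
```

The recomputed values match the table (+37%, +48%, +70%, +48%, +21%), and the spread from 21% to 70% shows the frameworks regressed together but not by the same amount.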

Suggestion: Consider whether the automated pipeline should include a heuristic that detects and flags runs where every framework in a single category diverges from the historical mean by more than 15%, rather than silently publishing potentially misleading results. This would prevent readers from misinterpreting this day's DataDrivenTests data as a genuine regression.
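A minimal sketch of that heuristic, assuming per-framework history is available as a list of past mean times (the real layout of historical.json is not shown here, and `flag_category` is a hypothetical name). It flags a category only when every framework deviates from its own historical mean by more than the threshold.

```python
from statistics import mean


def flag_category(history: dict[str, list[float]], today: dict[str, float],
                  threshold: float = 0.15) -> bool:
    """Return True when every framework in one category deviates from its
    historical mean by more than `threshold` (a fraction, 0.15 == 15%)."""
    deviations = []
    for framework, value in today.items():
        past = history.get(framework)
        if not past:
            # No baseline for this framework, so the run cannot be judged.
            return False
        baseline = mean(past)
        deviations.append(abs(value - baseline) / baseline)
    return bool(deviations) and all(d > threshold for d in deviations)


# Illustration only: yesterday's DataDrivenTests means stand in for a real
# multi-day history.
history = {
    "TUnit": [362.98],
    "NUnit": [447.80],
    "MSTest": [395.63],
    "xUnit3": [469.96],
    "TUnit AOT": [23.30],
}
today = {
    "TUnit": 497.35,
    "NUnit": 664.80,
    "MSTest": 673.15,
    "xUnit3": 695.56,
    "TUnit AOT": 28.08,
}
print(flag_category(history, today))  # True: every framework moved by >15%

# A normal day with ~2% drift should not be flagged.
steady = {name: values[0] * 1.02 for name, values in history.items()}
print(flag_category(history, steady))  # False
```

Requiring all frameworks to drift (rather than any one) targets the environment-wide noise described above while still letting a single framework's genuine regression through unflagged.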

Minor: historical.json Rolling Window

The 2026-02-27 entry was dropped from historical.json when today's entry was added. If this is an intentional rolling window (keep the last N days), that's fine; it is worth confirming the window size is documented somewhere so the behavior is clear.
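For reference, a rolling window of this kind could look like the sketch below. It assumes historical.json is a list of objects keyed by an ISO-8601 `date` field; that schema, the `append_with_window` name, and the window size are all assumptions for illustration, not the workflow's actual code.

```python
def append_with_window(entries: list[dict], new_entry: dict, window: int) -> list[dict]:
    """Append new_entry and keep only the newest `window` entries,
    ordered oldest first by their ISO-8601 'date' field."""
    if window < 1:
        raise ValueError("window must be at least 1")
    # Drop any existing entry for the same date so a re-run replaces it.
    merged = [e for e in entries if e["date"] != new_entry["date"]]
    merged.append(new_entry)
    # ISO-8601 date strings sort chronologically as plain strings.
    merged.sort(key=lambda e: e["date"])
    # Keep the newest `window` entries; the oldest fall off the front.
    return merged[-window:]


# Made-up entries for illustration; only the "date" key matters here.
history = [{"date": "2026-02-27"}, {"date": "2026-02-28"}, {"date": "2026-03-01"}]
trimmed = append_with_window(history, {"date": "2026-03-02"}, window=3)
print([e["date"] for e in trimmed])  # ['2026-02-28', '2026-03-01', '2026-03-02']
```

With a fixed window, the oldest entry disappears each time a new day is added, which would explain the missing 2026-02-27 entry if that is how the pipeline works.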

Summary

No correctness issues with the data update itself. The DataDrivenTests numbers look like benchmark noise and should not be interpreted as a real regression. The rest of the data looks clean.


chore: update benchmark results

9f34bce

thomhurst added automated benchmarks documentation Improvements or additions to documentation ignore-for-release labels May 25, 2026

thomhurst temporarily deployed to Pull Requests May 25, 2026 00:59 with GitHub Actions

thomhurst merged commit ab59d5c into main May 25, 2026
12 of 13 checks passed

thomhurst deleted the automated-benchmarks-update branch May 25, 2026 00:59

claude Bot reviewed May 25, 2026
