feat(html-report): add flaky test detection and summary section by thomhurst · Pull Request #5498 · thomhurst/TUnit

thomhurst · 2026-04-10T19:05:04Z

Summary

Adds flaky test detection to the HTML report — a test is considered flaky when it has status === 'passed' and retryAttempt > 0
Adds a collapsible "Flaky Tests" quick-access section (alongside Failed and Slowest) showing test name, class, retry count, and click-to-scroll navigation
Adds an orange "Flaky" filter pill to the status bar (auto-hidden when no flaky tests exist) and excludes flaky tests from the "Passed" filter for clearer separation
Adds a flaky count indicator in the summary dashboard and full orange color theming (CSS variables, badge, dot, light/dark mode support)

Closes #5487

codacy-production · 2026-04-10T19:07:36Z

Up to standards ✅

🟢 Issues 0 issues

Results:
0 new issues

View in Codacy

🟢 Metrics 4 complexity

Metric Results

Complexity 4

View in Codacy

_{TIP This summary will be updated as you push new changes. Give us feedback}

Add dedicated flaky test support to the HTML report: - Flaky detection: tests with status 'passed' and retryAttempt > 0 - New collapsible 'Flaky Tests' quick-access section with retry counts - Orange 'Flaky' filter pill in the status filter bar (hidden when no flaky tests) - Flaky count indicator in the summary dashboard - Orange color theme variables and badge styling for flaky status - Exclude flaky tests from the 'Passed' filter for clearer separation Closes #5487

claude

Code Review: feat(html-report): flaky test detection and summary section

Overall this is a well-executed, self-contained feature. The implementation is clean and consistent with existing patterns in the file. A few things worth flagging:

Bug: "Passed" pill count includes flaky tests, but the filter excludes them

The server-side summary.Passed value (rendered into the HTML at generation time) counts all passed tests, including flaky ones. The JS filter for passed correctly excludes flaky tests — but the count shown in the pill doesn't match what's visible after filtering.

// AppendSearchAndFilters — Passed pill count is baked in server-side
sb.Append(summary.Passed);  // includes flaky tests

// JS filter correctly excludes flaky from the Passed view
} else if (activeFilter === 'passed') {
    if (t.status !== 'passed' || isFlaky(t)) return false;
}

This creates a mismatch: the "Passed" badge says e.g. 42, but clicking it only shows 39 tests. The fix is to subtract the flaky count from the displayed Passed count once it's computed client-side:

// After computing flaky.length in renderFlakySection:
const passedPillCount = document.querySelector('[data-filter="passed"] .pill-count');
if (passedPillCount) {
    const originalPassed = parseInt(passedPillCount.dataset.total || passedPillCount.textContent);
    passedPillCount.dataset.total = originalPassed; // preserve original for idempotency
    passedPillCount.textContent = originalPassed - flaky.length;
}

Minor: `ReportSummary` has no `Flaky` property

The flaky count is computed entirely client-side in JS, meaning the serialized JSON summary embedded in the report doesn't carry a flaky count. If consumers ever parse the raw reportData JSON (e.g. for CI tooling or other downstream use), the flaky count won't be available.

Adding a Flaky property to ReportSummary (populated server-side based on retry semantics) would make the data model complete and consistent with Passed, Failed, etc. This isn't blocking for the HTML report use case, but worth considering for forward compatibility.

Minor: pill visibility uses inline style vs. class toggle (inconsistency)

The flakyPill uses pill.style.display = ... while flakyIndicator uses .classList.add('visible'). Both work correctly, but the file establishes the class-based pattern for toggling visibility — using it consistently for the pill too would be cleaner.

Strengths

isFlaky() extracted as a named helper — readable and reusable
Reuses existing .retry-tag, .qa-section, .qa-item CSS — no style duplication
esc() used on all user-supplied strings — XSS safe
Sort by retry count descending is the right default (most unstable tests first)
Dark/light mode theming via CSS variables follows established patterns
aria-label on the new section region — good accessibility

- Fix Passed pill count mismatch by subtracting flaky count server-side - Add Flaky property to ReportSummary for forward compatibility - Use class-based visibility toggle for flaky pill instead of inline style

claude

Code Review: feat(html-report): flaky test detection and summary section

Previous Review — All Addressed ✓

All three issues from the prior review have been resolved:

Passed pill count: now correctly renders summary.Passed - summary.Flaky server-side ✓
ReportSummary.Flaky property: added with [JsonPropertyName("flaky")] and populated in AccumulateStatus ✓
Pill visibility: uses classList.add/remove('hidden') correctly (class-based, not inline style) ✓

New Issue: Flaky tests render a green "passed" badge in the main test list

When a user clicks the "Flaky" filter pill, matchesFilter correctly shows only flaky tests — but the main test row renderer uses t.status for the badge class:

// HtmlReportGenerator.cs:1886
html += '<span class="t-badge '+t.status+'">'+esc(t.status)+'</span>';

Since t.status === 'passed' for flaky tests, each row gets a green "passed" badge — inconsistent with the orange "flaky" badge shown in the quick-access section above. A user filtering by Flaky will see a page full of green "passed" badges, which undermines the visual distinction this PR is building.

Suggested fix:

const badgeClass = isFlaky(t) ? 'flaky' : t.status;
html += '<span class="t-badge ' + badgeClass + '">' + esc(badgeClass) + '</span>';

This also applies to the failure-clusters renderer at line 2282, which has the same pattern:

h += '<span class="t-badge '+safeClass(f.t.status)+'">'+esc(f.t.status)+'</span>';

(Less critical there since failure clusters only show failed tests, but worth keeping consistent.)

Minor: Pass rate / ring chart inconsistency with the Passed pill

The ring chart and pass rate percentage are computed from summary.Passed (which includes flaky tests), while the Passed filter pill shows summary.Passed - summary.Flaky. This means:

Ring shows, say, 95% pass rate (42 tests as "passed")
Passed pill says 39 tests

A user might be confused why the numbers don't add up. The flaky-indicator label ("3 flaky") bridges the gap, but it's a small text element that's easy to miss. Either:

Adjust the ring to use Passed - Flaky for the green arc and add an orange arc for flaky (more visual clarity), or
Add a note to the ring center like "pass rate (incl. flaky)" to explicitly acknowledge the convention

The current approach isn't wrong — flaky tests did pass — but the pill/ring mismatch may prompt user confusion.

Strengths

The overall implementation is solid:

isFlaky() extracted as a named, reusable helper — used consistently across matchesFilter, renderFlakySection, and the row renderer
CSS variables follow the established pattern (--orange, --orange-d, @property registration, dark/light mode)
XSS safety: esc() is applied everywhere user-facing strings are interpolated
retryAttempt was already serialized into the JSON data model, so no schema additions needed for JS to work
.retry-tag already existed in CSS — smart reuse rather than duplication
Server-side and client-side flaky definitions are identical (passed + retryAttempt > 0), so counts will always agree

- Add badgeLabel() JS helper to show 'flaky' badge instead of 'passed' for flaky tests in main list, failed section, and failure clusters - Split ring chart green arc into green (clean pass) + orange (flaky) - Update Passed stat card to show clean passes only (excluding flaky) - Simplify AccumulateStatus to accept ReportTestResult directly - Use classList.toggle for flaky pill visibility

claude

Code Review: feat(html-report): flaky test detection and summary section (Round 3)

Previous Review — All Addressed ✓

All issues from the second review have been resolved:

Green "passed" badge for flaky tests: fixed with the new badgeLabel() helper applied consistently across render(), renderFailedSection(), and renderFailureClusters() ✓
Ring chart inconsistency: green arc now represents clean passes only; orange arc added for flaky; the split correctly preserves the total arc length ✓
Passed stat card: now shows cleanPassed (excluding flaky) ✓

Remaining Issue: Stat cards no longer sum to Total

With flaky tests excluded from the "Passed" card, the stat cards are now mathematically inconsistent:

Total = cleanPassed + Flaky + Failed + TimedOut + Skipped + Cancelled

But the displayed stat cards are:

Total | Passed (clean) | Failed | Skipped | Cancelled

For example with 3 flaky tests:

Total: 50
Passed: 39 + Failed: 5 + Skipped: 3 + Cancelled: 0 = 47 ≠ 50

The 3 missing tests are accounted for only by the small flaky-indicator div beneath the stats. A user is likely to notice the arithmetic doesn't work and be confused. The flaky-indicator is easy to overlook — it's an <div> styled with .78rem text versus the prominent stat card layout.

Suggested fix: Add a proper "Flaky" stat card to keep the stat row self-consistent:

AppendStatCard(sb, "passed", cleanPassed.ToString(), "Passed", "var(--emerald)");
AppendStatCard(sb, "flaky", summary.Flaky.ToString(), "Flaky", "var(--orange)");   // add this
AppendStatCard(sb, "failed", (summary.Failed + summary.TimedOut).ToString(), "Failed", "var(--rose)");

Then the flaky-indicator div can be removed entirely since it would be redundant. The stat card also provides a click-to-filter entry point (same as the other cards if wired up), and is consistent with how other statuses are treated.

Minor: `passRate` includes flaky tests but ring green arc does not

var passRate = summary.Total > 0 ? (double)summary.Passed / summary.Total * 100 : 0;
// rendered as the ring center % label

summary.Passed includes flaky tests, so the ring center shows e.g. "95% pass rate" computed from Passed / Total — but the green arc of the ring now represents only cleanPassed / Total. The orange arc covers the gap, so a careful reader can reconcile them. This is a minor inconsistency — either computing passRate from cleanPassed or updating the label to "pass rate (incl. flaky)" would be cleaner — but it's not blocking.

Strengths

The implementation continues to be solid:

badgeLabel() is clean and now applied uniformly in all three rendering contexts
The split ring chart gives flaky tests a distinct visual weight without requiring a tooltip or key to interpret
AccumulateStatus switch uses the correct C# when guard ordering — more specific case "passed" when retryAttempt > 0 before plain case "passed" is correct
cleanPassed = summary.Passed - summary.Flaky is mathematically guaranteed safe (every flaky test also increments Passed, so Flaky ≤ Passed always holds)
Server-side and client-side flaky definitions remain identical (status == "passed" && retryAttempt > 0), so counts are always in sync

- Add Flaky stat card so stat row sums to Total (hidden when 0 flaky) - Remove redundant flaky-indicator div (stat card replaces it) - Fix pass rate % to exclude flaky tests (consistent with green arc)

claude

Code Review: feat(html-report): flaky test detection and summary section (Round 4)

Previous Review — All Addressed ✓

All issues from the third review have been resolved:

Stat cards no longer sum to Total: Fixed by adding a conditional Flaky stat card when summary.Flaky > 0, ensuring cleanPassed + Flaky + Failed + Skipped + Cancelled = Total ✓
passRate includes flaky but ring doesn't: Fixed — passRate now uses cleanPassed / Total, so the ring centre label matches the green arc's proportion ✓

Implementation is Clean and Complete

Looking at the final diff, the key invariants all hold:

Server-side consistency:

cleanPassed = summary.Passed - summary.Flaky is always safe (Flaky ≤ Passed by construction in AccumulateStatus)
Flaky stat card is shown only when there are flaky tests — sensible UX choice (avoids a "0 Flaky" card polluting zero-flaky runs)
C# switch ordering is correct: case "passed" when retryAttempt > 0 before plain case "passed"

Client-side consistency:

isFlaky(t) and server-side flaky definition are identical — counts always agree
badgeLabel() applied in all three render contexts: render(), renderFailedSection(), renderFailureClusters()
flakyPill starts hidden in server-rendered HTML; renderFlakySection() toggles it client-side — correct separation of concerns

One Minor Observation (Non-blocking)

The sort in renderFlakySection uses only retryAttempt as the key:

flaky.sort(function(a,b){ return b.t.retryAttempt - a.t.retryAttempt; });

Tests sharing the same retry count will appear in whatever order groups happens to be in — non-deterministic across runs. A secondary sort by displayName would make the list stable and easier to scan when many tests share the same retry count, but this is purely cosmetic and not blocking.

Overall Assessment

The PR is in excellent shape. Each round of feedback was addressed cleanly and incrementally. The feature is well-scoped, XSS-safe throughout (esc() on all user-facing strings), follows all existing CSS/JS conventions, and leaves the data model in a consistent state for any downstream consumers of the JSON. Ready to merge.

- Add CleanPassed and TotalFailed to ReportSummary, replacing 5 inline computations across the generator - Revert pass rate to include flaky tests (they did pass)

claude

Follow-up Code Review: feat(html-report): flaky test detection and summary section

All three issues raised in the previous review have been addressed. Here's the status:

Previous issues — all resolved

Bug: Passed pill count mismatch — Fixed. The server-side C# now uses summary.CleanPassed (i.e. Passed - Flaky) when rendering the Passed pill count, so the displayed count matches what the JS filter shows. The CleanPassed computed property in ReportSummary is the right place for this.

Missing Flaky property on ReportSummary — Fixed. Flaky is now a proper [JsonPropertyName("flaky")] serialized property populated by AccumulateStatus. This means downstream consumers of the raw JSON data will see the flaky count alongside all other summary fields.

Inline style vs. class-based visibility toggle — Fixed. The pill now uses pill.classList.toggle('hidden', !flaky.length) consistently with the rest of the file.

New observations in this revision

The flaky pill starts hidden server-side with class="pill hidden", but renderFlakySection() re-computes and shows it client-side. This is a reasonable approach but note a subtle edge case: the flaky pill is always rendered as hidden in the HTML regardless of whether summary.Flaky > 0. This means:

If a user loads the report with JS disabled, flaky tests are never surfaced via the pill (though the stat card and section heading still appear).
More practically: the server already knows summary.Flaky at generation time, so the pill visibility could be set server-side by conditionally omitting the hidden class when summary.Flaky > 0. This would remove the need for renderFlakySection() to manipulate the pill DOM at all, giving cleaner separation of concerns. That said, renderFlakySection() needs to run regardless to populate the section body, so the current approach is perfectly functional.

This is a minor architectural point, not a blocker. The current implementation is consistent and correct.

AccumulateStatus refactor is clean. Passing ReportTestResult instead of just status is the right design — the switch with a when guard for the flaky case is readable and correct.

CleanPassed / TotalFailed as [JsonIgnore] computed properties is the right choice. These are derived values and don't need to be in the JSON payload.

Summary

The implementation is solid. All previous feedback was acted on promptly and correctly. The feature is consistent with the report's existing design patterns — isFlaky() and badgeLabel() helpers are well-placed, the CSS variables follow the established naming convention, and the ring chart segmentation is logical. Good work.

thomhurst temporarily deployed to Pull Requests April 10, 2026 19:05 — with GitHub Actions Inactive

thomhurst had a problem deploying to Pull Requests April 10, 2026 19:05 — with GitHub Actions Failure

thomhurst force-pushed the feat/html-report-flaky-test-detection branch from a352a23 to 8e14a19 Compare April 10, 2026 21:05

thomhurst temporarily deployed to Pull Requests April 10, 2026 21:05 — with GitHub Actions Inactive

claude bot reviewed Apr 10, 2026

View reviewed changes

fix(html-report): address review feedback for flaky test detection

b284a89

- Fix Passed pill count mismatch by subtracting flaky count server-side - Add Flaky property to ReportSummary for forward compatibility - Use class-based visibility toggle for flaky pill instead of inline style

thomhurst had a problem deploying to Pull Requests April 11, 2026 00:24 — with GitHub Actions Error

claude bot reviewed Apr 11, 2026

View reviewed changes

thomhurst had a problem deploying to Pull Requests April 11, 2026 00:30 — with GitHub Actions Error

claude bot reviewed Apr 11, 2026

View reviewed changes

fix(html-report): add flaky stat card and fix ring chart pass rate

81bb5dc

- Add Flaky stat card so stat row sums to Total (hidden when 0 flaky) - Remove redundant flaky-indicator div (stat card replaces it) - Fix pass rate % to exclude flaky tests (consistent with green arc)

thomhurst enabled auto-merge (squash) April 11, 2026 00:35

claude bot reviewed Apr 11, 2026

View reviewed changes

refactor(html-report): add CleanPassed/TotalFailed computed properties

4db42c4

- Add CleanPassed and TotalFailed to ReportSummary, replacing 5 inline computations across the generator - Revert pass rate to include flaky tests (they did pass)

thomhurst temporarily deployed to Pull Requests April 11, 2026 00:39 — with GitHub Actions Inactive

claude bot reviewed Apr 11, 2026

View reviewed changes

thomhurst merged commit 7c1857a into main Apr 11, 2026
15 checks passed

thomhurst deleted the feat/html-report-flaky-test-detection branch April 11, 2026 01:13

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat(html-report): add flaky test detection and summary section#5498

feat(html-report): add flaky test detection and summary section#5498
thomhurst merged 5 commits intomainfrom
feat/html-report-flaky-test-detection

thomhurst commented Apr 10, 2026

Uh oh!

codacy-production bot commented Apr 10, 2026 •

edited

Loading

Uh oh!

claude bot left a comment

Uh oh!

claude bot left a comment

Uh oh!

claude bot left a comment

Uh oh!

claude bot left a comment

Uh oh!

claude bot left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

thomhurst commented Apr 10, 2026

Summary

Uh oh!

codacy-production bot commented Apr 10, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Up to standards ✅

Uh oh!

claude bot left a comment

Choose a reason for hiding this comment

Code Review: feat(html-report): flaky test detection and summary section

Bug: "Passed" pill count includes flaky tests, but the filter excludes them

Minor: ReportSummary has no Flaky property

Minor: pill visibility uses inline style vs. class toggle (inconsistency)

Strengths

Uh oh!

claude bot left a comment

Choose a reason for hiding this comment

Code Review: feat(html-report): flaky test detection and summary section

Previous Review — All Addressed ✓

New Issue: Flaky tests render a green "passed" badge in the main test list

Minor: Pass rate / ring chart inconsistency with the Passed pill

Strengths

Uh oh!

claude bot left a comment

Choose a reason for hiding this comment

Code Review: feat(html-report): flaky test detection and summary section (Round 3)

Previous Review — All Addressed ✓

Remaining Issue: Stat cards no longer sum to Total

Minor: passRate includes flaky tests but ring green arc does not

Strengths

Uh oh!

claude bot left a comment

Choose a reason for hiding this comment

Code Review: feat(html-report): flaky test detection and summary section (Round 4)

Previous Review — All Addressed ✓

Implementation is Clean and Complete

One Minor Observation (Non-blocking)

Overall Assessment

Uh oh!

claude bot left a comment

Choose a reason for hiding this comment

Follow-up Code Review: feat(html-report): flaky test detection and summary section

Previous issues — all resolved

New observations in this revision

Summary

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

codacy-production bot commented Apr 10, 2026 •

edited

Loading

Minor: `ReportSummary` has no `Flaky` property

Minor: `passRate` includes flaky tests but ring green arc does not