Skip to content

feat(html-report): add flaky test detection and summary section#5498

Merged
thomhurst merged 5 commits intomainfrom
feat/html-report-flaky-test-detection
Apr 11, 2026
Merged

feat(html-report): add flaky test detection and summary section#5498
thomhurst merged 5 commits intomainfrom
feat/html-report-flaky-test-detection

Conversation

@thomhurst
Copy link
Copy Markdown
Owner

Summary

  • Adds flaky test detection to the HTML report — a test is considered flaky when it has status === 'passed' and retryAttempt > 0
  • Adds a collapsible "Flaky Tests" quick-access section (alongside Failed and Slowest) showing test name, class, retry count, and click-to-scroll navigation
  • Adds an orange "Flaky" filter pill to the status bar (auto-hidden when no flaky tests exist) and excludes flaky tests from the "Passed" filter for clearer separation
  • Adds a flaky count indicator in the summary dashboard and full orange color theming (CSS variables, badge, dot, light/dark mode support)

Closes #5487

@codacy-production
Copy link
Copy Markdown

codacy-production bot commented Apr 10, 2026

Up to standards ✅

🟢 Issues 0 issues

Results:
0 new issues

View in Codacy

🟢 Metrics 4 complexity

Metric Results
Complexity 4

View in Codacy

TIP This summary will be updated as you push new changes. Give us feedback

Add dedicated flaky test support to the HTML report:
- Flaky detection: tests with status 'passed' and retryAttempt > 0
- New collapsible 'Flaky Tests' quick-access section with retry counts
- Orange 'Flaky' filter pill in the status filter bar (hidden when no flaky tests)
- Flaky count indicator in the summary dashboard
- Orange color theme variables and badge styling for flaky status
- Exclude flaky tests from the 'Passed' filter for clearer separation

Closes #5487
@thomhurst thomhurst force-pushed the feat/html-report-flaky-test-detection branch from a352a23 to 8e14a19 Compare April 10, 2026 21:05
Copy link
Copy Markdown
Contributor

@claude claude bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review: feat(html-report): flaky test detection and summary section

Overall this is a well-executed, self-contained feature. The implementation is clean and consistent with existing patterns in the file. A few things worth flagging:


Bug: "Passed" pill count includes flaky tests, but the filter excludes them

The server-side summary.Passed value (rendered into the HTML at generation time) counts all passed tests, including flaky ones. The JS filter for passed correctly excludes flaky tests — but the count shown in the pill doesn't match what's visible after filtering.

// AppendSearchAndFilters — Passed pill count is baked in server-side
sb.Append(summary.Passed);  // includes flaky tests
// JS filter correctly excludes flaky from the Passed view
} else if (activeFilter === 'passed') {
    if (t.status !== 'passed' || isFlaky(t)) return false;
}

This creates a mismatch: the "Passed" badge says e.g. 42, but clicking it only shows 39 tests. The fix is to subtract the flaky count from the displayed Passed count once it's computed client-side:

// After computing flaky.length in renderFlakySection:
const passedPillCount = document.querySelector('[data-filter="passed"] .pill-count');
if (passedPillCount) {
    const originalPassed = parseInt(passedPillCount.dataset.total || passedPillCount.textContent);
    passedPillCount.dataset.total = originalPassed; // preserve original for idempotency
    passedPillCount.textContent = originalPassed - flaky.length;
}

Minor: ReportSummary has no Flaky property

The flaky count is computed entirely client-side in JS, meaning the serialized JSON summary embedded in the report doesn't carry a flaky count. If consumers ever parse the raw reportData JSON (e.g. for CI tooling or other downstream use), the flaky count won't be available.

Adding a Flaky property to ReportSummary (populated server-side based on retry semantics) would make the data model complete and consistent with Passed, Failed, etc. This isn't blocking for the HTML report use case, but worth considering for forward compatibility.


Minor: pill visibility uses inline style vs. class toggle (inconsistency)

The flakyPill uses pill.style.display = ... while flakyIndicator uses .classList.add('visible'). Both work correctly, but the file establishes the class-based pattern for toggling visibility — using it consistently for the pill too would be cleaner.


Strengths

  • isFlaky() extracted as a named helper — readable and reusable
  • Reuses existing .retry-tag, .qa-section, .qa-item CSS — no style duplication
  • esc() used on all user-supplied strings — XSS safe
  • Sort by retry count descending is the right default (most unstable tests first)
  • Dark/light mode theming via CSS variables follows established patterns
  • aria-label on the new section region — good accessibility

- Fix Passed pill count mismatch by subtracting flaky count server-side
- Add Flaky property to ReportSummary for forward compatibility
- Use class-based visibility toggle for flaky pill instead of inline style
Copy link
Copy Markdown
Contributor

@claude claude bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review: feat(html-report): flaky test detection and summary section

Previous Review — All Addressed ✓

All three issues from the prior review have been resolved:

  • Passed pill count: now correctly renders summary.Passed - summary.Flaky server-side ✓
  • ReportSummary.Flaky property: added with [JsonPropertyName("flaky")] and populated in AccumulateStatus
  • Pill visibility: uses classList.add/remove('hidden') correctly (class-based, not inline style) ✓

New Issue: Flaky tests render a green "passed" badge in the main test list

When a user clicks the "Flaky" filter pill, matchesFilter correctly shows only flaky tests — but the main test row renderer uses t.status for the badge class:

// HtmlReportGenerator.cs:1886
html += '<span class="t-badge '+t.status+'">'+esc(t.status)+'</span>';

Since t.status === 'passed' for flaky tests, each row gets a green "passed" badge — inconsistent with the orange "flaky" badge shown in the quick-access section above. A user filtering by Flaky will see a page full of green "passed" badges, which undermines the visual distinction this PR is building.

Suggested fix:

const badgeClass = isFlaky(t) ? 'flaky' : t.status;
html += '<span class="t-badge ' + badgeClass + '">' + esc(badgeClass) + '</span>';

This also applies to the failure-clusters renderer at line 2282, which has the same pattern:

h += '<span class="t-badge '+safeClass(f.t.status)+'">'+esc(f.t.status)+'</span>';

(Less critical there since failure clusters only show failed tests, but worth keeping consistent.)


Minor: Pass rate / ring chart inconsistency with the Passed pill

The ring chart and pass rate percentage are computed from summary.Passed (which includes flaky tests), while the Passed filter pill shows summary.Passed - summary.Flaky. This means:

  • Ring shows, say, 95% pass rate (42 tests as "passed")
  • Passed pill says 39 tests

A user might be confused why the numbers don't add up. The flaky-indicator label ("3 flaky") bridges the gap, but it's a small text element that's easy to miss. Either:

  • Adjust the ring to use Passed - Flaky for the green arc and add an orange arc for flaky (more visual clarity), or
  • Add a note to the ring center like "pass rate (incl. flaky)" to explicitly acknowledge the convention

The current approach isn't wrong — flaky tests did pass — but the pill/ring mismatch may prompt user confusion.


Strengths

The overall implementation is solid:

  • isFlaky() extracted as a named, reusable helper — used consistently across matchesFilter, renderFlakySection, and the row renderer
  • CSS variables follow the established pattern (--orange, --orange-d, @property registration, dark/light mode)
  • XSS safety: esc() is applied everywhere user-facing strings are interpolated
  • retryAttempt was already serialized into the JSON data model, so no schema additions needed for JS to work
  • .retry-tag already existed in CSS — smart reuse rather than duplication
  • Server-side and client-side flaky definitions are identical (passed + retryAttempt > 0), so counts will always agree

- Add badgeLabel() JS helper to show 'flaky' badge instead of 'passed'
  for flaky tests in main list, failed section, and failure clusters
- Split ring chart green arc into green (clean pass) + orange (flaky)
- Update Passed stat card to show clean passes only (excluding flaky)
- Simplify AccumulateStatus to accept ReportTestResult directly
- Use classList.toggle for flaky pill visibility
Copy link
Copy Markdown
Contributor

@claude claude bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review: feat(html-report): flaky test detection and summary section (Round 3)

Previous Review — All Addressed ✓

All issues from the second review have been resolved:

  • Green "passed" badge for flaky tests: fixed with the new badgeLabel() helper applied consistently across render(), renderFailedSection(), and renderFailureClusters()
  • Ring chart inconsistency: green arc now represents clean passes only; orange arc added for flaky; the split correctly preserves the total arc length ✓
  • Passed stat card: now shows cleanPassed (excluding flaky) ✓

Remaining Issue: Stat cards no longer sum to Total

With flaky tests excluded from the "Passed" card, the stat cards are now mathematically inconsistent:

Total = cleanPassed + Flaky + Failed + TimedOut + Skipped + Cancelled

But the displayed stat cards are:

Total | Passed (clean) | Failed | Skipped | Cancelled

For example with 3 flaky tests:

  • Total: 50
  • Passed: 39 + Failed: 5 + Skipped: 3 + Cancelled: 0 = 47 ≠ 50

The 3 missing tests are accounted for only by the small flaky-indicator div beneath the stats. A user is likely to notice the arithmetic doesn't work and be confused. The flaky-indicator is easy to overlook — it's an <div> styled with .78rem text versus the prominent stat card layout.

Suggested fix: Add a proper "Flaky" stat card to keep the stat row self-consistent:

AppendStatCard(sb, "passed", cleanPassed.ToString(), "Passed", "var(--emerald)");
AppendStatCard(sb, "flaky", summary.Flaky.ToString(), "Flaky", "var(--orange)");   // add this
AppendStatCard(sb, "failed", (summary.Failed + summary.TimedOut).ToString(), "Failed", "var(--rose)");

Then the flaky-indicator div can be removed entirely since it would be redundant. The stat card also provides a click-to-filter entry point (same as the other cards if wired up), and is consistent with how other statuses are treated.


Minor: passRate includes flaky tests but ring green arc does not

var passRate = summary.Total > 0 ? (double)summary.Passed / summary.Total * 100 : 0;
// rendered as the ring center % label

summary.Passed includes flaky tests, so the ring center shows e.g. "95% pass rate" computed from Passed / Total — but the green arc of the ring now represents only cleanPassed / Total. The orange arc covers the gap, so a careful reader can reconcile them. This is a minor inconsistency — either computing passRate from cleanPassed or updating the label to "pass rate (incl. flaky)" would be cleaner — but it's not blocking.


Strengths

The implementation continues to be solid:

  • badgeLabel() is clean and now applied uniformly in all three rendering contexts
  • The split ring chart gives flaky tests a distinct visual weight without requiring a tooltip or key to interpret
  • AccumulateStatus switch uses the correct C# when guard ordering — more specific case "passed" when retryAttempt > 0 before plain case "passed" is correct
  • cleanPassed = summary.Passed - summary.Flaky is mathematically guaranteed safe (every flaky test also increments Passed, so Flaky ≤ Passed always holds)
  • Server-side and client-side flaky definitions remain identical (status == "passed" && retryAttempt > 0), so counts are always in sync

- Add Flaky stat card so stat row sums to Total (hidden when 0 flaky)
- Remove redundant flaky-indicator div (stat card replaces it)
- Fix pass rate % to exclude flaky tests (consistent with green arc)
@thomhurst thomhurst enabled auto-merge (squash) April 11, 2026 00:35
Copy link
Copy Markdown
Contributor

@claude claude bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review: feat(html-report): flaky test detection and summary section (Round 4)

Previous Review — All Addressed ✓

All issues from the third review have been resolved:

  • Stat cards no longer sum to Total: Fixed by adding a conditional Flaky stat card when summary.Flaky > 0, ensuring cleanPassed + Flaky + Failed + Skipped + Cancelled = Total ✓
  • passRate includes flaky but ring doesn't: Fixed — passRate now uses cleanPassed / Total, so the ring centre label matches the green arc's proportion ✓

Implementation is Clean and Complete

Looking at the final diff, the key invariants all hold:

Server-side consistency:

  • cleanPassed = summary.Passed - summary.Flaky is always safe (Flaky ≤ Passed by construction in AccumulateStatus)
  • Flaky stat card is shown only when there are flaky tests — sensible UX choice (avoids a "0 Flaky" card polluting zero-flaky runs)
  • C# switch ordering is correct: case "passed" when retryAttempt > 0 before plain case "passed"

Client-side consistency:

  • isFlaky(t) and server-side flaky definition are identical — counts always agree
  • badgeLabel() applied in all three render contexts: render(), renderFailedSection(), renderFailureClusters()
  • flakyPill starts hidden in server-rendered HTML; renderFlakySection() toggles it client-side — correct separation of concerns

One Minor Observation (Non-blocking)

The sort in renderFlakySection uses only retryAttempt as the key:

flaky.sort(function(a,b){ return b.t.retryAttempt - a.t.retryAttempt; });

Tests sharing the same retry count will appear in whatever order groups happens to be in — non-deterministic across runs. A secondary sort by displayName would make the list stable and easier to scan when many tests share the same retry count, but this is purely cosmetic and not blocking.


Overall Assessment

The PR is in excellent shape. Each round of feedback was addressed cleanly and incrementally. The feature is well-scoped, XSS-safe throughout (esc() on all user-facing strings), follows all existing CSS/JS conventions, and leaves the data model in a consistent state for any downstream consumers of the JSON. Ready to merge.

- Add CleanPassed and TotalFailed to ReportSummary, replacing 5 inline
  computations across the generator
- Revert pass rate to include flaky tests (they did pass)
Copy link
Copy Markdown
Contributor

@claude claude bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Follow-up Code Review: feat(html-report): flaky test detection and summary section

All three issues raised in the previous review have been addressed. Here's the status:


Previous issues — all resolved

Bug: Passed pill count mismatch — Fixed. The server-side C# now uses summary.CleanPassed (i.e. Passed - Flaky) when rendering the Passed pill count, so the displayed count matches what the JS filter shows. The CleanPassed computed property in ReportSummary is the right place for this.

Missing Flaky property on ReportSummary — Fixed. Flaky is now a proper [JsonPropertyName("flaky")] serialized property populated by AccumulateStatus. This means downstream consumers of the raw JSON data will see the flaky count alongside all other summary fields.

Inline style vs. class-based visibility toggle — Fixed. The pill now uses pill.classList.toggle('hidden', !flaky.length) consistently with the rest of the file.


New observations in this revision

The flaky pill starts hidden server-side with class="pill hidden", but renderFlakySection() re-computes and shows it client-side. This is a reasonable approach but note a subtle edge case: the flaky pill is always rendered as hidden in the HTML regardless of whether summary.Flaky > 0. This means:

  1. If a user loads the report with JS disabled, flaky tests are never surfaced via the pill (though the stat card and section heading still appear).
  2. More practically: the server already knows summary.Flaky at generation time, so the pill visibility could be set server-side by conditionally omitting the hidden class when summary.Flaky > 0. This would remove the need for renderFlakySection() to manipulate the pill DOM at all, giving cleaner separation of concerns. That said, renderFlakySection() needs to run regardless to populate the section body, so the current approach is perfectly functional.

This is a minor architectural point, not a blocker. The current implementation is consistent and correct.

AccumulateStatus refactor is clean. Passing ReportTestResult instead of just status is the right design — the switch with a when guard for the flaky case is readable and correct.

CleanPassed / TotalFailed as [JsonIgnore] computed properties is the right choice. These are derived values and don't need to be in the JSON payload.


Summary

The implementation is solid. All previous feedback was acted on promptly and correctly. The feature is consistent with the report's existing design patterns — isFlaky() and badgeLabel() helpers are well-placed, the CSS variables follow the established naming convention, and the ring chart segmentation is logical. Good work.

@thomhurst thomhurst merged commit 7c1857a into main Apr 11, 2026
15 checks passed
@thomhurst thomhurst deleted the feat/html-report-flaky-test-detection branch April 11, 2026 01:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

HTML Report: Add flaky test detection and summary section

1 participant