
HTML Report: Add flaky test detection and summary section #5487

@thomhurst

Summary

When a test passes after one or more retry attempts (retryAttempt > 0), it is flaky: it succeeded this time, but only after failing at least once. The report already renders a small "retry #N" badge on these tests, but there is no aggregated view or dedicated filter for them. Users tracking test-suite health need a way to quickly identify and triage flakiness.

Proposed Behavior

Flaky detection

A test is considered flaky if it has status === 'passed' and retryAttempt > 0. Tests that failed on every attempt are plain failures, not flaky; they are already covered by the Failed filter.
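The rule reduces to a one-line predicate. A minimal sketch, assuming each test object in the report JSON exposes `status` and `retryAttempt` as described; `isFlaky` and `outcomeOf` are hypothetical helper names, not existing report functions:

```javascript
// Hypothetical helper: flaky = ultimately passed, but needed at least one retry.
// A missing retryAttempt compares as false, so such tests are not flaky.
function isFlaky(t) {
  return t.status === 'passed' && t.retryAttempt > 0;
}

// Hypothetical helper: refine the raw status so flaky passes stand apart.
// Tests that failed on every attempt keep status 'failed'.
function outcomeOf(t) {
  return isFlaky(t) ? 'flaky' : t.status;
}

const examples = [
  { name: 'A', status: 'passed', retryAttempt: 0 }, // reliably passing
  { name: 'B', status: 'passed', retryAttempt: 2 }, // eventually passing
  { name: 'C', status: 'failed', retryAttempt: 3 }, // failed on all retries
];
const outcomes = examples.map(outcomeOf); // ['passed', 'flaky', 'failed']
```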

Quick-access section

Add a "Flaky Tests (N)" collapsible section alongside the existing "Failed Tests" and "Top 10 Slowest Tests" sections. Each entry shows:

  • Test name and class
  • Number of retry attempts it took to pass
  • A click-to-scroll link to the test in the main list

Only rendered when flaky tests exist.
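A minimal sketch of such a renderer, loosely modelled on renderFailedSection() and renderSlowestSection() (their real signatures are not shown here). The `id`, `name`, and `className` fields, the `test-<id>` anchor scheme, the `<details>` collapsible, and the most-retries-first ordering are all assumptions. It returns an empty string when nothing is flaky, so the caller renders nothing:

```javascript
// Escape text before interpolating it into HTML.
function escapeHtml(s) {
  return String(s)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Hypothetical renderer for the "Flaky Tests (N)" quick-access section.
// Assumes each test has id, name, className, status and retryAttempt.
function renderFlakySection(tests) {
  const flaky = tests
    .filter((t) => t.status === 'passed' && t.retryAttempt > 0)
    // Assumption: show the tests that needed the most retries first.
    // filter() returns a new array, so the input is not mutated.
    .sort((a, b) => b.retryAttempt - a.retryAttempt);

  // Only rendered when flaky tests exist.
  if (flaky.length === 0) return '';

  const items = flaky.map((t) => {
    const retries = t.retryAttempt === 1 ? '1 retry' : `${t.retryAttempt} retries`;
    // A fragment link scrolls to the test's row in the main list (assumed id scheme).
    return `<li><a href="#test-${escapeHtml(t.id)}">${escapeHtml(t.name)}</a>` +
      ` <span class="class-name">${escapeHtml(t.className)}</span>` +
      ` <span class="retries">passed after ${retries}</span></li>`;
  });

  return `<details class="flaky-section"><summary>Flaky Tests (${flaky.length})</summary>` +
    `<ul>${items.join('')}</ul></details>`;
}
```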

Filter pill

Add a "Flaky" filter pill to the status filter bar, in a distinct color (e.g. orange or purple) so it is not confused with the amber "Skipped" pill. Clicking it shows only flaky tests.
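A sketch of the pill markup, assuming the filter bar is built from `<button>` elements keyed by `data-filter`. The `"flaky"` value comes from the issue; the `filter-pill` and `pill-flaky` classes, the `--pill-color` custom property, and the purple hex value are hypothetical:

```javascript
// Hypothetical pill markup for the status filter bar.
function renderFlakyPill(flakyCount) {
  // No flaky tests: emit no pill at all.
  if (flakyCount === 0) return '';
  // Purple (an illustrative choice) keeps it visually distinct from the amber Skipped pill.
  return `<button class="filter-pill pill-flaky" data-filter="flaky"` +
    ` style="--pill-color: #8e44ad">Flaky <span class="count">${flakyCount}</span></button>`;
}
```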

Summary dashboard

Include the flaky count in the summary stats area (e.g. a small "N flaky" indicator below or beside the pass rate), so it's visible at a glance without scrolling.
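One way to produce the dashboard numbers, as a sketch: `summarize` and `flakyIndicator` are hypothetical names, and the pass-rate formula (passed divided by total) is an assumption that may not match how the report currently computes it. Flaky tests keep status 'passed', so here they still count toward the pass rate, and the indicator sits beside it rather than lowering it:

```javascript
// Hypothetical summary computation over the report's test list.
function summarize(tests) {
  let passed = 0;
  let flaky = 0;
  for (const t of tests) {
    if (t.status === 'passed') {
      passed++;
      if (t.retryAttempt > 0) flaky++;
    }
  }
  // Assumption: pass rate = passed / total (the report may exclude skipped tests).
  const passRate = tests.length === 0 ? 0 : (100 * passed) / tests.length;
  return { total: tests.length, passed, flaky, passRate };
}

// Text for the small indicator beside the pass rate; empty when nothing is flaky.
function flakyIndicator(summary) {
  return summary.flaky > 0 ? `${summary.flaky} flaky` : '';
}
```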

Why This Matters

Flaky tests erode confidence in CI. A green build that required 3 retries to get there isn't truly green. Today, users have to manually scan for retry badges buried inside test groups. A dedicated flaky summary makes this a first-class signal, helping teams:

  • Identify which tests need stabilization
  • Track flakiness trends over time
  • Distinguish "reliably passing" from "eventually passing"

Implementation Notes

  • Data is already available: ReportTestResult.RetryAttempt is populated in the report JSON.
  • The flaky filter can be implemented as a new data-filter="flaky" value in the existing filterBtns pill group, with matchesFilter() extended to check t.status === 'passed' && t.retryAttempt > 0.
  • The quick-access section follows the same pattern as renderFailedSection() and renderSlowestSection().
  • The flaky count for the pill badge can be computed during the initial data scan in the JS IIFE.
  • If no tests are flaky, hide the pill and the section entirely, so suites without retries see no extra noise.
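The filter change and the single-pass count might look like the sketch below. The issue does not show matchesFilter()'s real signature, so this version assumes it receives a test object and the active pill's `data-filter` value; the other filter keys (`'all'`, `'passed'`, `'failed'`, `'skipped'`) and the `countByFilter`/`shouldShowFlaky` helpers are hypothetical:

```javascript
// Sketch of matchesFilter() extended with a 'flaky' case.
function matchesFilter(t, filter) {
  switch (filter) {
    case 'all':
      return true;
    case 'flaky':
      return t.status === 'passed' && t.retryAttempt > 0;
    default:
      // 'passed', 'failed', 'skipped', ...: plain status match, so a
      // flaky test still appears under Passed as well as under Flaky.
      return t.status === filter;
  }
}

// Compute every pill's badge count during the initial scan of the data.
function countByFilter(tests) {
  const counts = { all: tests.length, flaky: 0 };
  for (const t of tests) {
    counts[t.status] = (counts[t.status] || 0) + 1;
    if (matchesFilter(t, 'flaky')) counts.flaky++;
  }
  return counts;
}

// Hide the Flaky pill and skip the section when nothing is flaky.
function shouldShowFlaky(counts) {
  return counts.flaky > 0;
}
```

Leaving Passed as a plain status match is a design choice in this sketch: Flaky acts as a lens on a subset of passing tests rather than a separate status.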
