GitHub Action for Automated Root Cause Analysis (RCA) on Bug Closure #30834

vpintorico · 2025-03-06T16:13:12Z

What is this about?

We want to improve our RCA process for high-severity bugs (Sev0 or Sev1) by leveraging automation. Currently, RCA requires manual effort and can be time-consuming. Implementing a GitHub Action to trigger a brief, 10-minute decision-tree questionnaire upon closure of a Sev0 or Sev1 bug will help us:

Consistently capture the root cause and contributing factors.
Quickly identify preventive measures and lessons learned.
Automatically aggregate findings to inform continuous improvement initiatives over time.

Scenario

As a delivery team,
I want an automated, short Root Cause Analysis (RCA) questionnaire to be triggered when a Sev0 or Sev1 bug is closed,
so that we can efficiently capture root causes, aggregate insights, and identify actionable improvements over time with minimal overhead.

Design

Questionnaire

Technical Details

No response

Threat Modeling Framework

No response

Acceptance Criteria

Automated Trigger: The RCA questionnaire is automatically triggered when a Sev0 or Sev1 GitHub issue is closed.
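As a rough illustration of the trigger condition, the check below reads a GitHub `issues` webhook payload and decides whether it is the closure of a Sev0 or Sev1 issue. It is a minimal sketch, not a settled design: the assumption that severity labels start with `Sev0` or `Sev1` and the function name `should_trigger_rca` are both hypothetical.

```python
import json
import os

# Assumption: severity labels begin with one of these prefixes (case-insensitive).
SEVERITY_PREFIXES = ("sev0", "sev1")


def should_trigger_rca(event: dict) -> bool:
    """Return True when the payload describes the closure of a Sev0/Sev1 issue."""
    if event.get("action") != "closed":
        return False
    labels = event.get("issue", {}).get("labels", [])
    return any(
        label.get("name", "").lower().startswith(SEVERITY_PREFIXES)
        for label in labels
    )


if __name__ == "__main__":
    # GitHub Actions writes the triggering webhook payload to this file path.
    event_path = os.environ.get("GITHUB_EVENT_PATH")
    if event_path:
        with open(event_path, encoding="utf-8") as handle:
            print(should_trigger_rca(json.load(handle)))
```

Keeping the decision in a pure function makes it possible to unit test the trigger without running a workflow.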

Visibility & Insights: Stakeholders should be able to analyze RCA trends and track improvement actions.

Data Aggregation:

All questionnaire responses must be stored in a manner that can be easily aggregated and queried.
The system should allow exporting or viewing aggregated results in a dashboard or report (e.g., Grafana).
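One possible shape for the aggregation step, assuming each stored response is a record with a `root_cause` field. That schema and the category values used in the usage note are hypothetical, since the questionnaire content is not defined here.

```python
from collections import Counter
from typing import Iterable, Mapping


def aggregate_root_causes(responses: Iterable[Mapping[str, str]]) -> Counter:
    """Count how often each root-cause category appears across responses."""
    # Responses without a root_cause value are skipped rather than counted.
    return Counter(
        response["root_cause"]
        for response in responses
        if response.get("root_cause")
    )
```

For example, `aggregate_root_causes(records).most_common(5)` would list the five most frequent categories, which could then be exported to whichever dashboard or report is chosen.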

Access Control: Only MM team members can complete or review the questionnaires.

Minimal Disruption: Closing a Sev0/Sev1 bug should not be blocked indefinitely by the questionnaire but should encourage timely completion.

Error Handling & Notifications:

The team should be alerted if the GitHub Action fails.
Users should be reminded of pending questionnaires.
Add RCA-needed label to Sev0 or Sev1 bugs.
Add the RCA-completed label after questionnaire completion.
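The labeling and reminder rules above could be sketched as follows. The label names come from this issue; removing RCA-needed once the questionnaire is completed, the three-day reminder window, and the function names are assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone

RCA_NEEDED = "RCA-needed"
RCA_COMPLETED = "RCA-completed"
REMINDER_AFTER = timedelta(days=3)  # hypothetical grace period before reminding


def update_rca_labels(labels: set, questionnaire_completed: bool) -> set:
    """Return the label set a Sev0/Sev1 bug should carry after this step."""
    updated = set(labels)
    if questionnaire_completed:
        # Assumption: the "needed" label is dropped once the RCA is done.
        updated.discard(RCA_NEEDED)
        updated.add(RCA_COMPLETED)
    elif RCA_COMPLETED not in updated:
        updated.add(RCA_NEEDED)
    return updated


def needs_reminder(labels: set, closed_at: datetime, now: datetime) -> bool:
    """True when the RCA is still pending and the grace period has elapsed."""
    pending = RCA_NEEDED in labels and RCA_COMPLETED not in labels
    return pending and now - closed_at >= REMINDER_AFTER
```

Because neither function blocks issue closure, this also fits the "Minimal Disruption" criterion: the bug closes immediately and the label plus reminder nudge the team afterwards.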

Stakeholder review needed before the work gets merged

Engineering (needed in most cases)
Design
Product
QA (automation tests are required to pass before merging PRs but not all changes are covered by automation tests - please review if QA is needed beyond automation tests)
Security
Legal
Marketing
Management (please specify)
Other (please specify)

References

No response

bsuv · 2025-03-11T14:20:14Z

@sethkfman @vpintorico we've created a task for solution design first to clarify some of the constraints listed above: https://consensyssoftware.atlassian.net/browse/INFRA-2406

vpintorico added the team-dev-ops label Mar 6, 2025
