GitHub Action for Automated Root Cause Analysis (RCA) on Bug Closure #30834

Open
9 tasks
vpintorico opened this issue Mar 6, 2025 · 1 comment
Labels
team-dev-ops DevOps team

Comments

vpintorico commented Mar 6, 2025

What is this about?

We want to improve our RCA process for high-severity bugs (Sev0 or Sev1) by leveraging automation. Currently, RCA requires manual effort and can be time-consuming. Implementing a GitHub Action to trigger a brief, 10-minute decision-tree questionnaire upon closure of a Sev0 or Sev1 bug will help us:

  • Consistently capture the root cause and contributing factors.
  • Quickly identify preventive measures and lessons learned.
  • Automatically aggregate findings to inform continuous improvement initiatives over time.

Scenario

As a delivery team,
I want an automated, short Root Cause Analysis (RCA) questionnaire to be triggered when a Sev0 or Sev1 bug is closed,
so that we can efficiently capture root causes, aggregate insights, and identify actionable improvements over time with minimal overhead.

Design

Questionnaire

Technical Details

No response

Threat Modeling Framework

No response

Acceptance Criteria

Automated Trigger: The RCA questionnaire is automatically triggered when a Sev0 or Sev1 GitHub issue is closed.
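As a minimal sketch of the trigger check, the function below inspects a GitHub `issues` webhook payload and decides whether the questionnaire should start. The function name `should_trigger_rca` is hypothetical, and the assumption that severity labels begin with "Sev0" or "Sev1" (for example "Sev0-urgent") would need to be checked against the labels actually used in the repository.

```python
import re

# Assumption: severity labels start with "Sev0" or "Sev1" (e.g. "Sev0-urgent").
# Adjust the pattern to the repository's real label names.
SEVERITY_PATTERN = re.compile(r"^sev[01]\b", re.IGNORECASE)


def should_trigger_rca(event: dict) -> bool:
    """Return True when the payload describes a closed Sev0/Sev1 issue."""
    # Only the "closed" action is relevant; ignore opened, edited, etc.
    if event.get("action") != "closed":
        return False
    issue = event.get("issue", {})
    # Each label in the payload is an object with a "name" field.
    labels = [label.get("name", "") for label in issue.get("labels", [])]
    return any(SEVERITY_PATTERN.match(name) for name in labels)
```

In a workflow, a check like this could run as the first step of a job listening to issue closure, so that lower-severity issues exit early without creating a questionnaire.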

Visibility & Insights: Stakeholders should be able to analyze RCA trends and track improvement actions.

Data Aggregation:

  • All questionnaire responses must be stored in a manner that can be easily aggregated and queried.
  • The system should allow aggregated results to be exported or viewed in a dashboard or report (e.g., Grafana).
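One way to keep responses easy to aggregate is to store each as a flat record with a fixed set of fields. The sketch below assumes a hypothetical `RcaResponse` record with a `root_cause` category field; the real questionnaire schema has not been defined in this issue, so the field names are illustrative only.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class RcaResponse:
    """Hypothetical flat record for one completed questionnaire."""
    issue_number: int
    severity: str    # e.g. "Sev0" or "Sev1"
    root_cause: str  # category selected in the questionnaire (assumed field)


def count_by_root_cause(responses: list[RcaResponse]) -> dict[str, int]:
    """Count responses per root-cause category, most frequent first."""
    counts = Counter(r.root_cause for r in responses)
    return dict(counts.most_common())
```

A per-category count like this maps directly onto a bar or time-series panel in a dashboard, which is why a flat record is preferred here over free-text answers.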

Access Control: Only MM team members can complete or review the questionnaires.

Minimal Disruption: Closing a Sev0/Sev1 bug should not be blocked indefinitely by the questionnaire but should encourage timely completion.

Error Handling & Notifications:

  • The team should be alerted if the GitHub Action fails.
  • Users should be reminded of pending questionnaires.
  • Add the RCA-needed label to Sev0 or Sev1 bugs.
  • Add the RCA-completed label once the questionnaire is completed.
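The label and reminder rules above can be sketched as two small pure functions. The helper names `next_labels` and `needs_reminder` are hypothetical, and the three-day grace period is an assumption, since the issue does not say how soon a reminder should be sent.

```python
from __future__ import annotations

from datetime import datetime, timedelta

RCA_NEEDED = "RCA-needed"
RCA_COMPLETED = "RCA-completed"


def next_labels(labels: set[str], questionnaire_done: bool) -> set[str]:
    """Return the label set after applying the RCA label rules."""
    # Drop any existing RCA label so the two states never coexist.
    updated = set(labels) - {RCA_NEEDED, RCA_COMPLETED}
    updated.add(RCA_COMPLETED if questionnaire_done else RCA_NEEDED)
    return updated


def needs_reminder(
    closed_at: datetime,
    labels: set[str],
    now: datetime,
    grace: timedelta = timedelta(days=3),  # assumed grace period
) -> bool:
    """True when the RCA is still pending after the grace period."""
    return RCA_NEEDED in labels and now - closed_at >= grace
```

Keeping these rules free of API calls makes them easy to unit test; a scheduled workflow could then apply them to the list of closed issues and post reminders where `needs_reminder` is true.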

Stakeholder review needed before the work gets merged

  • Engineering (needed in most cases)
  • Design
  • Product
  • QA (automation tests are required to pass before merging PRs but not all changes are covered by automation tests - please review if QA is needed beyond automation tests)
  • Security
  • Legal
  • Marketing
  • Management (please specify)
  • Other (please specify)

References

No response

vpintorico added the team-dev-ops (DevOps team) label on Mar 6, 2025

bsuv commented Mar 11, 2025

@sethkfman @vpintorico we've created a task for solution design first to clarify some of the constraints listed above: https://consensyssoftware.atlassian.net/browse/INFRA-2406
