test/e2e/upgrade/alert: Temporarily allow HighOverallControlPlaneCPU
We've allowed these for non-update jobs since 12b022c (allow high
CPU alerts to be firing and pending, 2021-04-26, openshift#26102).  But they
show up in update jobs too.  For example [1] included:

  alert ExtremelyHighIndividualControlPlaneCPU fired for 60 seconds with labels: {instance="ci-op-vjm670pq-1ff06-pn8bq-master-1", severity="critical"}
  alert HighOverallControlPlaneCPU fired for 240 seconds with labels: {severity="warning"}

Searching for recent frequency:

  $ w3m -dump -cols 200 'https://search.ci.openshift.org/?maxAge=24h&type=junit&search=alert+.*High.*ControlPlaneCPU+fired+for' | grep 'failures match' | sort
  periodic-ci-openshift-release-master-ci-4.9-e2e-gcp-upgrade (all) - 49 runs, 65% failed, 3% of failures match = 2% impact
  periodic-ci-openshift-release-master-ci-4.9-upgrade-from-stable-4.8-e2e-gcp-upgrade (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
  pull-ci-openshift-ovn-kubernetes-master-e2e-gcp-ovn-upgrade (all) - 6 runs, 100% failed, 17% of failures match = 17% impact
  release-openshift-ocp-installer-upgrade-remote-libvirt-ppc64le-4.7-to-4.8 (all) - 2 runs, 100% failed, 50% of failures match = 50% impact

I don't know why this would be GCP-specific, but I'm copy/pasting in a
Matches block I found elsewhere in the file to limit the exception to
GCP.

[1]: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.9-upgrade-from-stable-4.8-e2e-gcp-upgrade/1417199789052792832
wking committed Jul 23, 2021
1 parent 539b3ba commit d60209c
Showing 1 changed file with 8 additions and 0 deletions.
8 changes: 8 additions & 0 deletions test/e2e/upgrade/alert/alert.go
@@ -106,6 +106,14 @@ func (t *UpgradeTest) Test(f *framework.Framework, done <-chan struct{}, upgrade
 			Selector: map[string]string{"alertname": "HighlyAvailableWorkloadIncorrectlySpread", "namespace": "openshift-monitoring", "workload": "alertmanager-main"},
 			Text: "https://bugzilla.redhat.com/show_bug.cgi?id=1955489",
 		},
+		{
+			// Should be removed one release after the attached bugzilla is fixed, or after that bug is fixed in a backport to the previous minor.
+			Selector: map[string]string{"alertname": "HighOverallControlPlaneCPU"},
+			Text: "https://bugzilla.redhat.com/show_bug.cgi?id=1985073",
+			Matches: func(_ *model.Sample) bool {
+				return framework.ProviderIs("gce")
+			},
+		},
 	}
 
 	pendingAlertsWithBugs := helper.MetricConditions{
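
For readers unfamiliar with the exception list, here is a minimal, self-contained sketch of how a Selector plus an optional Matches predicate could gate an alert exception. The types and helpers below (Sample, MetricCondition, findException, exceptionsFor) are hypothetical stand-ins that only borrow the field names visible in the diff; they are not the real helper or framework packages, and the provider is passed in as a plain string instead of calling framework.ProviderIs.

```go
package main

import "fmt"

// Sample is a hypothetical stand-in for an alert sample: only its labels.
type Sample struct {
	Labels map[string]string
}

// MetricCondition borrows the field names seen in the diff (Selector, Text,
// Matches). The type itself is an assumption made for illustration.
type MetricCondition struct {
	Selector map[string]string
	Text     string
	Matches  func(*Sample) bool
}

// matches reports whether every selector label equals the sample's label and,
// when a Matches predicate is set, whether the predicate also accepts it.
func (c MetricCondition) matches(s *Sample) bool {
	for k, v := range c.Selector {
		if s.Labels[k] != v {
			return false
		}
	}
	return c.Matches == nil || c.Matches(s)
}

// findException returns the first condition covering the sample, or nil.
func findException(conds []MetricCondition, s *Sample) *MetricCondition {
	for i := range conds {
		if conds[i].matches(s) {
			return &conds[i]
		}
	}
	return nil
}

// exceptionsFor builds an exception list for the given provider name. The
// closure plays the role of framework.ProviderIs("gce") in the real change.
func exceptionsFor(provider string) []MetricCondition {
	return []MetricCondition{
		{
			Selector: map[string]string{"alertname": "HighOverallControlPlaneCPU"},
			Text:     "https://bugzilla.redhat.com/show_bug.cgi?id=1985073",
			Matches: func(_ *Sample) bool {
				return provider == "gce"
			},
		},
	}
}

func main() {
	alert := &Sample{Labels: map[string]string{
		"alertname": "HighOverallControlPlaneCPU",
		"severity":  "warning",
	}}
	for _, provider := range []string{"gce", "aws"} {
		if c := findException(exceptionsFor(provider), alert); c != nil {
			fmt.Printf("%s: allowed, tracked by %s\n", provider, c.Text)
		} else {
			fmt.Printf("%s: no exception, alert fails the test\n", provider)
		}
	}
}
```

In this sketch the label selector alone would allow the alert everywhere; the predicate narrows the exception to one provider, which is the effect the commit message describes for GCP.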