New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sb 162896935 dedupe alerts for broken master #1579

Closed
wants to merge 3 commits into
base: master
from

Conversation

2 participants
@sarboc
Copy link
Contributor

sarboc commented Jan 10, 2019

Description

Group similar alerts so we don't get paged so much.

Reviewer Notes

I made the decision to specifically use the $CIRCLE_JOB in our dedupe key, rather than the branch name. This means that alerts that fail in the same part of the test/build process will be grouped together, but failures that happen in different parts of the process will create separate alerts. This made the most sense to me because a) we suppress all alerts for branches other than master, and b) if one merge is breaking acceptance tests, but another is breaking migrations, that might be good to know.

Test results are available in slack history, plus I've included a screenshot from PagerDuty below.

Code Review Verification Steps

  • Request review from a member of a different team.
  • Have the Pivotal acceptance criteria been met for this change?

References

Screenshots

Grouping in pagerduty:
screen shot 2019-01-09 at 6 33 25 pm

sarboc added some commits Jan 10, 2019

Revert "Test changes"
This reverts commit 4517e6e.

@sarboc sarboc requested review from chrisgilmerproj and mikena-truss Jan 10, 2019

@@ -59,7 +59,7 @@ cat <<EOM
"class": "cicd failure"
},
"routing_key": "$PD_ROUTING_KEY",
"dedup_key": "circle-$CIRCLE_WORKFLOW_ID",
"dedup_key": "circle-$CIRCLE_JOB",

This comment has been minimized.

@chrisgilmerproj

chrisgilmerproj Jan 10, 2019

Contributor

We should chat about the difference between CIRCLE_WORKFLOW_ID and CIRCLE_JOB from the Built-in Environment Variables. I recently changed it to CIRCLE_WORKFLOW_ID in c9ad97e from CIRCLE_BUILD_NUM.

My thinking was that we really want all alerts from a single workflow to end up in the same place so that concurrent jobs won't create multiple alerts. For instance, if build_tools and acceptance_tests both fail I'd expect only one alert to PD instead of two. The CIRCLE_WORKFLOW_ID accomplishes that, whereas CIRCLE_JOB would send out two alerts. I think you can test this by adding run: exit 1 to two concurrent jobs in the .circleci/config.yml.

However, I'm not against having it alert separately for each job if there's a concern we'd miss something and that multiple alerts would be useful. I just want to point out that CIRCLE_JOB and CIRCLE_BUILD_NUM are nearly equivalent, which is what the problem was originally.

This comment has been minimized.

@chrisgilmerproj

chrisgilmerproj Jan 10, 2019

Contributor

Ahh, I think I see the other thing you did here, which is to dedup across workflows. I'm not sure how I feel about that. Would like to discuss.

@sarboc

This comment has been minimized.

Copy link
Contributor

sarboc commented Jan 10, 2019

@chrisgilmerproj and I had different understandings of how we were trying to dedupe the alerts, and his work pre-empted mine. I'm going to close this PR for now, and we're going to try and find other ways to make pages more manageable.

@sarboc sarboc closed this Jan 10, 2019

@sarboc sarboc deleted the sb_162896935_dedupe_alerts_for_broken_master branch Jan 10, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment