Alert recovery notification: recovery notifications are still sent even though the metric stays above the threshold #2609

Open
wangyi17818 opened this issue Apr 15, 2025 · 3 comments
@wangyi17818

Question and Steps to reproduce

[Four screenshots attached in the original issue; images not preserved]

Relevant logs and configurations

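Alert condition: sum (pinglvs_packets_loss) by(target_node) / sum (pinglvs_packets_sent) by(target_node) * 100 >=90
Evaluation interval: 60
Duration: 180

Recovery notification: enabled
Observation period: 300

After an alert fires, a recovery notification is generated every 7 minutes, and the alert fires again 3 minutes after that.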

Version

v7.7.0

@wangyi17818
Author

wangyi17818 commented Apr 15, 2025

Apr 15 17:06:22 nightingale-prod n9e[25317]: 2025-04-15 17:06:22.711058 DEBUG memsto/host_alert_rule_targets.go:93 get_targets_of_alert_rule total: 0 engine_name:default
Apr 15 17:06:26 nightingale-prod n9e[25317]: 2025-04-15 17:06:26.296188 DEBUG eval/eval.go:246 rule_eval:alert-1-3 query:{PromQl:sum (pinglvs_packets_loss) by(target_node) / sum (pinglvs_packets_sent) by(target_node) * 100 >5 Severity:2 RecoverConfig:{JudgeType:0 RecoverExp:} Unit:none}, value:
Apr 15 17:06:26 nightingale-prod n9e[25317]: 2025-04-15 17:06:26.328323 DEBUG eval/eval.go:246 rule_eval:alert-1-3 query:{PromQl:sum (pinglvs_packets_loss) by(target_node) / sum (pinglvs_packets_sent) by(target_node) * 100 >=90 Severity:1 RecoverConfig:{JudgeType:0 RecoverExp:} Unit:none}, value:
Apr 15 17:06:26 nightingale-prod n9e[25317]: 2025-04-15 17:06:26.328362 INFO dispatch/log.go:20 event(a4362b9f8787f6e0e957ed2cae953af4 recovered) push_queue: rule_id=3 sub_id:0 cluster:vm1 [rulename=pinglvs target_node=shct01]100@1744707566
Apr 15 17:06:26 nightingale-prod n9e[25317]: 2025-04-15 17:06:26.352361 INFO dispatch/log.go:20 event(a4362b9f8787f6e0e957ed2cae953af4 recovered) consume: rule_id=3 sub_id:0 cluster:vm1 [rulename=pinglvs target_node=shct01]100@1744707566
Apr 15 17:06:26 nightingale-prod n9e[25317]: 2025-04-15 17:06:26.354415 DEBUG dispatch/dispatch.go:285 send to channel:dingtalk event:&{Id:64123 Cate:prometheus Cluster:vm1 DatasourceId:1 GroupId:2 GroupName:net Hash:a4362b9f8787f6e0e957ed2cae953af4 RuleId:3 RuleName:pinglvs RuleNote: RuleProd:metric RuleAlgo: Severity:1 PromForDuration:180 PromQl:sum (pinglvs_packets_loss) by(target_node) / sum (pinglvs_packets_sent) by(target_node) * 100 >=90 RuleConfig:{"event_relabel_config":[],"inhibit":true,"queries":[{"keys":{"labelKey":"","metricKey":"","valueKey":""},"prom_ql":"sum (pinglvs_packets_loss) by(target_node) / sum (pinglvs_packets_sent) by(target_node) * 100 \u003e5","severity":2,"unit":"none"},{"keys":{"labelKey":"","metricKey":"","valueKey":""},"prom_ql":"sum (pinglvs_packets_loss) by(target_node) / sum (pinglvs_packets_sent) by(target_node) * 100 \u003e=90","severity":1,"unit":"none"}]} RuleConfigJson:map[event_relabel_config:[] inhibit:true queries:[map[keys:map[labelKey: metricKey: valueKey:] prom_ql:sum (pinglvs_packets_loss) by(target_node) / sum (pinglvs_packets_sent) by(target_node) * 100 >5 severity:2 unit:none] map[keys:map[labelKey: metricKey: valueKey:] prom_ql:sum (pinglvs_packets_loss) by(target_node) / sum (pinglvs_packets_sent) by(target_node) * 100 >=90 severity:1 unit:none]]] PromEvalInterval:60 Callbacks: CallbacksJSON:[] RunbookUrl: NotifyRecovered:1 NotifyChannels:dingtalk NotifyChannelsJSON:[dingtalk] NotifyGroups:2 NotifyGroupsJSON:[2] NotifyGroupsObj:[0xc0001e1050] TargetIdent: TargetNote: TriggerTime:1744707566 TriggerValue:100 TriggerValues: TriggerValuesJson:{ValuesWithUnit:map[v:{Value:100 Unit: Text:100.00 Stat:100}]} Tags:rulename=pinglvs,,target_node=shct01 TagsJSON:[rulename=pinglvs target_node=shct01] TagsMap:map[rulename:pinglvs target_node:shct01] OriginalTags: OriginalTagsJSON:[] Annotations:{} AnnotationsJSON:map[] IsRecovered:true NotifyUsersObj:[<id:1 username:root nickname:超管 email: phone: contacts:{}> <id:2 
username:dingding nickname: email: phone: contacts:{"dingtalk_robot_token":"https://oapi.dingtalk.com/robot/send?access_token=0af73612a567058df44eae67a4d314b797e9222a50173e2d813303c634cc944b"}>] LastEvalTime:1744707986 LastSentTime:1744707566 NotifyCurNumber:1 FirstTriggerTime:1744707566 ExtraConfig: Status:0 Claimant: SubRuleId:0 ExtraInfo:[] Target: RecoverConfig:{JudgeType:0 RecoverExp:} RuleHash:5c4c3ec9a5cfa9ab71681740dc583a13 ExtraInfoMap:[]} users:[<id:1 username:root nickname:超管 email: phone: contacts:{}> <id:2 username:dingding nickname: email: phone: contacts:{"dingtalk_robot_token":"https://oapi.dingtalk.com/robot/send?access_token=0af73612a567058df44eae67a4d314b797e9222a50173e2d813303c634cc944b"}>]
Apr 15 17:06:26 nightingale-prod n9e[25317]: 2025-04-15 17:06:26.544149 INFO sender/callback.go:173 dingtalk_sender: result=succ url=https://oapi.dingtalk.com/robot/send?access_token=xxx code=200 req:{markdown {pinglvs #### pinglvs
Apr 15 17:06:26 nightingale-prod n9e[25317]: ---
Apr 15 17:06:26 nightingale-prod n9e[25317]: - 告警级别: 1级
Apr 15 17:06:26 nightingale-prod n9e[25317]: - 恢复时间: 2025-04-15 17:06:26
Apr 15 17:06:26 nightingale-prod n9e[25317]: - 告警持续时长: 7m 0s
Apr 15 17:06:26 nightingale-prod n9e[25317]: - 告警事件标签:
Apr 15 17:06:26 nightingale-prod n9e[25317]: - target_node: shct01
Apr 15 17:06:26 nightingale-prod n9e[25317]: 事件详情|屏蔽1小时|查看曲线
Apr 15 17:06:26 nightingale-prod n9e[25317]: } {[] false}} response={"errcode":0,"errmsg":"ok"}
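
As a cross-check on the log above, the event dump carries `TriggerTime:1744707566` and `LastEvalTime:1744707986`. A minimal Python sketch, assuming the host logs in UTC+8 (inferred from the log timestamps, not stated in the issue), shows that the gap between the two matches the 7m 0s reported in the notification and that the last evaluation coincides with the recovery log line:

```python
from datetime import datetime, timedelta, timezone

# Values copied from the event dump in the log above.
TRIGGER_TIME = 1744707566    # TriggerTime / FirstTriggerTime
LAST_EVAL_TIME = 1744707986  # LastEvalTime

# Assumption: the host logs in UTC+8; the issue does not state the timezone.
ASSUMED_TZ = timezone(timedelta(hours=8))


def fmt(ts: int) -> str:
    """Render a Unix timestamp in the assumed local timezone."""
    return datetime.fromtimestamp(ts, tz=ASSUMED_TZ).strftime("%Y-%m-%d %H:%M:%S")


def fmt_duration(seconds: int) -> str:
    """Format seconds as 'Xm Ys', mirroring the notification text."""
    minutes, secs = divmod(seconds, 60)
    return f"{minutes}m {secs}s"


elapsed = LAST_EVAL_TIME - TRIGGER_TIME
print(fmt(TRIGGER_TIME))      # 2025-04-15 16:59:26
print(fmt(LAST_EVAL_TIME))    # 2025-04-15 17:06:26
print(fmt_duration(elapsed))  # 7m 0s
```

Under that timezone assumption, the event had been firing since 16:59:26 and was pushed as recovered at 17:06:26, immediately after the `rule_eval` line whose `value:` is empty.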

@wangyi17818
Author

Apr 15 17:06:26 nightingale-prod n9e[25317]: 2025-04-15 17:06:26.328323 DEBUG eval/eval.go:246 rule_eval:alert-1-3 query:{PromQl:sum (pinglvs_packets_loss) by(target_node) / sum (pinglvs_packets_sent) by(target_node) * 100 >=90 Severity:1 RecoverConfig:{JudgeType:0 RecoverExp:} Unit:none}, value:
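It looks like the empty value here is what causes the recovery, but the log does not show the timestamp used for the query. If the query runs at the current time, could it be that the data had not been reported yet? Judging by the time range, though, no data is missing.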
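
One way to test this hypothesis is to replay the rule's expression as an instant query at the evaluation timestamp, once with and once without the `>=90` comparison. In PromQL a comparison without `bool` drops series that do not match, so an empty result for the full expression can mean either "below threshold" or "no sample at that instant"; the unfiltered expression tells the two apart. The sketch below assumes the `vm1` datasource exposes the Prometheus-compatible `/api/v1/query` endpoint. `BASE_URL` and the helper names are hypothetical, not values from the issue.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical datasource address; replace with the real one for `vm1`.
BASE_URL = "http://127.0.0.1:8428"

# Expression from the rule, without and with the threshold comparison.
RAW_EXPR = (
    "sum (pinglvs_packets_loss) by(target_node) / "
    "sum (pinglvs_packets_sent) by(target_node) * 100"
)
RULE_EXPR = RAW_EXPR + " >=90"


def build_query_url(base_url: str, expr: str, ts: int) -> str:
    """Build an instant-query URL evaluated at Unix timestamp `ts`."""
    params = urllib.parse.urlencode({"query": expr, "time": ts})
    return f"{base_url}/api/v1/query?{params}"


def series_values(payload: dict) -> dict:
    """Map target_node -> value from an instant-query (vector) response."""
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload}")
    values = {}
    for item in payload["data"]["result"]:
        node = item["metric"].get("target_node", "")
        values[node] = float(item["value"][1])
    return values


def instant_query(base_url: str, expr: str, ts: int) -> dict:
    """Run the instant query over HTTP and return the parsed values."""
    url = build_query_url(base_url, expr, ts)
    with urllib.request.urlopen(url, timeout=10) as resp:
        return series_values(json.load(resp))


def compare_at(ts: int, base_url: str = BASE_URL) -> None:
    """Print the raw ratio and the thresholded result at the same instant."""
    print("raw :", instant_query(base_url, RAW_EXPR, ts))
    print("rule:", instant_query(base_url, RULE_EXPR, ts))
```

Calling `compare_at(1744707986)` (the `LastEvalTime` from the log) would show both results side by side. If the unfiltered expression also returns nothing at that instant, the series was missing at query time; if it returns a value below 90, the threshold simply was not met.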

@710leo
Member

710leo commented Apr 15, 2025

@wangyi17818 You can follow the troubleshooting approach in this document, and then check what the raw data looks like: https://flashcat.cloud/docs/content/flashcat-monitor/nightingale-v6/faq/alert-check/

2 participants