
How to make a node memory alert in a k8s cluster include top-k pod resource usage #2617


Open
geniuslc11 opened this issue Apr 21, 2025 · 3 comments

Comments

@geniuslc11

geniuslc11 commented Apr 21, 2025

Question and Steps to reproduce

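When a node in a k8s cluster triggers a memory alert, how can the alert include the top-k pod resource usage? I put topk(5, sum by (pod, namespace) (rate(container_cpu_usage_seconds_total[5m]))) in the alert rule's notes, but it has no effect.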

Image

Image

Relevant logs and configurations

Version

v8.0.0-beta.10

@710leo
Member

710leo commented Apr 23, 2025

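@geniuslc11 The additional information field of an alert rule supports the query template function. The configuration below returns data, and you can then process that data further with template functions such as First and Label. The template functions are defined at https://github.com/ccfos/nightingale/blob/main/pkg/tplx/tplx.go#L39
For descriptions of the template functions, see https://prometheus.io/docs/prometheus/latest/configuration/template_reference/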

Image
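
To illustrate the idea in the reply above, here is a minimal Go sketch of how a query result could be post-processed with helper functions inside a template. It is not Nightingale's implementation: `stubQuery`, the `sample` type, and the returned data are all made up for illustration, and the lowercase names `first`, `label`, and `value` follow the Prometheus template reference; whether tplx.go registers exactly these names is an assumption to check against the linked source.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
	"text/template"
)

// sample mimics one element of a Prometheus-style query result (hypothetical type).
type sample struct {
	Labels map[string]string
	Value  float64
}

// stubQuery is a hypothetical stand-in for the real query function.
// It ignores the PromQL string and returns fixed, made-up samples.
func stubQuery(promql string) []sample {
	return []sample{
		{Labels: map[string]string{"namespace": "demo", "pod": "api-0"}, Value: 0.5},
		{Labels: map[string]string{"namespace": "demo", "pod": "db-0"}, Value: 0.25},
	}
}

// helperFuncs imitates the semantics described in the Prometheus template reference.
var helperFuncs = template.FuncMap{
	"query": stubQuery,
	"first": func(v []sample) (sample, error) {
		if len(v) == 0 {
			return sample{}, errors.New("first called on an empty result")
		}
		return v[0], nil
	},
	"label": func(name string, s sample) string { return s.Labels[name] },
	"value": func(s sample) float64 { return s.Value },
}

// renderTopK parses and executes tpl with the stub helpers.
func renderTopK(tpl string) (string, error) {
	t, err := template.New("annotation").Funcs(helperFuncs).Parse(tpl)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := t.Execute(&sb, nil); err != nil {
		return "", err
	}
	return sb.String(), nil
}

// topKTemplate lists every returned series as "namespace/pod: value".
const topKTemplate = `{{ range query "topk(5, sum by (pod, namespace) (rate(container_cpu_usage_seconds_total[5m])))" }}{{ label "namespace" . }}/{{ label "pod" . }}: {{ value . }}
{{ end }}`

func main() {
	out, err := renderTopK(topKTemplate)
	if err != nil {
		fmt.Println("render failed:", err)
		return
	}
	fmt.Print(out)
}
```

The `range` loop is what turns the raw query result into readable lines; to show only the top entry, a pipeline such as `{{ query "..." | first | label "pod" }}` would work with the same stubs.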

@litaotongxue

@710leo With this approach, how do I substitute label variables into the query, something like {{$labels.ident}}?


@ops-cc

ops-cc commented Apr 27, 2025

Hello, I have the same problem. When I try to pass in a variable, I get this error:
{{ query "topk(5, sum by (pod, namespace) (rate(container_cpu_usage_seconds_total{node="$ident"}[5m])))" }}

Image
But when I run the query manually, it does return data.

Image
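
One possible cause, offered as a guess because the error text is only in the screenshot: if the field is rendered with Go's text/template (an assumption), the inner double quotes around `$ident` end the string literal early, so the template fails to parse before any query runs. The sketch below reproduces that with a stub `query` function and shows one way to build the PromQL string with `printf`. The names `stubEcho` and `renderAnnotation`, the `.Labels` data field, and the `$labels` variable are hypothetical; whether Nightingale exposes `$labels` in this field is not confirmed in this thread.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// stubEcho is a hypothetical stand-in for the query function.
// It returns the PromQL it receives so the final string can be inspected.
func stubEcho(promql string) string { return promql }

// renderAnnotation parses and executes tpl, exposing labels as .Labels.
func renderAnnotation(tpl string, labels map[string]string) (string, error) {
	t, err := template.New("annotation").
		Funcs(template.FuncMap{"query": stubEcho}).
		Parse(tpl)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	data := map[string]interface{}{"Labels": labels}
	if err := t.Execute(&sb, data); err != nil {
		return "", err
	}
	return sb.String(), nil
}

// brokenTpl is the template from the comment above: the inner quotes
// terminate the string literal, so parsing fails.
const brokenTpl = `{{ query "topk(5, sum by (pod, namespace) (rate(container_cpu_usage_seconds_total{node="$ident"}[5m])))" }}`

// fixedTpl builds the PromQL with printf; %q adds the quotes around the label value.
// The $labels variable is declared here only to make the sketch self-contained.
const fixedTpl = `{{ $labels := .Labels }}{{ query (printf "topk(5, sum by (pod, namespace) (rate(container_cpu_usage_seconds_total{node=%q}[5m])))" $labels.ident) }}`

func main() {
	if _, err := renderAnnotation(brokenTpl, nil); err != nil {
		fmt.Println("broken template does not parse:", err)
	}
	out, err := renderAnnotation(fixedTpl, map[string]string{"ident": "node-1"})
	if err != nil {
		fmt.Println("render failed:", err)
		return
	}
	fmt.Println(out)
}
```

Using `printf` keeps the outer string literal free of nested quotes; escaping the inner quotes as `\"` would also parse, but it would still pass the literal text `$ident` to the query rather than the label value.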
