
Commit

use operator deploy alertmanager
yunlzheng committed Aug 12, 2018
1 parent d13fc18 commit f7e1e87
Showing 6 changed files with 147 additions and 28 deletions.
2 changes: 1 addition & 1 deletion README.md
@@ -74,6 +74,6 @@ Prometheus Operations Guide: The Way of Cloud-Native Monitoring
* [Creating Visual Dashboards with Grafana](./kubernetes/use-grafana-in-k8s.md)
* [Managing Prometheus with the Operator](./kubernetes/use-operator-manage-prometheus.md)
* [Monitoring User Applications with Prometheus Operator](./kubernetes/use-operator-monitor-app.md)
* [Alert Handling under Prometheus Operator](./kubernetes/use-operator-alerting.md)
* [Managing Alertmanager with Prometheus Operator](./kubernetes/use-operator-alerting.md)
* [Chapter 9: Monitoring Rancher Clusters with Prometheus](./rancher/README.md)
* [References](./REFERENCES.md)
2 changes: 1 addition & 1 deletion SUMMARY.md
@@ -79,7 +79,7 @@
* [Creating Visual Dashboards with Grafana](./kubernetes/use-grafana-in-k8s.md)
* [Managing Prometheus with the Operator](./kubernetes/use-operator-manage-prometheus.md)
* [Monitoring User Applications with Prometheus Operator](./kubernetes/use-operator-monitor-app.md)
* [Alert Handling under Prometheus Operator](./kubernetes/use-operator-alerting.md)
* [Managing Alertmanager with Prometheus Operator](./kubernetes/use-operator-alerting.md)
* [Summary](./kubernetes/SUMMARY.md)
* [Chapter 9: Monitoring Rancher Clusters with Prometheus](./rancher/README.md)
* [References](./REFERENCES.md)
12 changes: 6 additions & 6 deletions examples/prometheus-operator/02prometheus.yaml
@@ -7,10 +7,10 @@ metadata:
spec:
  replicas: 2
  serviceAccountName: prometheus
  serviceMonitorNamespaceSelector:
    matchExpressions:
    - {}
  serviceMonitorSelector:
    matchExpressions:
    - key: k8s-app
      operator: Exists
    matchLabels:
      team: frontend
  ruleSelector:
    matchLabels:
      role: alert-rules
      prometheus: example
13 changes: 13 additions & 0 deletions examples/prometheus-operator/example-rule.yaml
@@ -0,0 +1,13 @@
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  labels:
    prometheus: example
    role: alert-rules
  name: prometheus-example-rules
spec:
  groups:
  - name: ./example.rules
    rules:
    - alert: ExampleAlert
      expr: vector(1)
88 changes: 70 additions & 18 deletions kubernetes/use-operator-alerting.md
@@ -1,4 +1,6 @@
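# Alert Handling under Prometheus Operator
# Managing Alertmanager with Prometheus Operator

To manage Alertmanager instances through Prometheus Operator, define them with the Alertmanager custom resource. As shown below, the replicas field controls the number of Alertmanager instances: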

```
apiVersion: monitoring.coreos.com/v1
@@ -9,6 +11,31 @@ spec:
  replicas: 3
```
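The hunk above elides the `metadata` section of the manifest. A minimal complete version might look like the sketch below. The resource name `example` is inferred from the `alertmanager "example" created` output and the `alertmanager-example-*` Pod names that appear later; treat it as an assumption rather than the author's exact file.

```yaml
# Sketch of alertmanager-setup.yaml (name "example" assumed from the command output)
apiVersion: monitoring.coreos.com/v1
kind: Alertmanager
metadata:
  name: example        # Pods are named alertmanager-example-0, -1, -2
spec:
  replicas: 3          # greater than 1, so the Operator runs the instances as a cluster
```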

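When replicas is greater than 1, Prometheus Operator automatically deploys the Alertmanager instances as a cluster. Save the content above as alertmanager-setup.yaml and create it with the following command: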

```
$ kubectl create -f alertmanager-setup.yaml
alertmanager "example" created
```

Checking the Pods shows that the Alertmanager Pod stays stuck in the ContainerCreating state:

```
$ kubectl get pods
NAME                     READY     STATUS              RESTARTS   AGE
alertmanager-example-0   0/2       ContainerCreating   0          4m
```

Inspecting the Pod with kubectl describe shows the following:

```
$ kubectl describe pods alertmanager-example-0
...
Warning FailedMount 4s (x2 over 2m) kubelet, cn-beijing.i-2ze52j61t5p9z4n60c9m Unable to mount volumes for pod "alertmanager-example-0_default(f75aff5c-9e37-11e8-9dc5-00163e124757)": timeout expired waiting for volumes to attach or mount for pod "default"/"alertmanager-example-0". list of unmounted volumes=[config-volume]. list of unattached volumes=[config-volume alertmanager-example-db default-token-tzpfg]
```

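Prometheus Operator creates the Alertmanager instances as a StatefulSet. By default, each Alertmanager instance looks for a Secret named according to the convention `alertmanager-{ALERTMANAGER_NAME}` and mounts its contents into the instance as the configuration file. Until that Secret exists, the `config-volume` listed in the error above cannot be mounted, which is why the Pod never leaves ContainerCreating. We therefore need to create the configuration for Alertmanager. The Alertmanager configuration file is shown below: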

```
global:
  resolve_timeout: 5m
@@ -24,10 +51,24 @@ receivers:
  - url: 'http://alertmanagerwh:30500/'
```

Save the content above as alertmanager.yaml and create a Secret named alertmanager-example with the following command:

```
$ kubectl create secret generic alertmanager-example --from-file=alertmanager.yaml
secret "alertmanager-example" created
```
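If you prefer to keep the configuration in a manifest instead of running `kubectl create secret`, a Secret along the lines of the sketch below should be equivalent. Two details matter: the name must follow the `alertmanager-{ALERTMANAGER_NAME}` convention, and the data key must be `alertmanager.yaml`, matching the file name passed to `--from-file`. The `route` block is reduced to a single receiver named `webhook` because the diff elides the original one; that part is an assumption, not the author's configuration.

```yaml
# Declarative sketch of the alertmanager-example Secret.
# The route section and receiver name are simplified assumptions;
# only resolve_timeout and the webhook URL come from the text above.
apiVersion: v1
kind: Secret
metadata:
  name: alertmanager-example   # alertmanager-{ALERTMANAGER_NAME}
type: Opaque
stringData:
  alertmanager.yaml: |         # same key that --from-file=alertmanager.yaml produces
    global:
      resolve_timeout: 5m
    route:
      receiver: 'webhook'
    receivers:
    - name: 'webhook'
      webhook_configs:
      - url: 'http://alertmanagerwh:30500/'
```

Apply it with `kubectl apply -f` in place of the `kubectl create secret` command; both produce a Secret with the same name and key.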

Once the Secret has been created, check the Alertmanager Pods again:

```
$ kubectl get pods
alertmanager-example-0 2/2 Running 0 37m
alertmanager-example-1 2/2 Running 0 31m
alertmanager-example-2 2/2 Running 0 31m
```

To reach these Alertmanager instances, we need to create a corresponding Service, as shown below:

```
apiVersion: v1
kind: Service
@@ -45,27 +86,38 @@ spec:
    alertmanager: example
```

```
$ kubectl create -f alertmanager-service.yaml
```
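The diff also elides most of the Service manifest. The sketch below shows only the parts the rest of this section depends on, under stated assumptions: the Service is named `alertmanager-example` (the name the Prometheus resource refers to under `alerting`), it selects Pods by the `alertmanager: example` label shown above, and it exposes a port named `web`, the port name the Prometheus resource references. Port 9093 is Alertmanager's default; `type: NodePort` is a hypothetical choice to make the UI reachable from outside the cluster.

```yaml
# Sketch of alertmanager-service.yaml; Service type and port numbers are assumptions.
apiVersion: v1
kind: Service
metadata:
  name: alertmanager-example
spec:
  type: NodePort               # hypothetical: Kubernetes allocates a node port for the UI
  selector:
    alertmanager: example      # matches the Pods created for the Alertmanager "example"
  ports:
  - name: web                  # port name referenced as "port: web" by Prometheus
    port: 9093                 # Alertmanager's default listen port
    targetPort: web            # named container port on the Alertmanager Pods (assumed)
```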
Open the Alertmanager UI and check the current cluster status:

```
$ kubectl apply -f prometheus.yaml
```
![Alertmanager cluster status](http://p2n2em8ut.bkt.clouddn.com/prometheus-alert-cluster-status.png)

Next, we only need to modify our Prometheus resource definition and use the alerting field to specify which Alertmanager to send alerts to:

```
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
kind: Prometheus
metadata:
  creationTimestamp: null
  name: prometheus
  labels:
    prometheus: example
    role: alert-rules
  name: prometheus-example-rules
    prometheus: prometheus
spec:
  groups:
  - name: ./example.rules
    rules:
    - alert: ExampleAlert
      expr: vector(1)
```
  replicas: 2
  serviceAccountName: prometheus
  serviceMonitorSelector:
    matchLabels:
      team: frontend
  alerting:
    alertmanagers:
    - namespace: default
      name: alertmanager-example
      port: web
  ruleSelector:
    matchLabels:
      role: alert-rules
      prometheus: example
```

After Prometheus finishes reloading its configuration, the UI shows the latest Prometheus configuration:

![Prometheus configuration](http://p2n2em8ut.bkt.clouddn.com/prometheus-alerting-auto.png)
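For orientation, the `alerting` section the Operator renders from this resource is roughly equivalent to the abridged Prometheus configuration below: Alertmanager instances are found through Kubernetes endpoint discovery in the given namespace and then filtered by Service name and port name. This is a sketch for understanding, not the Operator's exact output, which varies between versions and includes additional fields.

```yaml
# Abridged sketch of the generated Prometheus configuration (not verbatim Operator output)
alerting:
  alertmanagers:
  - kubernetes_sd_configs:
    - role: endpoints            # discover the endpoints behind Services
      namespaces:
        names:
        - default                # from alertmanagers[].namespace
    relabel_configs:
    - action: keep               # keep only the Service named in alertmanagers[].name
      source_labels: [__meta_kubernetes_service_name]
      regex: alertmanager-example
    - action: keep               # and only its port named in alertmanagers[].port
      source_labels: [__meta_kubernetes_endpoint_port_name]
      regex: web
```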

With this, we have used the custom resources provided by Prometheus Operator to declaratively create and manage both the Prometheus instances and the Alertmanager cluster.
58 changes: 56 additions & 2 deletions kubernetes/use-operator-monitor-app.md
@@ -2,7 +2,7 @@

This section shows how to deploy a Prometheus instance with Prometheus Operator and use it to monitor applications deployed in Kubernetes.

## Deploying Prometheus Server
## Deploying a Prometheus Instance

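So that the Prometheus instance can use service discovery, we first need to create a ServiceAccount for Prometheus based on the Kubernetes RBAC model and grant it the corresponding cluster access permissions, as shown below: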

@@ -114,7 +114,7 @@ spec:

As shown above, our Prometheus instance does not yet contain any monitoring configuration.

## Monitoring Services Deployed in Kubernetes
## Managing Monitoring Targets with ServiceMonitor

To simulate an application-monitoring scenario, first install a test application in Kubernetes, as shown below:

@@ -216,4 +216,58 @@ spec:

![Monitoring targets](http://p2n2em8ut.bkt.clouddn.com/prometheus-operator-targets.png)

## Managing Alerting Rules with PrometheusRule

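With the traditional approach to Prometheus, we also have to manage the alerting rule files by hand and manually tell Prometheus to reload them whenever they change. With Prometheus Operator, we only need to declare the rules with the custom resource type PrometheusRule: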

```
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  labels:
    prometheus: example
    role: alert-rules
  name: prometheus-example-rules
spec:
  groups:
  - name: ./example.rules
    rules:
    - alert: ExampleAlert
      expr: vector(1)
```

Save the content above as example-rule.yaml and create the resource with kubectl:

```
$ kubectl create -f example-rule.yaml
prometheusrule "prometheus-example-rules" created
```
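The `vector(1)` expression always returns a sample, so `ExampleAlert` fires permanently, which is convenient for testing the alerting pipeline. A more realistic rule might look like the hypothetical sketch below; the resource name, alert name, duration, and annotation text are illustrative assumptions. Whatever the rule content, `metadata.labels` must still carry `role: alert-rules` and `prometheus: example` so that the Prometheus resource's `ruleSelector` picks it up.

```yaml
# Hypothetical rule; only the two selector labels are taken from the example above.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: example-instance-down      # hypothetical name
  labels:
    prometheus: example            # must match ruleSelector.matchLabels
    role: alert-rules
spec:
  groups:
  - name: example-instance.rules
    rules:
    - alert: InstanceDown
      expr: up == 0                # the last scrape of a target failed
      for: 5m                      # condition must hold for 5 minutes before firing
      labels:
        severity: critical
      annotations:
        summary: "Instance {{ $labels.instance }} is down"
```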

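Once the alerting rule has been created, use ruleSelector in the Prometheus resource to select the PrometheusRule resources to load. The labels in ruleSelector must match the labels set on the PrometheusRule: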

```
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: prometheus
  labels:
    prometheus: prometheus
spec:
  replicas: 2
  serviceAccountName: prometheus
  serviceMonitorSelector:
    matchLabels:
      team: frontend
  ruleSelector:
    matchLabels:
      role: alert-rules
      prometheus: example
```

After Prometheus reloads its configuration, the UI shows the alerting rules created automatically from the PrometheusRule:

![Prometheus alerting rules](http://p2n2em8ut.bkt.clouddn.com/prometheus-rule.png)

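So far, we have used the Prometheus Operator custom resource types Prometheus and ServiceMonitor to declare the Prometheus instances to deploy in the Kubernetes cluster, along with their monitoring configuration. By watching for changes to Prometheus and ServiceMonitor resources, the Operator automatically creates and manages the Prometheus configuration, giving us declarative, automated management of Prometheus.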

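So far, we have used the Prometheus Operator custom resource types to manage Prometheus instances, monitoring configuration, and alerting rules. Prometheus Operator turns work that used to be done by hand into declarative management, greatly reducing the operational complexity of running Prometheus on Kubernetes. Next, we will use Prometheus Operator to define and manage Alertmanager.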
