Commit

add alert for tenant reaching head series limit (#6467)
Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>
thibaultmg committed Jun 26, 2023
1 parent 40e1b2d commit 5d695e9
Showing 6 changed files with 33 additions and 1 deletion.
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -24,6 +24,7 @@ We use *breaking :warning:* to mark changes that are not backward compatible (re
- [#6420](https://github.com/thanos-io/thanos/pull/6420) Index Cache: Cache expanded postings.
- [#6441](https://github.com/thanos-io/thanos/pull/6441) Compact: Compactor will set `index_stats` in `meta.json` file with max series and chunk size information.
- [#6466](https://github.com/thanos-io/thanos/pull/6466) Mixin (Receive): add limits alerting for configuration reload and meta-monitoring.
- [#6467](https://github.com/thanos-io/thanos/pull/6467) Mixin (Receive): add alert for tenant reaching head series limit.

### Fixed
- [#6456](https://github.com/thanos-io/thanos/pull/6456) Store: fix crash when computing set matches from regex pattern
9 changes: 9 additions & 0 deletions examples/alerts/alerts.md
@@ -548,6 +548,15 @@ rules:
for: 5m
labels:
severity: warning
- alert: ThanosReceiveTenantLimitedByHeadSeries
annotations:
description: Thanos Receive tenant {{$labels.tenant}} is limited by head series.
runbook_url: https://github.com/thanos-io/thanos/tree/main/mixin/runbook.md#alert-name-thanosreceivetenantlimitedbyheadseries
summary: A Thanos Receive tenant is limited by head series.
expr: sum by(job, tenant) (increase(thanos_receive_head_series_limited_requests_total{job=~".*thanos-receive.*"}[5m])) > 0
for: 5m
labels:
severity: warning
```

## Replicate
9 changes: 9 additions & 0 deletions examples/alerts/alerts.yaml
@@ -292,6 +292,15 @@ groups:
for: 5m
labels:
severity: warning
- alert: ThanosReceiveTenantLimitedByHeadSeries
annotations:
description: Thanos Receive tenant {{$labels.tenant}} is limited by head series.
runbook_url: https://github.com/thanos-io/thanos/tree/main/mixin/runbook.md#alert-name-thanosreceivetenantlimitedbyheadseries
summary: A Thanos Receive tenant is limited by head series.
expr: sum by(job, tenant) (increase(thanos_receive_head_series_limited_requests_total{job=~".*thanos-receive.*"}[5m])) > 0
for: 5m
labels:
severity: warning
- name: thanos-sidecar
rules:
- alert: ThanosSidecarBucketOperationsFailed
12 changes: 12 additions & 0 deletions mixin/alerts/receive.libsonnet
@@ -170,6 +170,18 @@
severity: 'warning',
},
},
{
alert: 'ThanosReceiveTenantLimitedByHeadSeries',
annotations: {
description: 'Thanos Receive tenant {{$labels.tenant}}%s is limited by head series.' % location,
summary: 'A Thanos Receive tenant is limited by head series.',
},
expr: 'sum by(%(dimensions)s, tenant) (increase(thanos_receive_head_series_limited_requests_total{%(selector)s}[5m])) > 0' % thanos.receive,
'for': '5m',
labels: {
severity: 'warning',
},
},
],
},
],
1 change: 1 addition & 0 deletions mixin/runbook.md
@@ -65,6 +65,7 @@
|ThanosReceiveNoUpload|Thanos Receive has not uploaded latest data to object storage.|Thanos Receive {{$labels.instance}} has not uploaded latest data to object storage.|critical|[https://github.com/thanos-io/thanos/tree/main/mixin/runbook.md#alert-name-thanosreceivenoupload](https://github.com/thanos-io/thanos/tree/main/mixin/runbook.md#alert-name-thanosreceivenoupload)|
|ThanosReceiveLimitsConfigReloadFailure|Thanos Receive has not been able to reload the limits configuration.|Thanos Receive {{$labels.job}} has not been able to reload the limits configuration.|warning|[https://github.com/thanos-io/thanos/tree/main/mixin/runbook.md#alert-name-thanosreceivelimitsconfigreloadfailure](https://github.com/thanos-io/thanos/tree/main/mixin/runbook.md#alert-name-thanosreceivelimitsconfigreloadfailure)|
|ThanosReceiveLimitsHighMetaMonitoringQueriesFailureRate|Thanos Receive has not been able to update the number of head series.|Thanos Receive {{$labels.job}} is failing for {{$value humanize}}% of meta monitoring queries.|warning|[https://github.com/thanos-io/thanos/tree/main/mixin/runbook.md#alert-name-thanosreceivelimitshighmetamonitoringqueriesfailurerate](https://github.com/thanos-io/thanos/tree/main/mixin/runbook.md#alert-name-thanosreceivelimitshighmetamonitoringqueriesfailurerate)|
|ThanosReceiveTenantLimitedByHeadSeries|A Thanos Receive tenant is limited by head series.|Thanos Receive tenant {{$labels.tenant}} is limited by head series.|warning|[https://github.com/thanos-io/thanos/tree/main/mixin/runbook.md#alert-name-thanosreceivetenantlimitedbyheadseries](https://github.com/thanos-io/thanos/tree/main/mixin/runbook.md#alert-name-thanosreceivetenantlimitedbyheadseries)|

## thanos-rule

2 changes: 1 addition & 1 deletion pkg/rules/rules_test.go
@@ -69,7 +69,7 @@ func testRulesAgainstExamples(t *testing.T, dir string, server rulespb.RulesServ
Name: "thanos-receive",
File: filepath.Join(dir, "alerts.yaml"),
Rules: []*rulespb.Rule{
- someAlert, someAlert, someAlert, someAlert, someAlert, someAlert, someAlert, someAlert, someAlert,
+ someAlert, someAlert, someAlert, someAlert, someAlert, someAlert, someAlert, someAlert, someAlert, someAlert,
},
Interval: 60,
PartialResponseStrategy: storepb.PartialResponseStrategy_ABORT,
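
The `ThanosReceiveTenantLimitedByHeadSeries` rule added in this commit could be exercised with a `promtool test rules` unit test. The following is a minimal sketch, not part of the commit. It assumes the test file sits next to `examples/alerts/alerts.yaml` and that the counter carries `job` and `tenant` labels, as the `sum by(job, tenant)` expression implies. The tenant name `tenant-a` and the sample values are made up for illustration.

```yaml
# Hypothetical test file, e.g. examples/alerts/receive_head_series_test.yaml (name is an assumption).
rule_files:
  - alerts.yaml

evaluation_interval: 1m

tests:
  - interval: 1m
    input_series:
      # Counter grows by 1 every minute, so increase(...[5m]) stays above 0.
      - series: 'thanos_receive_head_series_limited_requests_total{job="thanos-receive", tenant="tenant-a"}'
        values: '0+1x15'
    alert_rule_test:
      # The rule has `for: 5m`, so by 10m the alert should have moved from pending to firing.
      - eval_time: 10m
        alertname: ThanosReceiveTenantLimitedByHeadSeries
        exp_alerts:
          - exp_labels:
              severity: warning
              job: thanos-receive
              tenant: tenant-a
            exp_annotations:
              description: Thanos Receive tenant tenant-a is limited by head series.
              runbook_url: https://github.com/thanos-io/thanos/tree/main/mixin/runbook.md#alert-name-thanosreceivetenantlimitedbyheadseries
              summary: A Thanos Receive tenant is limited by head series.
```

Such a file would be run with `promtool test rules <file>`. A second test case with a flat counter (for example `values: '5x15'`) and an empty `exp_alerts` list would check that the alert stays silent when no requests are being limited.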
