receive/mixin: add alert for tenant reaching head series limit #6467

Merged
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -24,6 +24,7 @@ We use *breaking :warning:* to mark changes that are not backward compatible (re
 - [#6420](https://github.com/thanos-io/thanos/pull/6420) Index Cache: Cache expanded postings.
 - [#6441](https://github.com/thanos-io/thanos/pull/6441) Compact: Compactor will set `index_stats` in `meta.json` file with max series and chunk size information.
 - [#6466](https://github.com/thanos-io/thanos/pull/6466) Mixin (Receive): add limits alerting for configuration reload and meta-monitoring.
+- [#6467](https://github.com/thanos-io/thanos/pull/6467) Mixin (Receive): add alert for tenant reaching head series limit.

 ### Fixed
 - [#6456](https://github.com/thanos-io/thanos/pull/6456) Store: fix crash when computing set matches from regex pattern
9 changes: 9 additions & 0 deletions examples/alerts/alerts.md
@@ -548,6 +548,15 @@ rules:
   for: 5m
   labels:
     severity: warning
+- alert: ThanosReceiveTenantLimitedByHeadSeries
+  annotations:
+    description: Thanos Receive tenant {{$labels.tenant}} is limited by head series.
+    runbook_url: https://github.com/thanos-io/thanos/tree/main/mixin/runbook.md#alert-name-thanosreceivetenantlimitedbyheadseries
+    summary: A Thanos Receive tenant is limited by head series.
+  expr: sum by(job, tenant) (increase(thanos_receive_head_series_limited_requests_total{job=~".*thanos-receive.*"}[5m])) > 0
+  for: 5m
+  labels:
+    severity: warning
```

## Replicate
9 changes: 9 additions & 0 deletions examples/alerts/alerts.yaml
@@ -292,6 +292,15 @@ groups:
     for: 5m
     labels:
       severity: warning
+  - alert: ThanosReceiveTenantLimitedByHeadSeries
+    annotations:
+      description: Thanos Receive tenant {{$labels.tenant}} is limited by head series.
+      runbook_url: https://github.com/thanos-io/thanos/tree/main/mixin/runbook.md#alert-name-thanosreceivetenantlimitedbyheadseries
+      summary: A Thanos Receive tenant is limited by head series.
+    expr: sum by(job, tenant) (increase(thanos_receive_head_series_limited_requests_total{job=~".*thanos-receive.*"}[5m])) > 0
+    for: 5m
+    labels:
+      severity: warning
 - name: thanos-sidecar
   rules:
   - alert: ThanosSidecarBucketOperationsFailed
12 changes: 12 additions & 0 deletions mixin/alerts/receive.libsonnet
@@ -170,6 +170,18 @@
               severity: 'warning',
             },
           },
+          {
+            alert: 'ThanosReceiveTenantLimitedByHeadSeries',
+            annotations: {
+              description: 'Thanos Receive tenant {{$labels.tenant}}%s is limited by head series.' % location,
+              summary: 'A Thanos Receive tenant is limited by head series.',
+            },
+            expr: 'sum by(%(dimensions)s, tenant) (increase(thanos_receive_head_series_limited_requests_total{%(selector)s}[5m])) > 0' % thanos.receive,
+            'for': '5m',
+            labels: {
+              severity: 'warning',
+            },
+          },
         ],
       },
     ],
1 change: 1 addition & 0 deletions mixin/runbook.md
@@ -65,6 +65,7 @@
 |ThanosReceiveNoUpload|Thanos Receive has not uploaded latest data to object storage.|Thanos Receive {{$labels.instance}} has not uploaded latest data to object storage.|critical|[https://github.com/thanos-io/thanos/tree/main/mixin/runbook.md#alert-name-thanosreceivenoupload](https://github.com/thanos-io/thanos/tree/main/mixin/runbook.md#alert-name-thanosreceivenoupload)|
 |ThanosReceiveLimitsConfigReloadFailure|Thanos Receive has not been able to reload the limits configuration.|Thanos Receive {{$labels.job}} has not been able to reload the limits configuration.|warning|[https://github.com/thanos-io/thanos/tree/main/mixin/runbook.md#alert-name-thanosreceivelimitsconfigreloadfailure](https://github.com/thanos-io/thanos/tree/main/mixin/runbook.md#alert-name-thanosreceivelimitsconfigreloadfailure)|
 |ThanosReceiveLimitsHighMetaMonitoringQueriesFailureRate|Thanos Receive has not been able to update the number of head series.|Thanos Receive {{$labels.job}} is failing for {{$value humanize}}% of meta monitoring queries.|warning|[https://github.com/thanos-io/thanos/tree/main/mixin/runbook.md#alert-name-thanosreceivelimitshighmetamonitoringqueriesfailurerate](https://github.com/thanos-io/thanos/tree/main/mixin/runbook.md#alert-name-thanosreceivelimitshighmetamonitoringqueriesfailurerate)|
+|ThanosReceiveTenantLimitedByHeadSeries|A Thanos Receive tenant is limited by head series.|Thanos Receive tenant {{$labels.tenant}} is limited by head series.|warning|[https://github.com/thanos-io/thanos/tree/main/mixin/runbook.md#alert-name-thanosreceivetenantlimitedbyheadseries](https://github.com/thanos-io/thanos/tree/main/mixin/runbook.md#alert-name-thanosreceivetenantlimitedbyheadseries)|

## thanos-rule

2 changes: 1 addition & 1 deletion pkg/rules/rules_test.go
@@ -69,7 +69,7 @@ func testRulesAgainstExamples(t *testing.T, dir string, server rulespb.RulesServ
 				Name:  "thanos-receive",
 				File:  filepath.Join(dir, "alerts.yaml"),
 				Rules: []*rulespb.Rule{
-					someAlert, someAlert, someAlert, someAlert, someAlert, someAlert, someAlert, someAlert, someAlert,
+					someAlert, someAlert, someAlert, someAlert, someAlert, someAlert, someAlert, someAlert, someAlert, someAlert,
 				},
 				Interval:                60,
 				PartialResponseStrategy: storepb.PartialResponseStrategy_ABORT,
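
To check locally that `ThanosReceiveTenantLimitedByHeadSeries` fires as intended, a promtool rule unit test along the following lines could be used. This is only a sketch and is not part of the PR: the file name `alerts_test.yaml`, the tenant `team-a`, the job value `thanos-receive`, and the synthetic counter values are all assumptions made for illustration, and the file is assumed to sit next to `examples/alerts/alerts.yaml`.

```yaml
# alerts_test.yaml (hypothetical file name, not part of this PR)
rule_files:
  - alerts.yaml

evaluation_interval: 1m

tests:
  - interval: 1m
    input_series:
      # Synthetic counter that grows by 1 every minute, i.e. the tenant
      # keeps having requests limited by the head series limit.
      - series: 'thanos_receive_head_series_limited_requests_total{job="thanos-receive", tenant="team-a"}'
        values: '0+1x15'
    alert_rule_test:
      # The increase is positive from the second sample on, so after the
      # 5m "for" period the alert should be firing well before 15m.
      - eval_time: 15m
        alertname: ThanosReceiveTenantLimitedByHeadSeries
        exp_alerts:
          - exp_labels:
              severity: warning
              job: thanos-receive
              tenant: team-a
            exp_annotations:
              description: Thanos Receive tenant team-a is limited by head series.
              runbook_url: https://github.com/thanos-io/thanos/tree/main/mixin/runbook.md#alert-name-thanosreceivetenantlimitedbyheadseries
              summary: A Thanos Receive tenant is limited by head series.
```

Under those assumptions it would be run with `promtool test rules alerts_test.yaml`. Because the expression aggregates with `sum by(job, tenant)`, only `job` and `tenant` (plus the rule's own `severity`) are expected on the resulting alert.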