thanos_rule_evaluation_with_warnings_total. If you choose to use Rules and Alerts with the [partial response strategy](https://thanos.io/tip/components/rule.md/#partial-response) set to “warn”, this metric tells you how many evaluations ended with some kind of warning. To see the actual warnings, check the WARN-level logs. An increase might suggest that those evaluations returned a partial response and might not be accurate.
However, this metric has been broken since Prometheus started propagating warnings from the engine (prometheus/prometheus#12152). For example, a metric name that doesn't end with _total results in a warning, which increments thanos_rule_evaluation_with_warnings_total and triggers the alert.
Proposal
For thanos_rule_evaluation_with_warnings_total, include only warnings from partial responses, or
remove this alert from the Thanos mixin and update the docs.
Any idea how to fix this issue? My current thinking is to move the partial response warning metric to Thanos Querier and remove it from Ruler.
Thanos Querier can tell whether a warning comes from the storage layer or from the engine, so it can emit a metric that counts partial responses only.
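To make the idea concrete, here is a minimal sketch in Python of counting only storage-layer warnings. It is not Thanos code: `WarningSource`, `QueryWarning`, and `PartialResponseCounter` are hypothetical names made up for illustration, and the sketch assumes each warning already carries a tag saying where it came from.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable


class WarningSource(Enum):
    # Hypothetical tags; the sketch assumes the querier can attach one to each warning.
    STORAGE = "storage"  # e.g. a Store API did not respond, so the result is partial
    ENGINE = "engine"    # e.g. a PromQL engine hint about a metric name


@dataclass(frozen=True)
class QueryWarning:
    source: WarningSource
    message: str


class PartialResponseCounter:
    """Counts evaluations that saw at least one storage-layer warning."""

    def __init__(self) -> None:
        self.value = 0

    def observe(self, warnings: Iterable[QueryWarning]) -> bool:
        # Engine warnings are ignored: they do not indicate a partial response.
        partial = any(w.source is WarningSource.STORAGE for w in warnings)
        if partial:
            self.value += 1
        return partial
```

With this split, an evaluation that only produces an engine warning (such as the `_total` naming hint) leaves the counter unchanged, while an evaluation with a storage warning increments it once, no matter how many warnings it carries.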
Problem
Query response warnings were used in Thanos to propagate partial response information from Store APIs.
https://thanos.io/tip/components/rule.md/#must-have-essential-ruler-alerts recommends setting an alert on the thanos_rule_evaluation_with_warnings_total metric, and we have this alert in the Thanos mixin as well.