irate/rate query issue on the deduplicated metrics #5025
Comments
Deduplication doesn't work in your case because it is 1:1 deduplication, which mainly applies to receiver setups.
I have now noticed that the deduplicated stream actually contains the metrics from both Prometheus replicas, and in most cases the values are identical for both. Could this be fixed by using --deduplication.func=penalty for the compactor? From the docs, this option seems to break irate in other ways.
@yeya24 From what I can see at https://thanos.io/tip/components/compact.md/#enabling-vertical-compaction in order to enable vertical compaction I need to use that hidden flag. Maybe the note at Line 719 in 56d99eb
I am hitting the same issue.
Ideally it should solve the issue, but the functionality is currently not very stable when downsampling is also enabled, so I wouldn't recommend enabling it.
irate calculates the rate from only the last two data points in your series.
If the irate function is evaluated at time 1641055885, the last two points are 1@1641055871 and 10001@1641055885, so the value difference is about 10000. This is just an example. If you are using rate, it can go wrong too, when the counter values mistakenly appear to reset.
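To make the arithmetic above concrete, here is a minimal Python sketch. It is not Thanos or Prometheus code: `irate` and `increase_with_resets` are hypothetical helpers that only mimic the documented behaviour (last two samples for irate, and a drop in value being treated as a counter reset). The real rate function also extrapolates to the window boundaries, which is left out here. The interleaved counter values in the second example are made up for illustration.

```python
from typing import List, Tuple

# (unix timestamp in seconds, counter value)
Sample = Tuple[int, float]


def irate(samples: List[Sample]) -> float:
    """Per-second rate computed from the last two samples only."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    (t_prev, v_prev), (t_last, v_last) = samples[-2], samples[-1]
    if t_last <= t_prev:
        raise ValueError("timestamps must be strictly increasing")
    # A drop in value is treated as a counter reset, so count from zero.
    delta = v_last if v_last < v_prev else v_last - v_prev
    return delta / (t_last - t_prev)


def increase_with_resets(samples: List[Sample]) -> float:
    """Simplified increase over a window (no extrapolation)."""
    total = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # Every drop is assumed to be a reset and adds the full new value.
        total += cur if cur < prev else cur - prev
    return total


# The two points quoted in the comment above.
quoted = [(1641055871, 1.0), (1641055885, 10001.0)]
spike = irate(quoted)  # 10000 / 14 seconds

# Hypothetical values: one replica's series versus a stream in which a
# slightly lagging sample from the other replica is interleaved.
clean = [(0, 10000.0), (30, 10010.0)]
mixed = [(0, 10000.0), (15, 9998.0), (30, 10010.0)]
clean_increase = increase_with_resets(clean)
mixed_increase = increase_with_resets(mixed)

print(spike, clean_increase, mixed_increase)
```

With these inputs the quoted pair gives an irate of roughly 714 per second, and the mixed stream reports an increase of 10010 instead of 10, because the small drop from 10000 to 9998 is counted as a reset.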
Yeah that's a good idea. Would you like to contribute and update the doc?
Hello 👋 Looks like there was no activity on this issue for the last two months.
Closing for now as promised, let us know if you need this to be reopened! 🤗
Thanos, Prometheus and Golang version used:
Thanos is deployed with kube-prometheus. The issue was first detected with Thanos 0.23.1. I upgraded only Thanos Query to 0.24.0, but the issue persists.
Thanos compact:
compact --wait --log.level=info --log.format=logfmt --objstore.config=$(OBJSTORE_CONFIG) --data-dir=/var/thanos/compact --debug.accept-malformed-index --retention.resolution-raw=360d --retention.resolution-5m=360d --retention.resolution-1h=360d --delete-delay=8h --deduplication.replica-label=prometheus_replica --deduplication.replica-label=rule_replica --deduplication.replica-label=replica
Thanos query:
query --grpc-address=0.0.0.0:10901 --http-address=0.0.0.0:9090 --log.level=debug --log.format=logfmt --query.replica-label=prometheus_replica --query.replica-label=rule_replica --query.replica-label=replica --store=dnssrv+_grpc._tcp.prometheus-network-monitoring-thanos-sidecar.network-monitoring.svc.cluster.local --store=dnssrv+_grpc._tcp.thanos-store-network-monitoring.network-monitoring.svc.cluster.local --query.timeout=5m --query.lookback-delta=15m --query.auto-downsampling
Prometheus: quay.io/prometheus/prometheus:v2.29.2
--web.console.templates=/etc/prometheus/consoles --web.console.libraries=/etc/prometheus/console_libraries --config.file=/etc/prometheus/config_out/prometheus.env.yaml --storage.tsdb.path=/prometheus --storage.tsdb.retention.time=5d --web.enable-lifecycle --storage.tsdb.no-lockfile --web.external-url=https://prom-network-monitoring --web.route-prefix=/ --storage.tsdb.max-block-duration=2h --storage.tsdb.min-block-duration=2h
Thanos sidecar version: 0.23.1
Object Storage Provider:
Minio
What happened:
This is one of the metrics where this issue manifests:
This is what an irate query looks like:
The red line is the irate of the deduplicated metric while the yellow and blue are the irates for the metrics from prometheus.
After pressing Execute several times, the graph sometimes looks like this:
What you expected to happen:
I would expect the irate of the deduplicated metric to look like the original from Prometheus.
How to reproduce it (as minimally and precisely as possible):
Full logs to relevant components:
I could not find any error in thanos compact or thanos query logs.
Anything else we need to know:
This is what a rate query looks like on the same metric:
It looks somewhat better than the irate query, but I would have expected it to be closer to the query on the original data from Prometheus.
This is what the values of the original and deduplicated metrics look like:
The third line is the deduplicated metric.
Is this a bug or something wrong in my configuration?